This article was co-authored by Adriana Alves Henriques and Beatriz Dias, students at NOVA School of Law.
What is open data?
Open data (hereinafter “OD”) is data that anyone (governments, businesses and individuals) can access, use, reuse and even (re)distribute with few restrictions, if any. It is often confused with data sharing, which refers to data sets shared among specific organizations and individuals. OD differs in that the data is within anyone’s reach.
To acquire this “open nature”, data must fulfil a set of widely accepted requirements. First, the data must be available: data sets should be offered for free, or at no more than a reasonable reproduction cost, and either placed in the public domain or distributed under an open license. Second, the data should be accessible, meaning that it must be provided in a convenient and modifiable form (i.e., a readily machine-readable format) and be easily discoverable. To this end, it is common to use “open data platforms”, pieces of software for publishing and managing open data on the Web, which allow users to search, browse and download available data. Finally, the data must be provided on terms that make it possible to reuse, redistribute and intermix it. This last feature is particularly relevant since it allows the development of more and better products and services. As these requirements show, the “open nature” is twofold: the data must be legally open on the one hand and technically open on the other.
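The requirements above can be read as a simple checklist applied to a data set's metadata. The following is a minimal sketch, not an established standard: the `DatasetRecord` fields, the list of open licenses, the list of machine-readable formats, and the way each requirement is assigned to the "legal" or "technical" side are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative, non-exhaustive sets chosen for this sketch (SPDX-style license ids).
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "ODbL-1.0", "PDDL-1.0"}
MACHINE_READABLE_FORMATS = {"csv", "json", "xml", "geojson"}


@dataclass
class DatasetRecord:
    """Hypothetical metadata record for one published data set."""
    title: str
    license_id: Optional[str]   # None if no license is declared
    public_domain: bool
    file_format: str
    free_or_at_cost: bool       # free, or at most a reasonable reproduction cost
    listed_in_catalog: bool     # discoverable through an open data platform


def is_legally_open(ds):
    """Available at no more than cost, and public domain or openly licensed."""
    return ds.free_or_at_cost and (ds.public_domain or ds.license_id in OPEN_LICENSES)


def is_technically_open(ds):
    """Machine-readable format and discoverable through a catalog."""
    return ds.file_format.lower() in MACHINE_READABLE_FORMATS and ds.listed_in_catalog


def is_open(ds):
    """Both halves of the twofold 'open nature' must hold."""
    return is_legally_open(ds) and is_technically_open(ds)


# Made-up example records.
bus_stops = DatasetRecord("Bus stops", "CC-BY-4.0", False, "CSV", True, True)
scanned_report = DatasetRecord("Scanned report", None, False, "pdf", True, True)
```

In this sketch `is_open(bus_stops)` holds, while `scanned_report` fails on both counts: it declares no license and is not in a machine-readable format.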
What are the benefits of opening data?
The adoption of OD brings many benefits, broad in scope and so far largely unexploited. One of the most important is transparency. When entities, whether public or private, make their datasets available, they reach a level of exposure that makes them inevitably accountable: they have to justify their actions and decisions. Openness also shows that entities are willing to debate the data they collect and to welcome different inputs. In this sense, OD helps inform decision-making: openly available data provides the public with information that generates knowledge, making decision-making gradually more informed and decentralized.
OD also has the potential to unlock great opportunities for creating new products and services and improving existing ones, since it promotes the exchange of information, which boosts R&D, spreads knowledge and advances societal welfare. As mentioned, OD allows the public to get involved, so that people can contribute to the development of valuable solutions to societal challenges. OD can thus provide the foundation for new technological innovation and development. With free access to vast amounts of information, researchers can save time and financial resources and focus on the creation process.
As a consequence, OD can foster economic growth. Even though its overall impact may be difficult to measure, evidence indicates that implementing OD can yield cost savings and efficiency gains, particularly in the public sector. Moreover, as the Internet of Things (IoT) proliferates, OD has been gaining traction by allowing multiple stakeholders to cooperate in a new cross-sectoral way.
How does personal data protection fit into open data?
In this regard, one might wonder whether privacy is antithetical to OD initiatives. The concern is heightened by ever more powerful technologies such as machine learning, artificial intelligence and big data analytics. These technologies already raise multiple privacy challenges on their own. Combined with OD, their impact on individual privacy may become seriously detrimental.
In fact, privacy concerns are at the forefront of the discussion on the implementation of OD, since OD involves not only non-personal data but also personal and even sensitive data. Smart cities rely heavily on data collection, use and re-use, and as the volume of data sources and connected devices grows, so does the number of sources of personal data that require special caution. Meanwhile, recent scandals such as “Snowden”, “Cambridge Analytica” or “Facebookgate” have strongly undermined citizens’ trust in data-dependent platforms and in OD solutions.
At the same time, the General Data Protection Regulation (hereinafter “GDPR”) significantly restricts the release and re-use of personal data by requiring either a legitimate reason or that such data be anonymized prior to publication. More specifically, under the principle of purpose limitation, the GDPR [Articles 5(1) and 6(4)] requires that personal data be processed fairly and collected for specified, explicit and legitimate purposes. In addition, any further re-use of personal data must be based on a compatible purpose, which is assessed case by case and may heavily limit the use of data in this context. Data use under OD initiatives, however, is often unpredictable and not sufficiently clear to fulfil such requirements. Furthermore, under the storage limitation principle, personal data must be stored for no longer than is necessary to fulfil the purpose for which it was collected. Lastly, the categories of data should also be taken into consideration, since more sensitive data requires more caution in processing. Yet OD platforms and data centers do not usually differentiate between sensitive and non-sensitive personal data, which raises further privacy and security issues.
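To illustrate what differentiating between sensitive and non-sensitive personal data could look like before a release, here is a minimal sketch. The column names and the two tag sets are hypothetical, and a real classification would need legal review. Note also that dropping these columns is only a first filter: the remaining columns may still identify individuals indirectly, so this step alone should not be treated as anonymization.

```python
# Hypothetical column tags for this sketch only.
SPECIAL_CATEGORY = {"health_condition", "religion", "ethnicity"}
DIRECT_IDENTIFIERS = {"name", "email", "national_id"}


def split_columns(columns):
    """Group column names into sensitive, personal and remaining columns."""
    groups = {"sensitive": [], "personal": [], "other": []}
    for col in columns:
        if col in SPECIAL_CATEGORY:
            groups["sensitive"].append(col)
        elif col in DIRECT_IDENTIFIERS:
            groups["personal"].append(col)
        else:
            groups["other"].append(col)
    return groups


def strip_for_release(rows, columns):
    """Keep only the columns not tagged as sensitive or directly identifying."""
    keep = split_columns(columns)["other"]
    return [{col: row[col] for col in keep} for row in rows]


# Made-up example data.
columns = ["name", "health_condition", "postcode", "trips"]
rows = [{"name": "A. Example", "health_condition": "asthma",
         "postcode": "1000", "trips": 12}]
release = strip_for_release(rows, columns)
```

Here `release` keeps only `postcode` and `trips`. Whether those two columns are safe to publish is a separate question that depends on how many people share the same values.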
In light of the above, security threats and growing risks of re-identification, whether driven by increased computing power or by unexpected uses of individuals’ data, will have a detrimental impact on citizens’ trust, possibly deterring free data sharing even for data-for-good initiatives.
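One common way to get a rough sense of re-identification risk, which is not discussed in this article and is used here only as an illustration, is k-anonymity: counting how many records share the same combination of indirectly identifying attributes. The sketch below assumes hypothetical column names and made-up rows. A low value signals risk, but a high value is not a guarantee of safety.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values for the given quasi-identifier columns (0 for an empty data set)."""
    if not rows:
        return 0
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Made-up example data.
trips = [
    {"postcode": "1000", "birth_year": 1990, "trips": 12},
    {"postcode": "1000", "birth_year": 1990, "trips": 7},
    {"postcode": "2000", "birth_year": 1985, "trips": 3},
]
k = k_anonymity(trips, ["postcode", "birth_year"])
```

In this example `k` is 1, because only one record has postcode 2000 and birth year 1985. Anyone who knows those two facts about a person could single out that record.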
As we see it, OD is not intrinsically contrary to privacy, insofar as it promotes the sharing of data that is not personal and therefore not subject to the GDPR. However, OD also includes personal data. Data governance is therefore key to striking a balance between collaborative open innovation and privacy threats, by fostering public trust and implementing adequate accountability mechanisms and transparency measures. At the same time, public and private stakeholders must weigh the various interests and ethical considerations through a case-by-case analysis to ensure that the right measures are taken and the interests at stake are safeguarded.