The automation of tools aimed at assisting or replacing mechanical work in the everyday processes of organisations and businesses has raised several societal and ethical concerns. One of the issues brought to light is the impact of these innovations on the field of law. Mechanisms powered by artificial intelligence (AI) challenge existing regulation in the European Union (EU) with regard to, among other things, intellectual property (IP) protection and related sui generis rights.
From an IP perspective, it is worth recalling that databases have special relevance in a data-driven economy, where companies and organisations face the challenge of handling an ever-increasing volume of data. The way data is managed nowadays is often viewed as a measure of the success of each organisation’s business model. This explains, for example, the growing sales of data warehousing and data manipulation solutions that automate decisions based on the information stored in a particular database. In the AI era, databases are an even more important asset: automated and machine learning systems depend on training processes that rely on collected data, so storing data becomes essential to the development and functioning of AI-based technologies.
Traditionally, databases entered IP legal systems through the legal institute of compilation, protected under international treaties such as the WIPO Copyright Treaty (WCT), which provides that any arrangement or selection of materials that stems from an original intellectual effort shall be protected by copyright law. The WCT states that “data or other material” can be the object of copyrightable compilations, whereas the Berne Convention had previously restricted compilations to selections of literary and artistic works.
Compilation was traditionally associated with library collections and anthologies. The international IP instruments protecting those compilations of works limited the scope of protection to their creative expression, rewarding the intellectual effort or creativity of the author, in this case the “compiler”.
The interpretation of those instruments has evolved to encompass the compilation of all types of information and data, namely datasets and databases. In 1996 the EU adopted a dedicated Directive that widened the scope of legal protection to every type of database, analogue or digital (the so-called EU Database Directive). At the time, databases were produced for purposes and in forms that have since fallen into disuse, such as CD-ROM catalogues. The EU Database Directive provides two different forms of legal protection for databases: copyright protection and sui generis protection.
Copyright protection of databases in the EU
Copyright protection is granted to a database that is original in the way its content has been selected or arranged. The interpretation of this criterion, however, differs across national legal traditions, and the debate heats up when its application to database rights is discussed.
The Court of Justice of the European Union (CJEU) has been consistent in its interpretation of originality in cases such as Infopaq and Painer. Originality should be demonstrated by the work’s level of creativity, i.e. the work should reveal the “creative touch” of its author, reflecting his or her free and independent choices during the making process. Moreover, the author of a copyrightable database holds the exclusive right to carry out or authorise reproductions of the whole or part of the database’s content, and to distribute, communicate, display, or perform the database to the public in any form.
Sui generis protection of databases in the EU
Since the originality requirement can be hard to demonstrate for databases, which makes copyright protection difficult to obtain, the EU legal framework also protects non-original databases under the so-called sui generis right regime. One of its key provisions is Article 7(1) of the Database Directive, which aims to protect the maker of a database against unauthorised extraction or re-utilisation of contents that reflect a substantial investment.
The Directive indicates that extraction refers to any transfer of all or a substantial part of the database’s contents to another medium, while re-utilisation means making all or a substantial part of those contents available to the public by any form of transmission, including online. Therefore, insubstantial parts of a public database can be extracted and/or re-utilised by lawful users, as long as those acts do not conflict with the legitimate interests of the maker of the database. Exceptionally, lawful users can extract and/or re-utilise substantial parts of a non-electronic database when they do so for private purposes, and of any database for research or scientific purposes and for public security or judicial procedures. In essence, the legislator has established exceptions for extraction and re-utilisation for non-commercial purposes.
When interpreting the meaning of those acts, the CJEU held in British Horseracing Board that prohibited extraction and re-utilisation can be understood as unauthorised acts of appropriating or publicly sharing a database’s content. Moreover, in CV-Online Latvia, the CJEU indicated that a search engine that indexes and copies the contents of publicly accessible databases may be engaging in unlawful extraction, provided that the database maker proves that the extraction negatively affected its investment.
Protecting databases in the AI era
What we see today are AI systems easily accessing online platforms and public databases to gather relevant data through web scraping mechanisms, among others. Popular among companies aiming to improve efficiency in Human Resources and Information Technology departments, AI-powered scraping tools are first trained through machine learning techniques (supervised or unsupervised) to identify the relevant data to be collected. The trained AI algorithms can then scrape data and content from any website or publicly available database, which results in the extraction of an indiscriminate amount of data that can be re-utilised in line with the company’s strategy. Even when a website has blocking mechanisms to prevent excessive traffic and repetitive scraping from the same IP address, many companies use dynamic proxy servers that constantly rotate IP addresses in order to bypass the block. Furthermore, there is no widely used tool to check whether a substantial part of a database has been extracted, or whether that part will be used for commercial purposes, which makes it difficult to establish whether database rights have been infringed.
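To make the tracking gap described above more concrete, the following Python snippet is a minimal sketch of how a database operator might log which distinct records each client has retrieved and flag clients whose cumulative share of the database passes a chosen level. Everything in it is an assumption made for illustration: the class name ExtractionMonitor, the client identifiers, and above all the 25% threshold, which is hypothetical and is not a legal standard, since whether a part is "substantial" is assessed qualitatively and/or quantitatively on a case-by-case basis. The sketch also assumes that clients are identified by something more stable than an IP address (for example an account or API key), because per-IP tracking would be defeated by the rotating proxies mentioned above.

```python
from collections import defaultdict


class ExtractionMonitor:
    """Hypothetical helper: tracks distinct records retrieved per client."""

    def __init__(self, total_records, threshold=0.25):
        if total_records <= 0:
            raise ValueError("total_records must be positive")
        self.total_records = total_records
        # Illustrative threshold only; not a legal test of "substantial part".
        self.threshold = threshold
        self._seen = defaultdict(set)  # client_id -> set of record ids

    def record_access(self, client_id, record_ids):
        """Register that a client retrieved the given record ids."""
        self._seen[client_id].update(record_ids)

    def extracted_fraction(self, client_id):
        """Share of the database (distinct records) this client has retrieved."""
        return len(self._seen.get(client_id, ())) / self.total_records

    def flagged_clients(self):
        """Clients whose cumulative share meets or exceeds the threshold."""
        return sorted(
            c for c in self._seen
            if self.extracted_fraction(c) >= self.threshold
        )


# Usage with made-up client keys and record ranges.
monitor = ExtractionMonitor(total_records=1000, threshold=0.25)
monitor.record_access("key-A", range(0, 300))   # 300 distinct records
monitor.record_access("key-B", range(0, 10))
monitor.record_access("key-B", range(5, 15))    # overlaps; 15 distinct in total
print(monitor.flagged_clients())  # ['key-A']
```

A counter of this kind only measures the quantity extracted. It says nothing about the qualitative value of the records taken or about the purpose of the later re-utilisation, which is why it can at most support, and not replace, a legal assessment.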
Moreover, the sui generis right aims to protect the maker of a database who has invested substantially in obtaining, presenting, or verifying the data that make up the database. In British Horseracing Board, the CJEU held that the resources used to create the data themselves do not count towards that investment. In the case of AI-generated databases, however, the creation, verification and obtaining of data are easily blurred in current data mining practices for machine learning, since a single AI tool can perform all of those tasks. An approach that protects the investment by encompassing all the processes required for the creation of an AI system would therefore be more effective, even though the CJEU’s position seems to suggest otherwise.
In terms of copyright protection, the developer of an AI model who deployed supervised learning and gave instructions on the arrangement of data could own copyright over the selection or arrangement of the database’s content. However, this would be difficult for a court to assess, because identifying the author’s “creative choices” involves reverse engineering, which could infringe the owner’s trade secret rights. It has previously been argued that “substantial investment” should be interpreted broadly so as to encompass copyright protection as well. That idea could lead to over-protection of databases, but it could also reward developers with copyright protection if combined with compulsory licensing of the database content.
To conclude, databases are now a central tool in the automation processes behind many organisations’ business models. Although they were brought into the IP legal system as compilations, electronic databases seem to call for a revision of the legal framework in force in the EU. The rule established by Article 3(1) of the Database Directive on copyright protection is of little practical use in assessing whether automated databases fulfil the originality requirement.
As regards the market practice of automating web scraping tools, databases should be equipped with cybersecurity measures that protect the interests of the database maker. The exceptions to the sui generis right should, however, be reassessed, if not reformulated, considering that most AI-powered scraping tools serve companies aiming to enhance the efficiency of their services, which does not constitute private use. Moreover, the data mining practice behind AI-generated databases collects, verifies, and presents data all at once, which should be regarded as a substantial investment by the database maker. In any case, the current stage of AI development seems to call for an update of the database legal framework in terms of both copyright and sui generis rights, not only to prevent unauthorised uses of databases by automated scraping tools, but also to protect the investments made in AI-generated databases themselves.