The battle of copyright in the AI-era
Artificial intelligence, such as ChatGPT and GPT-4, is challenging the established state of the law. Several lawsuits are currently under way that should interest intellectual property lawyers.
Do you remember The Next Rembrandt from 2016? In that project, an algorithm used machine learning to analyse the works of the Dutch painter Rembrandt and then produced its own interpretation, a new "Rembrandt" work. As Rembrandt died in 1669, copyright protection for his works expired long ago. This means that the works may be used by others without consent or remuneration. One of the legal questions that the Rembrandt project triggered was whether the resulting work was protected by copyright and, if so, who would hold the rights to works created by artificial intelligence (AI). Another question is whether the algorithm could have used Rembrandt's paintings if they had still been protected by copyright.
Since 2016, the Rembrandt project has been followed by a number of new and more complex generative AI systems. DALL-E 2, Stable Diffusion, Midjourney and Imagen are all examples of AI that can generate pictures based on text descriptions. In this article, we will take a closer look at whether works protected by copyright can legally be used to train such AI applications.
Crucial AI training
An AI's success depends to a large extent on the training process. An AI must be trained before it is capable of executing tasks correctly. AI is often trained by using organised collections of data referred to as datasets. The size of the dataset needed depends on whether the AI has been pre-trained or must be trained from scratch. Datasets will typically encompass a training set, used for training the AI, and a test set, which contains data that the AI has not seen before and is used to test whether the AI has learned anything or has only memorised the data from the training set.
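The split between a training set and a test set can be illustrated with a minimal Python sketch. The function name `split_dataset`, the 20 percent test share and the file names are assumptions chosen for illustration only; they are not taken from any particular AI system.

```python
import random


def split_dataset(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and return (training_set, test_set)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    # The first n_test items are held back; the model never trains on them.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for real training examples.
examples = [f"image_{i}.jpg" for i in range(10)]
training_set, test_set = split_dataset(examples)
```

Because the two sets do not overlap, good results on the test set suggest that the AI has generalised rather than merely memorised the training data.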
A number of datasets are available online. As an example, Stable Diffusion is trained by using datasets provided by the organisation LAION, which consist of up to 5.85 billion text-image pairs. Explained in simple terms, LAION offers lists of URLs to original pictures online. Hence, the datasets do not contain actual pictures; the pictures must be downloaded from the internet by those using the datasets. LAION's datasets are created by "scraping" hundreds of domains on the internet.
Those who hold copyright to a protected work, such as a picture, will generally have the exclusive right to make the work available to the public and to make reproductions of it, regardless of the means and form, and regardless of whether the reproduction is permanent or temporary.
When the AI is trained by using substantial amounts of data from datasets, reproductions of the content, such as pictures, will typically be saved in the machine's memory. In this regard, one could argue that AI training infringes the exclusive right to reproduce copyright-protected works.
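To make the point concrete, a dataset of this kind can be pictured as a table of links and captions. The sketch below is a simplified illustration and not LAION's actual file format: the column names `url` and `caption`, the sample rows and the `example.com` addresses are all assumptions.

```python
import csv
import io
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ImageTextPair:
    url: str      # where the picture is hosted; the picture itself is not stored
    caption: str  # the text description paired with the picture


# Hypothetical index: only links and captions, no image data.
SAMPLE_INDEX = """url,caption
https://example.com/a.jpg,A windmill at sunset
https://example.org/b.png,Portrait of a man in a hat
"""


def load_index(text):
    """Parse the CSV text into a list of ImageTextPair records."""
    reader = csv.DictReader(io.StringIO(text))
    return [ImageTextPair(row["url"], row["caption"]) for row in reader]


pairs = load_index(SAMPLE_INDEX)
# The domains the user would have to contact to obtain the pictures.
domains = sorted({urlparse(pair.url).netloc for pair in pairs})
```

Under this assumption, reproductions of the pictures only arise when the user of the dataset downloads the files from the listed addresses.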
Given that the analysis and use of substantial amounts of data, including copyright-protected data, are necessary in a number of important areas of society, the EU explicitly adopted exceptions for so-called text and data mining in Directive (EU) 2019/790 (the DSM Directive) to ensure that such activity is not restricted by copyright. Text and data mining (TDM) generally refers to machine-based analysis of large amounts of data in order to obtain knowledge. It is assumed that training of AI will in most cases fit the definition of TDM in the DSM Directive.
The TDM exceptions in the DSM Directive are found in Articles 3 and 4. While Article 3 allows TDM by, among others, research organisations for the purpose of scientific research, Article 4 allows TDM for all purposes, regardless of whether the motive is commercial or not. For that reason, Article 4 has been debated. Although TDM can be seen as a prerequisite for the development of AI such as DALL-E 2, Stable Diffusion and ChatGPT, it is disputed whether TDM for commercial purposes should be exempt from copyright protection.
Exceptions should be seen in the context of regulations in the USA
A closer examination of Article 4 shows, however, that the exception provides significantly less room for TDM than it first appears. The provision allows for the reproduction and extraction of certain works protected by intellectual property rights for TDM, provided that the content is lawfully accessible, that reproductions are not retained longer than necessary and that the right holders have not made an express reservation against the work and other subject matter being used for TDM ("opt-out"). Such reservations must be made in an appropriate manner. For content made available online, a reservation will only be considered appropriate if it is made in a machine-readable manner.
The opt-out mechanism enables right holders to make reservations against TDM. In reality, it is thus up to the right holders whether commercial TDM is to be legal in the EU. This is in contrast to the US, where the "fair use" doctrine has been presumed to allow TDM for commercial purposes without permission from the right holders. This disparity can mean that AI developers in the EU are put in an inferior position compared to AI developers in the United States. If the EU is serious about becoming a hub for the development and use of AI technologies, as the European Commission has stated, it is important that the framework for innovation in the EU is seen in the context of the regulations in the USA.
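What a machine-readable reservation could look like in practice can be sketched in Python. The example assumes that a website states its reservation in a robots.txt file; this is only one conceivable mechanism, and whether it satisfies Article 4 is a legal question the sketch does not answer. The crawler names `ExampleTDMBot` and `OtherBot`, the file contents and the URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is excluded, all others are allowed.
ROBOTS_TXT = """\
User-agent: ExampleTDMBot
Disallow: /

User-agent: *
Allow: /
"""


def may_mine(robots_txt, user_agent, url):
    """Return True if the robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


picture_url = "https://example.com/gallery/picture.jpg"
blocked = may_mine(ROBOTS_TXT, "ExampleTDMBot", picture_url)
allowed = may_mine(ROBOTS_TXT, "OtherBot", picture_url)
```

A crawler that checks such a file before downloading can respect the reservation automatically, which is the practical point of requiring the opt-out to be machine-readable.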
The DSM Directive has not yet been implemented in Norway, but it is expected that this will happen in the near future.
Several interesting court cases
With the rapid increase in the use of AI, we are also seeing an increase in AI-related lawsuits. The question of whether copyright-protected works can be used to train AI has been raised before the US courts. Although it has been assumed that "fair use" in US copyright law covers certain forms of unlicensed TDM activities, these lawsuits require an assessment of whether this is in fact the case, and if so to what extent.
Stability AI, the company behind Stable Diffusion, is the subject of several lawsuits. In a class action brought by artists in the USA, Stability AI has been sued along with DeviantArt and Midjourney. The background to the class action is the AI solution Stable Diffusion, which allegedly contains copies of millions of copyright-protected images. The question is whether this large-scale use of images is legal without obtaining permission from the right holders.
Stability AI has also been sued by Getty Images in both London and Delaware for copying and using millions of protected images from Getty's database to train Stable Diffusion without consent.
Although these cases have been brought outside Norway and the EU and concern foreign law, so that their outcomes will have limited legal value here, it is interesting to follow these first court cases on copyright and AI training.
Is development running wild?
There is no doubt that AI represents a new technology that challenges the established legal system. The question of whether material protected by copyright can be used for training of AI does not have a clear answer. The answer can also vary depending on the jurisdiction.
In addition to the clarification we might expect from the ongoing lawsuits, legislators and other actors worldwide are proposing initiatives that could contribute to beneficial regulations regarding AI. A challenge for legislators is how they may take into account the rapid technological development. It is also worth noting that several technology leaders, including Elon Musk, have signed an open letter asking for a pause in the further development of AI models. It is interesting that parties who themselves are or have been involved in the development of AI are now expressing concern that development is going too fast.
Finally, did you have to think twice about the title of this article?
It was created by ChatGPT. And if you're wondering what AI training might look like, take a look at the picture DALL-E 2 has created.