Federal Judge Rules On Using Copyrighted Material to Train AI

By now, AI has firmly entrenched itself in most industries, making its adoption almost inevitable. The next step is determining the guidelines and restrictions for its growth. In a landmark ruling against creative artists and authors, a California federal judge held that using copyrighted books to train generative AI falls under the “fair use” doctrine. Siding with Anthropic, the well-funded AI startup behind the Claude LLMs, Judge William Alsup determined that “Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.”

Under U.S. copyright law, fair use is assessed on four factors: the purpose and character of the use, the nature of the copyrighted work, the amount of the work used, and the effect on the market for the original work. The court found that because Anthropic used the texts to train its models to generate human-like text, rather than to create its own books or supplant the original works, it was not violating the fair use doctrine. The judge decided that purchasing books and making digital copies was fair game, but using pirated copies was not. Anthropic, however, has been under fire for downloading pirated copies, and Judge Alsup noted “that Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft, but it may affect the extent of statutory damages.”

Photo Credits: Authors Alliance

This ruling is one of an astounding number of lawsuits brought against AI companies by creative professionals. Whether in music, books, movies, or visual art, artists of all kinds are concerned about the impact of allowing AI companies access to their work. Just as Anthropic trained its LLMs on books, training large models on art so they can replicate some form of it sits on a very blurry line. At this point, most people have seen AI-generated images in some form or another. All of these began with reference images that the model drew on to build the prompted image. As with most models, the larger and more diverse the data they are trained on, the better the results they provide. If companies have free rein to collect and use artists’ works without penalty, their models would likely improve, but at the expense of the hard work and individuality of the original artists.

As seen with the Studio Ghibli art trend, where people prompted ChatGPT to transform photos of themselves in the Ghibli style, it is now possible for models to replicate what was once a unique, distinctive style to suit the needs of the user. The court ruled in favor of Anthropic because it was using the books solely to train its model to produce more human-like responses. But what happens when a company decides to use these models to produce and sell the same type of art they were trained on? Who will be to blame: the company that produced the model or the end user who desired a certain art style?

Photo Credits: CNN

Questions like these will become ever more pressing as the capabilities of AI expand. As a new technology, its limits are still undetermined. Even though we may not be able to draw unwavering lines on those limits today, it is important to set the right precedent of checks and balances for AI's use. Some companies have started working on licenses with artists that specify exactly how much of their work can be used and in what way, a significant step toward such a partnership. This is an interesting convergence of public policy, artistry, and technology, one that will raise recurring questions over the coming decades.

Advika Rajeev