Sarah Silverman speaks on May 05, 2022 in New York City.
Cindy Ord/Getty Images for Variety

Sarah Silverman has joined forces with fellow authors Richard Kadrey and Christopher Golden to sue Meta and OpenAI for copyright infringement.
The suits are separate, one against each company. In them, the authors claim they never consented to their copyrighted books being used as training material for the large language models (LLMs) behind OpenAI's ChatGPT and Meta's LLaMA.
An LLM is a type of artificial intelligence algorithm trained on massive amounts of text, drawn from books and the internet, to learn language patterns, grammar, and context until it can generate human-like text and hold chat conversations with users.
According to the lawsuits, the models "remix the copyrighted works of thousands of book authors -- and many others -- without consent, compensation, or credit."
Copyright infringement has been one of AI skeptics' many concerns since ChatGPT became widely available in November 2022, triggering the generative AI boom and raising questions about how AI will affect creative work and copyright.
The lawsuits claim the LLMs were trained on illegally acquired materials, such as those found on "shadow library" websites. According to the OpenAI suit:
"The OpenAI Books2 dataset can be estimated to contain about 294,000 titles. The only 'internet-based books corpora' that have ever offered that much material are notorious 'shadow library' websites like Library Genesis (aka LibGen), Z-Library (aka B-ok), Sci-Hub, and Bibliotik. The books aggregated by these websites have also been available in bulk via torrent systems."
The Meta suit makes similar claims and identifies the sources from which the book training data was gathered. It divides them into two groups: the first is Project Gutenberg, an online archive of books that are out of copyright, and the second is the "Books3 section of ThePile," a dataset available on the popular AI project hosting site Hugging Face that appears to represent all of Bibliotik, mentioned above.
The plaintiffs are represented by lawyers Joseph Saveri and Matthew Butterick, who also represent authors Mona Awad and Paul Tremblay in a copyright infringement lawsuit filed against OpenAI in June.