One set of books used to train generative AI

PennameWombat (Literotica Guru; joined Oct 5, 2018; 1,223 posts)
For those of you with books in the marketplace who want to know whether any of them appear in at least one dataset commonly used to train various generative AIs: this is from The Atlantic. They've provided a portal that lets you search by author name to see whether your works are in the dataset. If you're a subscriber to the magazine, you can access all you want. Otherwise, I'm passing on this gift link (which is good for the next 13 days, according to the page).

https://www.theatlantic.com/technol...opy-link&utm_medium=social&utm_campaign=share

Scroll to the bottom of the article. Note that if you have short stories or other works in anthologies, those are generally listed under the author/editor of the anthology, not under the contained works. Also, be patient after you put your name (or whatever name) into the search box and hit 'Submit'. The site seems to be rather heavily used.

Editor’s note: This searchable database is part of The Atlantic’s series on Books3. You can read about the origins of the database here, and an analysis of what’s in it here.

This summer, I acquired a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. I wrote in The Atlantic about how the data set, known as “Books3,” was based on a collection of pirated ebooks, most of them published in the past 20 years. Since then, I’ve done a deep analysis of what’s actually in the data set, which is now at the center of several lawsuits brought against Meta by writers such as Sarah Silverman, Michael Chabon, and Paul Tremblay, who claim that its use in training generative AI amounts to copyright infringement.

The gift link comes from a Twitter (two fingers up to calling it X) post by Nova Ren Suma.
 
If your ISBN/name pops up, you're included in the class action. Let your lawyers know.

Tremblay v OpenAI
 