Bramblethorn
Sleep-deprived
- Joined
- Feb 16, 2012
- Posts
- 18,154
I apologise for spawning yet another AI thread, but this one seems significant. The US Copyright Office is in the process of publishing a report on copyright and AI tech. A draft version of the final part has just been released:
https://www.copyright.gov/ai/
https://www.copyright.gov/ai/Copyri...I-Training-Report-Pre-Publication-Version.pdf
It includes an analysis of the "fair use" status of generative AI, examining each of the four pillars of fair use, e.g.:
...the Office rejects two common arguments about the
transformative nature of AI training. As noted above, some argue that the use of copyrighted
works to train AI models is inherently transformative because it is not for expressive
purposes. We view this argument as mistaken. Language models are trained on examples
that are hundreds of thousands of tokens in length, absorbing not just the meaning and parts of
speech of words, but how they are selected and arranged at the sentence, paragraph, and
document level—the essence of linguistic expression.
The conclusion (excerpted):
The Office expects that some uses of copyrighted works for generative AI
training will qualify as fair use, and some will not. On one end of the spectrum, uses for
purposes of noncommercial research or analysis that do not enable portions of the works to be
reproduced in the outputs are likely to be fair. On the other end, the copying of expressive
works from pirate sources in order to generate unrestricted content that competes in the
marketplace, when licensing is reasonably available, is unlikely to qualify as fair use. Many
uses, however, will fall somewhere in between.
This is an opinion, not a legal precedent, but until courts set those precedents it's likely the best available indicator of how they may lean (and it may well influence them).
It doesn't really give a clear-cut answer to the question of whether using AI to generate stories here would be a copyright violation. But given that many LLMs depend on broad sources like Common Crawl, which inevitably include a large amount of pirated material, and given that AI-generated fiction obviously does compete with human-written material, I think this suggests that authors shouldn't assume that AI-generated stories will end up in the "fair use" category.