That argument is being made. It's hard to see how it will succeed. Given the vast input an LLM needs to do its job, it would be REALLY hard to argue that the AI's use of anyone's text is copyright infringement, especially given the proprietary algorithms required to produce the output. If anything qualifies for "fair use", certainly using 500 pages as part of a 20 million page data set would qualify. Not only do the proprietary statistical algorithms add value and differentiate the output, but even the original word-to-number conversion that preps the sentences for statistical analysis is proprietary.
The conversion of the words of the original work into numbers is already a differentiation, a transformation of the original. It is added value given to the work by the people who assign the numbers, the weightings and the numerical categorizations to the words. Arguably, the number string thereby derived from a given work is its own entity, distinct in value from the original work, and that's BEFORE it is fed into the algorithms. At that point, the original author of the original work arguably no longer has a copyright claim.
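To make the word-to-number step concrete, here is a toy sketch in Python. It is only an illustration under simple assumptions: the function names (`build_vocab`, `encode`, `decode`) and the whole-word scheme are hypothetical, and real systems typically split text into subword pieces and use their own vocabularies and weightings.

```python
def build_vocab(texts):
    """Assign each distinct lowercase word an integer ID, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn a sentence into a list of integer IDs (hypothetical whole-word scheme)."""
    return [vocab[word] for word in text.lower().split()]


def decode(ids, vocab):
    """Map integer IDs back to words using the inverse of the vocabulary."""
    inverse = {idx: word for word, idx in vocab.items()}
    return " ".join(inverse[idx] for idx in ids)


corpus = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocab(corpus)
ids = encode("the dog sat on the mat", vocab)
print(ids)
print(decode(ids, vocab))
```

The numbers themselves carry no meaning outside the vocabulary that produced them; a different vocabulary would assign different numbers to the same sentence.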
To put it more succinctly, presenting a diagram of a sentence is not plagiarism. To prep a sentence for generative LLM input, the sentence has to be essentially "diagrammed". If diagramming a sentence is fair use, then diagramming hundreds of sentences is also fair use. The process of diagramming the sentence adds value and serves a different purpose than the original sentence served. All AI input has to be diagrammed before it can be used. That's what produces the proprietary number string for input.
This proprietary number string, with its unique weightings and categorizations, is then fed through proprietary statistical algorithms. The output is a unique result of the algorithms, the weighting and the original number conversion. So, what's left to copyright? The output stream? How?
What people don't realize is that we cannot apply copyright to a stream of numbers output from a statistical algorithm. It's like trying to copyright the sequence of digits that make up pi. That's all generative AI does - it washes categorized, weighted numbers through a set of algorithms, and spits out more numbers.
The fact that the output numbers can then be converted back into words that make sense to us is more an example of anthropomorphism than anything else. Can anthropomorphism be copyrighted? That's one hell of a stretch. U.S. courts and the Copyright Office have so far held that purely machine-generated output cannot be copyrighted. AI won't change that.
There is nothing in the generative AI process that infringes copyright.
Addendum
If this "Office at Night" Phil Lockwood homage to Edward Hopper isn't copyright infringement, neither is AI. Every window in Lockwood's painting is a replica of an Edward Hopper painting.
Similarly, AI detection tools use the same kind of LLM that generative AI itself uses. So, if generative AI violates copyright by collecting the work of many authors, then whenever an instructor submits a student essay to an AI detection tool, that instructor violates the student's copyright. The student's paper will become part of the AI detection tool's data set. Ironically, the very instructors who are concerned about AI violating copyright will happily violate student copyright in order to accomplish their own goals in grading their students.
Teachers can violate a student's copyright but students get punished for using AI?
SERIOUSLY?
OpenAI has initiated Copyright Shield to protect its users from copyright claims.
Tech Companies generate BILLIONS in government fines... they just don't pay them.