I’m not pirating. I’m building my model.
To anyone reading this comment without reading through the article: this ruling doesn’t mean that it’s okay to pirate for building a model. Anthropic will still need to go to trial for that:
But he rejected Anthropic’s request to dismiss the case, ruling the firm would have to stand trial over its use of pirated copies to build its library of material.
I also read through the judgement, and I think it’s better for Anthropic than you describe. The judge distinguishes three issues:
A) Use any written material they get their hands on to train the model (and the resulting model doesn’t just reproduce the works).
B) Buy a single copy of a print book, scan it, and retain the digital copy for a company library (for all sorts of future purposes).
C) Pirate a book and retain that copy for a company library (for all sorts of future purposes).
A and B were fair use by summary judgement, meaning this judge thinks it’s clear-cut in Anthropic’s favor. C will go to trial.
Pirate everything!
Anakin: “Judge backs AI firm over use of copyrighted books”
Padme: “But they’ll be held accountable when they reproduce parts of those works or compete with the work they were trained on, right?”
Anakin: “…”
Padme: “Right?”
IMO the focus should have always been on the potential for AI to produce copyright-violating output, not on the method of training.
If you try to sell “the new adventures of Doctor Strange, Jonathan Strange and Magic Man,” existing copyright laws are sufficient and will stop it. Really, training should be regulated by the same laws as reading. If they can get the material through legitimate means it should be fine, but pulling data that is not freely accessible should be theft, as it is already.
That “freely” there really does a lot of hard work.
It means what it means; “freely” pulls its own weight. I didn’t say “readily” accessible. Torrents could be viewed as “readily” accessible, but they couldn’t be viewed as “freely” accessible because at the very least you bear the guilt of theft. Library books are “freely” accessible, and if somehow the training involved checking out books and returning them digitally, it should be fine. If it is free to read into neurons, it is free to read into neural systems. If payment for reading is expected, then it isn’t free.
Civil cases of copyright infringement are not theft, no matter what the MPAA has trained you to believe.
But they are copyright infringement, which costs more than theft.
Plaintiffs made that argument and the judge shot it down pretty hard: that kind of competition isn’t what copyright protects against. He makes an analogy with teachers teaching children to write fiction: they are using existing fantasy to create MANY more competitors on the fiction market. Could an author use copyright to challenge that use?
Would love to hear your thoughts on the ruling itself (it’s linked by Reuters).
Orcs and dwarves (with a v) are creations of Tolkien. If the fantasy stories include them, it’s a violation of copyright, the same as including Mickey Mouse.
My argument would have been to ask the AI for the bass line to Queen & David Bowie’s Under Pressure, then refer to that as a reproduction of copyrighted material. But then again, AI companies probably have better lawyers than Vanilla Ice.
80% of the book market is owned by 5 publishing houses.
They want to create a monopoly around AI and kill open source. The copyright industry is not our friend. This is a win, not a loss.
Cool then, try to do some torrenting out there and don’t hide it. Tell us how it goes.
The rules don’t change. This just means AI overlords can do it, not that you can do it too.