Microsoft is facing a legal challenge over its artificial intelligence training data. A group of high-profile authors has filed a lawsuit alleging that the company used nearly 200,000 pirated books to train its Megatron AI model. The suit marks a significant escalation in the ongoing legal battles over intellectual property and the development of AI technologies. The authors allege that their copyrighted works were used without permission to build an AI that generates responses imitating their original writing.
The plaintiffs, including the renowned writers Kai Bird and Jia Tolentino, are seeking a court order to stop Microsoft’s alleged copyright infringement, as well as statutory damages of up to $150,000 for each work allegedly misused. They argue that generative AI systems, which produce text, images, and other media, depend on large training datasets to learn and reproduce human creative expression. The complaint describes the pirated dataset as crucial to this mimicry.
So far, Microsoft spokespeople have not issued a statement on the lawsuit, and the authors’ attorney has declined to comment. The suit follows recent significant rulings in California involving two other AI companies, Anthropic and Meta, which underscore how new and unsettled the legal framework around AI and copyright remains.
The broader wave of AI copyright lawsuits is growing rapidly and now spans many forms of media. The New York Times and Dow Jones have sued AI firms over the use of their archived content, and major record labels and photography companies have also taken legal action. Tech companies’ central defense is typically “fair use”: they argue that their AI systems create transformative content and that strict copyright enforcement could stifle AI innovation.