Microsoft is facing a new lawsuit from a group of prominent authors who accuse the company of using pirated copies of their books to train its Megatron artificial intelligence model.
Filed Tuesday in a federal court in New York, the suit adds to a growing wave of legal challenges from writers, publishers, and news organizations accusing major AI developers of exploiting copyrighted material without consent.
The plaintiffs—who include American Prometheus author Kai Bird, essayist Jia Tolentino, and historian Daniel Okrent—allege that Microsoft trained Megatron on a dataset containing nearly 200,000 pirated books, without notifying or compensating the authors. According to the complaint, the AI model was engineered not simply to learn language but to mimic “the syntax, voice, and themes” of the authors’ works, enabling it to generate text that closely resembles their original expression.
“The Megatron model is not merely informed by our work,” the complaint says, “it is fundamentally built on it—without authorization and without compensation.”
The authors are seeking a court order blocking Microsoft’s alleged infringement, along with statutory damages of up to $150,000 per infringed work, a figure that could add up to millions of dollars in liability if the court rules in their favor. Microsoft did not immediately respond to requests for comment, and the authors’ lawyer declined to comment on the case.
But while the lawsuit appears to be part of a broader campaign by copyright holders to rein in unchecked AI training, recent court rulings suggest Microsoft might actually have the upper hand.
In just the past week, two key decisions have tilted in favor of AI companies. In one, U.S. District Judge William Alsup ruled that Anthropic’s use of copyrighted books to train its Claude AI system fell under “fair use,” calling the process “exceedingly transformative.” Alsup held that training a model to understand and generate language based on large swaths of text served a different purpose from the original works, satisfying the first of the four factors courts weigh in a fair-use analysis under U.S. copyright law.
Days later, U.S. District Judge Vince Chhabria ruled in Meta’s favor in a similar case over its Llama model, stating flatly that “there’s no disputing” the transformative nature of what Meta had built. Chhabria also noted that the authors failed to show the AI system had caused meaningful harm to the market for their books, another key factor in determining fair use.
While neither ruling directly applies to the Microsoft case, legal analysts say the decisions could shape how judges evaluate similar claims moving forward. In their view, both judges signaled that courts are leaning toward treating AI training as a transformative, fair use of copyrighted material, especially when the output doesn’t directly compete with or replace the original work.
The back-to-back rulings are being hailed as tentative but important victories for companies like Anthropic, Meta, OpenAI, and now, potentially, Microsoft. The authors’ lawsuit, however, takes a different tack: it emphasizes that the company allegedly used pirated copies of their books, a detail that could complicate Microsoft’s defense even if the training itself is deemed lawful.
The court may need to examine not just the legality of the AI’s purpose but also the nature of the data it was fed.
The situation underscores the complexity of copyright law in the AI age. Rather than offering clarity, the recent rulings have muddied the picture: they suggest that while training on copyrighted material may, in many instances, qualify as fair use, how that material was acquired can still carry legal consequences. Alsup’s own ruling drew that distinction, finding the training itself transformative while ordering a separate trial over Anthropic’s storage of pirated copies.
Now, Microsoft’s case has become another test of where the boundaries of fair use end and where infringement begins. Some analysts believe the question will ultimately require intervention from Congress or the Supreme Court; until then, this evolving legal gray area is likely to remain unsettled.