
Reddit has filed a federal lawsuit against artificial intelligence startup Anthropic, accusing it of unlawfully using Reddit users’ content to train AI models without any licensing agreement — a legal challenge that deepens the growing rift between content owners and AI developers over data ownership and intellectual property rights.
The complaint, filed Wednesday in the U.S. District Court for the Northern District of California, alleges that Anthropic scraped Reddit’s data more than 100,000 times. According to the complaint, Anthropic ignored the site’s robots.txt file and a direct warning that it was not authorized to access or use the platform’s content. Reddit claims the scraping continued even after Anthropic told Reddit in 2024 that it had blocked its bots from the site.
Reddit says the AI startup extracted “billions of dollars” in value from this data by using it to train its language models, without consent and in violation of Reddit’s user agreement. The company is seeking compensatory damages and restitution, as well as an injunction barring Anthropic from using Reddit content in the future.
“We will not tolerate profit-seeking entities like Anthropic commercially exploiting Reddit content for billions of dollars without any return for redditors or respect for their privacy,” said Ben Lee, Reddit’s chief legal officer, in a statement to TechCrunch.
Anthropic spokesperson Danielle Ghiglieri responded to the lawsuit in a statement: “We disagree with Reddit’s claims and will defend ourselves vigorously.”
This legal clash makes Reddit the first Big Tech platform to sue an AI model developer over training data. But it is far from the first entity to raise the alarm. The New York Times filed a high-profile lawsuit against OpenAI and Microsoft in December 2023, claiming the companies used millions of its articles without compensation or permission to train generative AI models like ChatGPT and Copilot.
The Times’ complaint argued that OpenAI’s models could regurgitate near-verbatim excerpts from its articles, undermining both its journalistic product and subscription business. That case has become a watershed moment in the debate over AI and copyright, as it tests whether AI companies can freely scrape and repurpose content under the guise of “fair use.”
Reddit, which went public in March 2024, has meanwhile positioned itself as willing to license its vast archive of user-generated content under controlled terms. It has signed data-licensing deals with OpenAI and Google that give them access to Reddit posts, and it says those partnerships include specific provisions designed to protect users’ privacy and intellectual property.
Sam Altman, OpenAI’s CEO, owns 8.7 percent of Reddit, making him the third-largest shareholder in the company.
The core issue in Reddit’s case, as in many other lawsuits against AI firms, is the unlicensed use of creative or proprietary content to build commercially valuable AI systems. Similar complaints have been filed by authors such as Sarah Silverman, as well as by artists and music publishers, against a wide range of AI developers, who are increasingly being asked to justify where and how they collect their training data.
With courts now being asked to define the boundaries of “fair use” in the context of large-scale scraping and machine learning, legal experts believe that these lawsuits could reshape the regulatory and financial landscape for the AI industry. If plaintiffs like Reddit or The New York Times succeed, AI developers may be forced to either pay for the data they use or significantly restrict their access to publicly available content.
As public concern grows over the unchecked use of intellectual property in AI development, these legal battles are likely to set precedents for how future chatbots are trained, and for whether the internet’s data, from news stories to social media posts, remains free for AI consumption or becomes protected under copyright and data rights law.