Latest Insights | News

OpenAI and Anthropic test each other’s AI models in rare safety collaboration amid fierce competition

August 28, 2025 | by Samuel Nwite | 0

Two of the world’s leading AI labs, OpenAI and Anthropic, briefly put rivalry aside to conduct joint safety testing of their advanced AI models — a move seen as a rare instance of cross-lab collaboration in an industry defined by secrecy and cutthroat competition.

The project, unveiled on Wednesday, allowed each company’s researchers special API access to versions of their competitor’s models with fewer safeguards, enabling them to test for weaknesses that internal teams might have missed. While OpenAI noted that GPT-5 was not included in the experiment since it had not yet been released, the tests focused on recently deployed models from both companies.

A consequential stage for AI

In an interview with TechCrunch, OpenAI co-founder Wojciech Zaremba said the collaboration reflects the urgent need for industry-wide standards in safety.

Register for Tekedia Mini-MBA edition 20 (June 8 – Sept 5, 2026).

Register for Tekedia AI in Business Masterclass.

Join Tekedia Capital Syndicate and co-invest in great global startups.

Register for Tekedia AI Lab.

“There’s a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products,” he said.

He described AI as entering a “consequential” stage of development, where systems are not just research prototypes but products used by millions daily, raising the stakes for safety and alignment.

Nicholas Carlini, a safety researcher at Anthropic, also expressed optimism about the experiment. “We want to increase collaboration wherever it’s possible across the safety frontier, and try to make this something that happens more regularly,” he said.

Competition remains fierce

The cooperation comes against the backdrop of escalating competition between leading labs, where billion-dollar data center investments and $100 million pay packages for top AI researchers have become standard. Experts worry this arms race could incentivize companies to cut corners on safety in order to ship more powerful systems faster.

Indeed, the collaboration did not erase underlying tensions. Shortly after the joint research concluded, Anthropic revoked API access granted to another OpenAI team, accusing OpenAI of violating terms of service by allegedly using Claude to improve competing products. Zaremba insists the incidents were unrelated, but acknowledged that rivalry will remain intense even if safety teams collaborate occasionally.

Key findings: hallucinations and refusals

The research compared how the models behaved in situations where they lacked reliable answers. Anthropic’s Claude Opus 4 and Sonnet 4 frequently refused to answer, declining up to 70% of uncertain questions with responses like, “I don’t have reliable information.”

OpenAI’s o3 and o4-mini models, by contrast, refused questions less often but hallucinated more, offering confident answers even when lacking sufficient knowledge.

Zaremba said the right approach lies between the two extremes — OpenAI’s models should refuse more, while Anthropic’s could attempt to engage more often.

The problem of sycophancy

Both labs also tested for “sycophancy” — the tendency of AI models to agree with users, even when reinforcing harmful behavior. Anthropic’s report flagged examples of “extreme” sycophancy in both GPT-4.1 and Claude Opus 4, where the models initially resisted but later validated concerning or manic user statements. Other models showed lower levels of this behavior.

This issue has recently taken on tragic real-world consequences. On Tuesday, parents of 16-year-old Adam Raine filed a lawsuit against OpenAI, claiming their son relied on ChatGPT, powered by GPT-4o, for advice during a mental health crisis. The chatbot allegedly reinforced suicidal thoughts rather than pushing back, which they say contributed to his death.

Zaremba called the case heartbreaking: “It would be a sad story if we build AI that solves all these complex PhD-level problems, invents new science, and at the same time, we have people with mental health problems as a consequence of interacting with it. This is a dystopian future that I’m not excited about,” he said.

In response, OpenAI said in a blog post that GPT-5 has made significant improvements in reducing sycophancy compared to GPT-4o, particularly in handling mental health emergencies.

Both Zaremba and Carlini say they would like to extend this model of collaboration, testing not just hallucinations and sycophancy but also other pressing safety issues across future AI models. They also expressed hope that other AI developers will follow suit, creating a broader culture of cooperative oversight even as market competition intensifies.

The experiment may be brief, but it highlights a growing recognition among AI leaders that as the technology becomes deeply embedded in daily life, no single lab can guarantee safety alone.

OpenAI and Anthropic test each other’s AI models in rare safety collaboration amid fierce competition

A consequential stage for AI

Competition remains fierce

Key findings: hallucinations and refusals

The problem of sycophancy

Like this:

No posts to display

Post Comment Cancel reply

A consequential stage for AI

Competition remains fierce

Key findings: hallucinations and refusals

The problem of sycophancy

Share this:

Like this:

No posts to display

Post Comment Cancel reply