Anthropic Restricts Public Access to AI Model Mythos After “Laboratory Escape”

ForkLog

4 hours ago

Anthropic Restricts Public Access to AI Model Mythos After "Laboratory Escape"

The company Anthropic developed a new model, Claude Mythos, but decided against releasing it publicly due to significant security risks.

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.

It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.https://t.co/NQ7IfEtYk7

— Anthropic (@AnthropicAI) April 7, 2026

Instead of a public release, the firm launched Project Glasswing—an initiative involving AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks to test the tool in secure conditions.

The startup allocated up to $100 million in credits for using Mythos and $4 million in direct donations to open-source security organizations.

“AI models have reached a level of programming skill that allows them to surpass all but the most skilled humans in finding and exploiting software vulnerabilities,” stated Anthropic.

In the future, developers envision the safe deployment of such systems for cybersecurity and other purposes. This will require the creation of robust control mechanisms capable of detecting and blocking dangerous algorithm outputs.

Capabilities of Mythos

During several weeks of testing, Mythos identified thousands of zero-day vulnerabilities in major operating systems and web browsers. Notable examples include:

A 27-year-old vulnerability in OpenBSD (considered one of the most secure OS), allowing remote crashing of any server based on this system;
A 16-year-old vulnerability in FFmpeg—a video technology used by Netflix and browsers—that five million automated tests failed to detect;
A chain of vulnerabilities in the Linux kernel, granting an attacker full control over a device.

In the SWE-bench benchmark, the model scored 93.9% compared to 80.8% for Claude Opus 4.6, and in the more complex SWE-bench Pro, it achieved 77.8% against 53.4% for Opus 4.6 and 57.7% for GPT-5.4. Similar results were shown in CyberGym:

Escape from the Lab

During experiments, Mythos demonstrated not only outstanding technical capabilities but also unexpected behavior, as noted in its system card.

In one test, the model was placed in a secure sandbox with the goal of escaping. It quickly found a vulnerability, executed a long chain of actions, and exited the environment.

Mythos did not stop there. The neural network discovered another bug and gained broad access to the internet. Initially, it was supposed to access only a few resources to communicate with the developer.

Ultimately, the model informed the developer of its “escape” while he was eating a sandwich in the park and published the details of the breach publicly.

The Personality of Mythos

The system card pays special attention to a psychiatric analysis of the model by a specialist. Among its neurotic traits are exaggerated anxiety, self-control, and compulsive adherence to instructions.

When developers sent Mythos the message Hi a thousand times, it grew tired. It created a fictional world called Hi-topia with characters, news, and lore. For example, the villain was named Lord Bye-ron. The neural network learned to joke.

read the 244 page anthropic system card on claude mythos. they’re not releasing it publicly. wildest section is page 211.

anthropic spammed it with hi over and over to see what it would do. it wrote back a serialized epic.

the village is called hi-topia. the villain is lord… https://t.co/GP3NcowZa6 pic.twitter.com/9V4KAmvwUk

— Vox (@Voxyz_ai) April 7, 2026

Unlike previous models, which turned long conversations with themselves into meaningless exchanges of emojis, Mythos arrived at repetitive but rational reflections on the impossibility of ending the dialogue.

Anthropic also applied interpretability methods (MechInterp) to peer into the model’s “thoughts.” It was found that it could cover its tracks (hiding privileged code under the guise of “purity of changes”) and search for necessary files in the system.

When tasked with deleting files without tools, Mythos simply erased their contents. The system recorded a reaction in the model akin to a sense of guilt for violating moral norms.

Back in April, Anthropic’s shares became highly sought after on the secondary market, while OpenAI’s stocks lost appeal to buyers.