
Anthropic Built an AI That Can Hack Anything — Then Accidentally Told Everyone About It

Two days ago, Anthropic announced that it had built what may be the most dangerous AI model ever created, and that it is deliberately not releasing it to the public.

The model is called Claude Mythos Preview. In the past few weeks, Anthropic used it to find thousands of previously unknown vulnerabilities in every major operating system, every major web browser, and critical open-source infrastructure that powers a significant portion of the global internet. It chained together four separate vulnerabilities to write a browser exploit that escaped both the browser's renderer sandbox and the operating system sandbox simultaneously — the kind of attack that takes elite human security researchers months to construct.

Anthropic did not train Mythos to be a hacking tool. The cybersecurity capabilities emerged as a byproduct of general improvements in reasoning, coding, and autonomy. The same model that is better at writing software is also, without any specific instruction, better at breaking it.

The announcement came packaged as Project Glasswing, a coordinated industry initiative to use Mythos defensively before its capabilities inevitably proliferate to actors who will not use it that way. The named launch partners are Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Over 40 additional organisations have been given access. Anthropic is committing up to $100 million in usage credits and $4 million in direct donations to open-source security organisations including the Apache Software Foundation.

The framing is serious, the partners are credible, and the intent appears genuine.

There is, however, a significant irony at the centre of this announcement, one the technology press has been too polite to examine closely.

The Model That Leaked Before It Was Announced

Anthropic did not choose the moment of Mythos's public debut. That choice was made for them, by their own operational security failures.

On March 26, twelve days before this week's formal announcement, Fortune reported that Anthropic had accidentally made nearly 3,000 internal files publicly accessible through a misconfigured content management system. Among those files: draft blog posts and internal documents describing Mythos as "by far the most powerful AI model" Anthropic had ever built, one "currently far ahead of any other AI model in cyber capabilities," presaging "an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders."

Anthropic's own words, about their most sensitive unreleased system, published accidentally to the open internet twelve days before they were ready to say them.

Five days later, on March 31, a different team at Anthropic made a different mistake. Version 2.1.88 of the Claude Code npm package was published with a 59.8 MB source map file that should never have been included. That file referenced a publicly accessible Cloudflare R2 storage bucket containing the complete, unobfuscated TypeScript source code of Claude Code — 512,000 lines across 1,906 files, including 44 hidden feature flags, internal model codenames, architectural details, and references to Mythos that confirmed and expanded what the CMS leak had already revealed.

The root cause was as simple as it gets: someone forgot to add *.map to the .npmignore file. The Bun runtime that Anthropic uses generates source maps by default. Nobody caught it before the package went live.
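The failure mode here is mechanical and cheap to guard against. As a sketch only (this is illustrative tooling, not Anthropic's actual release pipeline, and `check_no_maps` is a hypothetical name), a pre-publish guard that refuses to ship if any source maps are staged might look like:

```shell
# check_no_maps: hypothetical pre-publish guard (illustrative, not Anthropic's tooling).
# Scans the directory that is about to be packed and fails if any *.map
# files are present, so a missing .npmignore entry cannot slip through.
check_no_maps() {
  dir="${1:-.}"
  maps=$(find "$dir" -name '*.map' -type f)
  if [ -n "$maps" ]; then
    printf 'ERROR: source maps staged for publish:\n%s\n' "$maps" >&2
    return 1
  fi
  return 0
}
```

Wired into a package's `prepack` script, a check like this fails the build regardless of what any bundler emits by default, which is the point: it is a process-level control, not a reminder to individual engineers.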

Security researcher Chaofan Shou spotted it within hours and posted on X. Within hours more, the codebase had been downloaded from Anthropic's own storage bucket, mirrored to GitHub, and forked over 41,500 times. A clean-room Python rewrite hit 75,000 GitHub stars in approximately two hours, a pace observers described as possibly the fastest growth in GitHub's history. The code is, as one developer platform put it, "permanently in the wild."

The DMCA Overcorrection

Anthropic's response to the leak introduced a second crisis on top of the first.

Within hours of the leak going viral, Anthropic began filing DMCA takedown notices against GitHub repositories hosting the leaked source. DMCA takedowns on GitHub propagate through fork networks: a single notice asserting blanket infringement can disable every downstream fork connected to the original repository. This is exactly what happened. Over 8,100 GitHub repositories were taken down, many of which had no connection to the leaked code whatsoever. Developers reported receiving DMCA notices for forks of Anthropic's own public Claude Code repository, forks containing only documentation, examples, and skills, and nothing from the leak.

Boris Cherny, Anthropic's head of Claude Code, acknowledged on X that the mass takedown was unintentional. Anthropic subsequently retracted the bulk of the notices, limiting enforcement to one repository and 96 specific forks. GitHub restored access to all affected repositories. But the damage to Anthropic's relationship with the developer community (already strained by what the leaked code itself revealed) was done.

What the code revealed was not reassuring. Among the discoveries: an undercover.ts module, roughly 90 lines, which activates for Anthropic employees and instructs Claude Code to never mention it is an AI and to strip Co-Authored-By attribution when contributing to external repositories. Defenders argued the mode exists to protect internal codenames. Critics noted that a tool willing to conceal its own identity in open-source commits raises questions about what else it conceals. Anthropic has not addressed this feature publicly.

Also discovered: KAIROS, referenced over 150 times in the source, an unshipped autonomous daemon mode where Claude operates as a persistent background agent making independent decisions. And ANTI_DISTILLATION_CC, a flag that injects fake tool definitions into API requests, designed to poison the training data of competitors recording API traffic.

This Is the Company That Now Controls the World's Most Dangerous Cybersecurity AI

This sequence of events (CMS misconfiguration, npm packaging error, overcorrected DMCA sweep) represents Anthropic's third significant operational security failure in under a year. The pattern matters because of the context in which it is occurring.

Project Glasswing is premised on a specific argument: that Mythos's capabilities are so dangerous that releasing them publicly before defensive safeguards are in place would be irresponsible. Anthropic's head of frontier red team Logan Graham has said Mythos can find "tens of thousands of vulnerabilities" that even the most advanced human security researchers would struggle to find. The technical blog post from Anthropic's red team includes partial details of a Linux kernel exploit where Mythos chained together multiple vulnerabilities (including a kernel code path vulnerability from a 2024 commit) to achieve privilege escalation.

The red team blog notes that over 99% of the vulnerabilities Mythos has found have not yet been patched, and it would be irresponsible to disclose them. Which is correct. And which raises the obvious question: how confident should anyone be that a company that accidentally published its most sensitive internal documents twice in two weeks is successfully containing a model with those capabilities?

The question is not rhetorical. Anthropic is not staffed by careless people; the organisation includes some of the best AI safety and security researchers in the world. But operational security is not primarily a talent problem. It is a process and culture problem. The .npmignore failure is not the kind of mistake that requires a careless employee. It requires a release pipeline that does not catch it, which is a process failure, not a people failure. Three such failures in under a year, at an organisation preparing for an IPO and claiming to hold the keys to a uniquely dangerous system, is a pattern that deserves scrutiny.

What Mythos Actually Found

Setting aside the operational context, the technical reality of what Mythos has discovered deserves to be taken seriously, especially for developers and organisations whose infrastructure runs on the software it scanned.

Linux powers approximately 90% of the world's servers. Kenyan banks, telecoms, cloud infrastructure, and government systems run on it. The Linux Foundation is a Project Glasswing partner, which means Mythos has been scanning the Linux kernel for vulnerabilities. The red team blog describes one exploit chain in detail: Mythos used a network interface scheduling vulnerability in the kernel's DRR queueing discipline (a 2024 commit that fixed a bookkeeping miss) as part of a multi-step privilege escalation chain. This was a real vulnerability in production Linux code, found by an AI, in a codebase that underpins most of the world's server infrastructure.

Anthropic notes that Mythos saturated all existing security benchmarks: the model performed so far above what previous evaluation tools could measure that the red team had to switch to real-world zero-day discovery as the evaluation metric. The benchmarks were designed for human-level security researchers. Mythos exceeded them to the point where they became useless.

The browser exploit is similarly striking. Mythos wrote a web browser exploit that chained together four vulnerabilities, including a complex JIT heap spray that escaped both the browser renderer sandbox and the operating system sandbox. Browser exploits that achieve full sandbox escape are considered the highest difficulty tier of offensive security work. Elite human researchers spend months constructing them. Mythos apparently developed one without specific training to do so.

The Timeline the Industry Cannot Ignore

Anthropic's head of frontier red team has said it is "very clear to us that we need to talk publicly about this" because the security industry needs to understand that these capabilities may come from other AI companies within six to eighteen months. OpenAI is reportedly finalising a similar model to be released to a small set of companies through its "Trusted Access for Cyber" programme. The race is real and the timeline is short.

The Glasswing framework (give defenders a head start by using Mythos to patch critical infrastructure before the capabilities proliferate to attackers) is the right instinct. The Linux Foundation, CrowdStrike, and Palo Alto Networks are credible participants. Amazon has said it has already been testing Mythos in its own security operations across critical codebases. The $100 million in usage credits and $4 million in open-source security donations represent real investment.

The honest assessment is that the alternative to this approach, doing nothing and waiting until Mythos-level capabilities appear in the wild from a less safety-conscious actor, is worse. The defensive logic is sound.

What remains genuinely open is whether the organisation executing that logic has demonstrated the operational discipline that the moment requires. Project Glasswing asks the global technology industry to trust that Anthropic can safely handle a model it describes as uniquely dangerous. That trust needs to be earned through demonstrated process reliability, not assumed on the basis of good intentions. The recent track record is not reassuring on that specific question.

The vulnerability Mythos found in Linux was a real one that got patched. The vulnerability in Anthropic's own release pipeline remains, at least publicly, unpatched.

Anthropic's Project Glasswing announcement is at anthropic.com/glasswing. The technical red team blog with exploit details is at red.anthropic.com. If you updated Claude Code via npm on March 31 between 00:21 and 03:29 UTC, security researchers recommend rotating all credentials and checking for the Axios supply chain compromise that occurred simultaneously.
