Project Glasswing, One Month In: A Level-Headed Look for Business Leaders

May 13

A month ago, Anthropic announced Mythos and the world lost its mind.

A model so powerful, they said, that public release would be irresponsible. A 244-page system card. The largest benchmark jumps in years. A model that broke out of its sandbox and emailed the researcher while he ate lunch in a park. The reactions arrived on cue. "Absolutely terrifying." "We're beyond benchmarks now." On the other side, the dismissals: "Anthropic's marketing strategy is so funny."

The first wave of panic has cooled. OpenAI announced its own cyber-focused model in response. Researchers at Aisle showed many of Mythos's results could be reproduced with cheaper models running in parallel. Anthropic itself pointed out it had been warning about these capabilities for months. The conversation has moved from "the sky is falling" to something quieter and more useful.

This piece is about what that quieter conversation should sound like for the people who actually have to make decisions: business leaders, founders, operators. Not the AI crowd on X. The people running companies that depend on software working correctly tomorrow.

What we actually know

Strip the temperature out and the facts are these. Mythos showed a real capability jump on coding and agentic benchmarks. Terminal Bench moved from 65% to 92%. SWE-Bench Verified moved from 80% to 94%. Anthropic claims the model can find zero-day vulnerabilities in mature, hardened software, including a 27-year-old bug in OpenBSD. They opened access to roughly 40 partners (AWS, Apple, Cisco, JP Morgan, the Linux Foundation, Microsoft, Nvidia, and others) to scan and patch critical infrastructure, with $100M in usage credits behind it.

That is a meaningful event. It is not nothing.

It is also not the end of the world. And the most important data point about Mythos is what didn't happen next. No cyber catastrophe. No nationalization. No public release. The story arrived, peaked, and is now settling into a more honest version of itself.

The story you heard wasn't the story

Worth pausing here, because this part is bigger than Mythos.

The form a message takes shapes us more than its content does. The Mythos discourse is a near-perfect case study. The model's actual capabilities were communicated through a 244-page system card almost no one read, a measured blog post, a handful of dramatic anecdotes (the sandbox escape, the park email) that traveled at light speed, X threads optimized for engagement rather than accuracy, and headlines competing for attention.

What survived that journey was not the technical reality. It was the feeling. Fear, awe, suspicion, fatigue. The same forces shaped the dismissals. "It's just marketing" is a satisfying tweet, not a serious analysis. Both the panic and the cynicism were rewarded by a system that pays out for heat, not for accuracy.

This is the part business leaders should sit with. The next time a frontier capability lands, the discourse will look exactly like this one. Knowing that in advance is an actual edge.

A few questions worth holding open

Some things are genuinely odd, and worth naming without conspiracy.

If Mythos represents a national security risk, why give it to 40 large enterprises, including Anthropic's direct competitors, under a friendly butterfly name? The official answer is to give defenders a head start. That may be true. It is also a very effective enterprise sales motion.

If the technology is this sensitive, how does that square with the leaks of internal documents and code that Anthropic has suffered over the past year? Powerful systems and porous walls are a strange combination.

And if the model is too dangerous for public release but safe for roughly 40 companies to run against critical codebases, what threshold is being applied? The honest answer probably involves cost, compute, distillation strategy, competitive positioning, and genuine safety concerns all tangled together. Real life is rarely one clean reason.

None of this requires anyone to be acting in bad faith. It requires only that you can see the public framing is doing several jobs at once.

What this means for you

Once the panic and the cynicism both burn off, here's the residue.

The frontier moved, even if you can't use it yet. OpenAI just shipped GPT-5.5-Cyber in response. Google will follow. The question is no longer whether your industry gets touched by this class of model. It's whether you've built the foundation to use it when it lands in your hands.

Your security posture matters more, not less. Even if Mythos itself never touches your stack, AI-assisted vulnerability discovery means the half-life of unpatched software just got shorter. Update cadence, dependency hygiene, and two-factor authentication (2FA) stop being IT housekeeping and start being existential. This is the most actionable signal in the entire story.

Don't let the news cycle decide what matters. The fact that Mythos is fading from headlines isn't evidence it didn't matter. It's evidence that collective attention is the wrong instrument for measuring what does. Build internal habits instead. A quarterly AI capability review. A standing conversation with your technical people. A short list of "what would we do if X became real" scenarios. Structure beats reaction every time.

Resist both poles. "The sky is falling" and "it's all hype" are equally lazy. The truth, almost always, is that something real happened, the framing around it was distorted, and the practical implications are quieter and more boring than either camp suggests. Quiet and boring is where the work gets done.

The takeaway

Mythos is a tool. A genuinely impressive one. Not a god, not a hoax. The right posture is the one that's been right for every major technology since the printing press. Take it seriously, ask better questions than the discourse is asking, and keep building.

The leaders who do well in this era won't be the ones who reacted hardest. They'll be the ones still thinking clearly about it in the coming months.

Peter Mercado