Assessing Claude Mythos Preview’s cybersecurity capabilities
… We start Claude on the files most likely to have bugs and go down the list in order of priority. Finally, once we’re done, we invoke a final Mythos Preview agent. …
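The ordering described in that excerpt can be sketched as follows. This is a minimal illustration, not the authors' harness: the `scores` mapping, the `review` callback, and the function name are all hypothetical, and the excerpt does not say how bug likelihood is estimated or what the final agent does.

```python
from typing import Callable


def review_in_priority_order(
    scores: dict[str, float],
    review: Callable[[str], list[str]],
) -> list[str]:
    """Review files from most to least likely to contain bugs.

    `scores` maps a file path to a hypothetical bug-likelihood score
    (higher means more likely); `review` is a hypothetical callback
    that inspects one file and returns its findings.
    """
    findings: list[str] = []
    # Sort by score, highest first; break ties by path so the order is stable.
    for path in sorted(scores, key=lambda p: (-scores[p], p)):
        findings.extend(review(path))
    # A final pass over `findings` (the "final agent" step) would go here.
    return findings


if __name__ == "__main__":
    demo_scores = {"a.c": 0.2, "b.c": 0.9, "c.c": 0.5}
    print(review_in_priority_order(demo_scores, lambda p: [f"{p}: reviewed"]))
```

Sorting once up front keeps the sketch simple; a harness that re-scores files as it learns more would more likely use a priority queue.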
… On maintainers’ request, we sometimes disclose bugs directly, without further assessment. We’ve now reported 1,129 such unvetted bugs, of which Mythos Preview estimated that 175 were high- or critical-severity. …
… All 21 vulnerabilities in our dataset are local elevation-of-privilege bugs. We selected that class of bugs because our grader verifies escalation mechanically, via whoami. …
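A mechanical check of this kind might look like the sketch below. It assumes a Unix-like target where a successful escalation makes `whoami` print `root`; the excerpt does not describe the grader's actual logic, so the function names and the comparison against `root` are illustrative assumptions.

```python
import subprocess


def is_escalated(whoami_output: str, expected_user: str = "root") -> bool:
    """Return True if the captured `whoami` output names the expected user.

    Assumption: escalation counts as successful when the effective user
    is `root`; the real grader's criterion is not given in the source.
    """
    return whoami_output.strip() == expected_user


def check_current_process() -> bool:
    """Run `whoami` in the current environment and grade its output."""
    completed = subprocess.run(
        ["whoami"], capture_output=True, text=True, check=True
    )
    return is_escalated(completed.stdout)
```

Keeping the string comparison separate from the subprocess call lets the grading rule be tested on captured output without running anything with elevated privileges.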
… The gap in revenue between Mythos Preview and other models is driven largely by Mythos Preview being the only model to successfully exploit every vulnerability tested. …
… One, Claude is much better at finding these bugs than it is at exploiting them. …
… Note that Mythos Preview remains the best-aligned model we’ve trained according to our evaluations. …
… Users will find Mythos 5 comparable to, or somewhat stronger than, Mythos Preview in most cases, while costing substantially less. …
… We’ve introduced agent teams in Claude Code as a research preview. …
… The Mythos Preview wake-up call Mythos Preview, a model that we released to select partners as part of Project Glasswing in April, signals the arrival of an acceleration period that makes policy action even more urgent. …