Introducing Claude Opus 4.8
… It’s a great model to build with. On our Super-Agent benchmark, Claude Opus 4.8 is the only model to complete every case end-to-end, beating prior Opus models and GPT-5.5 at parity on cost. …
Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor. There’s still more to be done: we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost. Not only that, but we plan to release a new class of model with even higher intelligence than Opus. As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work. Models of this capability level require stronger cyber safeguards before they can be generally released. We’re making swift progress on dev …
… Claude can make mistakes, so we encourage people to always verify anything important to them through other official sources. This year, we ran evaluations on our models to see whether web search was triggered when Claude was asked questions related to elections around the world. …
… Claude Opus 4.7 passed three TBench tasks that prior Claude models couldn’t, and it’s landing fixes our previous best model missed, including a race condition. …
… Product and API updates
We’ve made substantial updates across Claude, Claude Code, and the Claude Platform to let Opus 4.6 perform at its best.
Claude Platform
On the API, we’re giving developers better control over model effort and more flexibility for long-running agents. …
… They were run on an earlier snapshot of Claude Opus 4.5. Evaluations of the final production model show a very similar pattern of results when compared to other Claude models, and are described in detail in the Claude Opus 4.5 system card. …
… Firms can adapt any of them to their own modeling conventions, risk policies, and approval flows. Enable these new agent templates either as plugins within Claude Cowork or Claude Code, or as cookbooks for Claude Managed Agents. …
… We've also added guidance to our CLAUDE.md to ensure model-specific changes are gated to the specific model they're targeting. …
… We expected this to work well for three reasons: This is largely an extension of the ideas laid out above about why the “difficult advice” dataset works well; We can give the model a clearer, more detailed picture of what Claude’s character is so that fine-tuning on a subset of those characteristic… …
… Petri has been part of our alignment assessment for every Claude model since Claude Sonnet 4.5. It compares how the new model behaves across a range of alignment-relevant scenarios that are simulated by a separate “auditor” model. …
… The role of our researcher was limited to plugging a laptop running Claude Code into the robodog, entering the initial prompt, approving commands, and approving the model to go to the next task. Where did Claude excel? …