Introducing Claude Opus 4.8
… It’s a great model to build with. On our Super-Agent benchmark, Claude Opus 4.8 is the only model to complete every case end-to-end, beating prior Opus models and GPT-5.5 at parity on cost. …
Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor. There’s still more to be done: we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost. Not only that, but we plan to release a new class of model with even higher intelligence than Opus. As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work. Models of this capability level require stronger cyber safeguards before they can be generally released. We’re making swift progress on dev
… Claude can make mistakes, so we encourage people to always verify anything important to them through other official sources. This year, we ran evaluations on our models to see whether web search was triggered when Claude was asked questions related to elections around the world. …
… Claude Opus 4.7 passed three TBench tasks that prior Claude models couldn’t, and it’s landing fixes our previous best model missed, including a race condition. …
… Product and API updates: We’ve made substantial updates across Claude, Claude Code, and the Claude Platform to let Opus 4.6 perform at its best. Claude Platform: On the API, we’re giving developers better control over model effort and more flexibility for long-running agents. …
… They were run on an earlier snapshot of Claude Opus 4.5. Evaluations of the final production model show a very similar pattern of results when compared to other Claude models, and are described in detail in the Claude Opus 4.5 system card. …
… Our investment professionals live in data and analytical models, and Claude for Excel meets them there. Analysts are using it to build and update coverage models, separate signal from noise, and pressure-test their work — all with a step-change in efficiency. …
Engineering at Anthropic: An update on recent Claude Code quality reports. Over the past month, we’ve been looking into reports that Claude’s responses have worsened for some users. We’ve traced these reports to three separate changes that affected Claude Code, the Claude Agent SDK, and Claude Cowork. …
… We expected this to work well for three reasons: This is largely an extension of the ideas laid out above about why the “difficult advice” dataset works well; We can give the model a clearer, more detailed picture of what Claude’s character is so that fine-tuning on a subset of those characteristic… …
… Petri has been part of our alignment assessment for every Claude model since Claude Sonnet 4.5. It compares how the new model behaves across a range of alignment-relevant scenarios that are simulated by a separate “auditor” model. …
… These partners provide tailored solutions across compliance, research, and enterprise AI adoption: Accenture helps financial services firms deploy and scale Claude across front, middle, and back office functions, from trading and research to compliance and customer experience; Deloitte enhances resea… …