Introducing Claude Opus 4.7
… In our own testing, the net effect is favorable—token usage across all effort levels is improved on an internal coding evaluation, as shown below—but we recommend measuring the difference on real traffic. …
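One way to act on the recommendation to measure on real traffic is to log token counts per request and compare averages by model and effort level. The sketch below is a minimal illustration, not a prescribed method: the record fields (`model`, `effort`, `input_tokens`, `output_tokens`), the model labels, and the sample numbers are all hypothetical placeholders rather than real measurements.

```python
from collections import defaultdict
from statistics import mean


def summarize_token_usage(records):
    """Return mean total tokens per request, keyed by (model, effort)."""
    totals = defaultdict(list)
    for record in records:
        key = (record["model"], record["effort"])
        totals[key].append(record["input_tokens"] + record["output_tokens"])
    return {key: mean(values) for key, values in totals.items()}


def relative_change(baseline, candidate):
    """Fractional change from baseline to candidate (negative means fewer tokens)."""
    return (candidate - baseline) / baseline


# Placeholder log records (assumed shape; substitute your own request logs).
SAMPLE_RECORDS = [
    {"model": "old-model", "effort": "high", "input_tokens": 1000, "output_tokens": 500},
    {"model": "old-model", "effort": "high", "input_tokens": 1200, "output_tokens": 700},
    {"model": "new-model", "effort": "high", "input_tokens": 1000, "output_tokens": 400},
    {"model": "new-model", "effort": "high", "input_tokens": 1200, "output_tokens": 500},
]

if __name__ == "__main__":
    summary = summarize_token_usage(SAMPLE_RECORDS)
    change = relative_change(
        summary[("old-model", "high")], summary[("new-model", "high")]
    )
    print(f"Relative change in mean tokens per request: {change:+.1%}")
```

Comparing the same effort level across both models on matched requests keeps the comparison fair; a mean over a small sample can be noisy, so a larger window of traffic is preferable.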
… You can refine the system over time, and teams can maintain more than one. …
… Routine coding (e.g., editing source files in your repo) doesn't pay classifier latency; in-project edits are reviewable via version control. Only actions with real downside potential reach the final tier: Tier 3: Transcript classifier. …
… On coding tasks, this effort level spends about as many tokens as Opus 4.7’s default, but performs better. …
… At the time, we wrote that it was “still experimental—at times cumbersome and error-prone,” but we expected rapid improvement. …
… From shortcuts to sabotage. In our latest study, we used a realistic setup to study the unintended consequences that could arise from reward hacking: We start from a pretrained model and mix into its continued pretraining data some realistic documents describing possible ways to reward hack during p… …
… The new connectors are: Dun & Bradstreet, which provides the global standard for verified business identity and helps enterprises connect systems of record and scale AI-enabled workflows; Fiscal AI, which extends real-time fundamentals coverage across public equities for deeper research and bench… …
… An overview of approaches for understanding AI agent performance. Method: Automated evals (running tests programmatically without real users). Pros: faster iteration; fully reproducible; no user impact; can run on every commit; tests scenarios at scale without requiring a prod deployment. Cons: Requires more u… …
… We find no evidence that coding agent users are submitting more new papers to journals or resubmitting papers more quickly. This could reflect the timeline of getting a paper to submission, as coding agent use is a recent phenomenon. …
… Claude Opus 4.5 handles long-horizon coding tasks more efficiently than any model we’ve tested. It achieves higher pass rates on held-out tests while using up to 65% fewer tokens, giving developers real cost control without sacrificing quality. …