Search: coding improvements

Introducing Sonnet 4.6

… Sonnet 4.6 brings much-improved coding skills to more of our users. Improvements in consistency, instruction following, and more have made developers with early access prefer Sonnet 4.6 to its predecessor by a wide margin. …

Feb 17, 2026

Introducing Claude Opus 4.7

… On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve. Combined with faster median latency and strict instruction following, it’s particularly meaningful for complex, long-running coding workflows. …

Apr 16, 2026

Introducing Claude Opus 4.8

… One of the most prominent improvements in Opus 4.8 is its honesty. …

May 28, 2026

Introducing Claude Opus 4.5

… The speed improvements are remarkable. …

Nov 24, 2025

Introducing advanced tool use on the Claude Developer Platform

… This extra overhead pays off when the token savings, latency improvements, and accuracy gains are substantial. …

Nov 24, 2025

An update on recent Claude Code quality reports

… In combination with other prompt changes, it hurt coding quality and was reverted on April 20. …

Apr 23, 2026

Partnering with Mozilla to improve Firefox’s security

…build these verifiers for their own codebases; the key point is that giving the agent a reliable way to check both of these properties dramatically improves the quality of its output. We…

Mar 6, 2026

Automated Alignment Researchers: Using large language models to scale scalable oversight

… We took the AARs’ two highest-performing methods on a dataset of chat tasks and applied them to math and coding tasks. …

Apr 14, 2026

Higher usage limits for Claude and a compute deal with SpaceX

…Higher usage limits. The following three changes, all effective today, are aimed at improving the experience of using Claude for our most dedicated customers. First, we’re doubling Claude Code’s five…

May 6, 2026

Demystifying evals for AI agents

… This can make results deceptive, as large capability improvements appear as small increases in scores. For example, the code review startup Qodo was initially unimpressed by Opus 4.5 because their one-shot coding evals didn’t capture the gains on longer, more complex tasks. …

Jan 9, 2026
