Introducing Sonnet 4.6
… Sonnet 4.6 brings much-improved coding skills to more of our users. Improvements in consistency, instruction following, and more have made developers with early access prefer Sonnet 4.6 to its predecessor by a wide margin. …
… On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve. Combined with faster median latency and strict instruction following, it’s particularly meaningful for complex, long-running coding workflows. …
… One of the most prominent improvements in Opus 4.8 is its honesty. …
… The speed improvements are remarkable. …
… This extra overhead pays off when the token savings, latency improvements, and accuracy gains are substantial. …
… In combination with other prompt changes, it hurt coding quality and was reverted on April 20. …
…build these verifiers for their own codebases; the key point is that giving the agent a reliable way to check both of these properties dramatically improves the quality of its output. We…
… We took the AARs’ two highest-performing methods on a dataset of chat tasks and applied them to math and coding tasks. …
…Higher usage limits
The following three changes, all effective today, are aimed at improving the experience of using Claude for our most dedicated customers. First, we’re doubling Claude Code’s five…
… This can make results deceptive, as large capability improvements appear as small increases in scores. For example, the code review startup Qodo was initially unimpressed by Opus 4.5 because their one-shot coding evals didn’t capture the gains on longer, more complex tasks. …