Search: real-time coding

Introducing Claude Opus 4.7

… In our own testing, the net effect is favorable—token usage across all effort levels is improved on an internal coding evaluation, as shown below—but we recommend measuring the difference on real traffic. …

Apr 16, 2026

Introducing Claude Design by Anthropic Labs

… You can refine the system over time, and teams can maintain more than one. …

Apr 17, 2026

Claude Code auto mode: a safer way to skip permissions

… Routine coding, e.g. editing source files in your repo, doesn't pay classifier latency; in-project edits are reviewable via version control. Only actions with real downside potential reach the final tier: Tier 3: Transcript classifier. …

Mar 25, 2026

Introducing Claude Opus 4.8

… On coding tasks, this effort level spends a similar number of tokens as Opus 4.7’s default, but with better performance. …

May 28, 2026

Introducing Sonnet 4.6

… At the time, we wrote that it was “still experimental—at times cumbersome and error-prone,” but we expected rapid improvement. …

Feb 17, 2026

From shortcuts to sabotage: natural emergent misalignment from reward hacking

… From shortcuts to sabotage. In our latest study, we used a realistic setup to study the unintended consequences that could arise from reward hacking: We start from a pretrained model and mix into its continued pretraining data some realistic documents describing possible ways to reward hack during p… …

Nov 21, 2025

Agents for financial services

… The new connectors are: Dun & Bradstreet, which provides the global standard for verified business identity and helps enterprises connect systems of record and scale AI-enabled workflows; Fiscal AI, which extends real-time fundamentals coverage across public equities for deeper research and bench… …

May 5, 2026

Demystifying evals for AI agents

… An overview of approaches for understanding AI agent performance. Method: Automated evals (running tests programmatically without real users). Pros: faster iteration; fully reproducible; no user impact; can run on every commit; tests scenarios at scale without requiring a prod deployment. Cons: Requires more u… …

Jan 9, 2026

Coding agents in the social sciences

… We find no evidence that coding agent users are submitting more new papers to journals or resubmitting papers more quickly. This could reflect the timeline of getting a paper to submission, as coding agent use is a recent phenomenon. …

May 27, 2026

Introducing Claude Opus 4.5

… Claude Opus 4.5 handles long-horizon coding tasks more efficiently than any model we’ve tested. It achieves higher pass rates on held-out tests while using up to 65% fewer tokens, giving developers real cost control without sacrificing quality. …

Nov 24, 2025