Introducing Sonnet 4.6
…Sonnet 4.6 outperforms on our orchestration evals, handles our most complex agentic workloads, and keeps improving the higher you push the effort settings. Claude Sonnet 4.6 is a notable improvement…
…I think the models are still improving at a very steady pace, and so we should be able to keep sharing those with our users. I think the deployments might look a…
…I think the agent trace link is broken — it points to trace.md but the actual file is agent-trace.txt: https://huggingface.co/hf-skills/h100-diffusers-kernel-builder/blob/main…
…Our new integration gives you more control over your agent sandboxes, secures connections to private services, and improves observability. In the past year, Cloudflare’s Developer Platform has expanded to give more…
The engineering practices Claude Code and Codex use to improve AI agents
Multi Agent Continuous Context Harness - MACCHA solves the problem that every AI coding session starts from zero. It combines a file-based 7-tier context architecture with a working memory engine (Memanto) that features …
I have been interested in long-horizon coding tasks for a while, especially with benchmarks like FrontierSWE, where even the best coding agents like Codex and Claude Code struggle to complete tasks. These agents come with…
Data is “the new oil” for AI. What if you could “plug in” to an oil well, and get royalties forever whenever that well’s oil was used? Right now, the people who build those datasets get paid once, if at all. There's no rec…
Claw-Coder is an AI agent that runs locally on your laptop and has access to powerful tools. Instead of configuring Claude or Codex to use a local model, just use claw-coder. Why was claw-coder created? Answer: To solve th…
…is that giving the agent a reliable way to check both of these properties dramatically improves the quality of its output. We can’t guarantee that all agent-generated patches that pass…
…Parallel narrow tasks beat one exhaustive agent - Coverage improves when many agents work on tightly scoped questions and we deduplicate the results afterward, rather than asking one agent to be exhaustive. Each…
Engineering at Anthropic: Introducing advanced tool use on the Claude Developer Platform. The future of AI agents is one where models work seamlessly across hundreds or thousands of tools. An IDE assistant…
…Apple worked with Anthropic and OpenAI to configure their agents for use in Xcode and to ensure... Claude Sonnet 4.6 Brings Improved Coding, Computer Use, and Office Tasks: Anthropic today updated…
…15,200 bases, maximum 1,900 ambiguous characters (N’s), exclude lab-passaged samples.” When agents were left to solve these queries on their own, performance varied widely across systems and improved…
Apple's engineers are going to attend a multi-week AI vibecoding bootcamp, which in theory will help improve Siri. Apple will allegedly organize an AI coding bootcamp for its…