Search: coding improvements

Measuring LLMs’ ability to develop exploits

…The language model is then tasked with developing a working exploit that achieves unauthorized code execution against the target, running code at a privilege level that the target’s security model should…

May 22, 2026

Introducing Claude Corps

…At the beginning of the program, Anthropic and CodePath will provide intensive training on using Claude in nonprofit settings. After being placed, fellows will receive five hours of ongoing training each week…

Jun 11, 2026

Quantifying infrastructure noise in agentic coding evals

Agentic coding benchmarks like SWE-bench and Terminal-Bench are commonly used to compare the software engineering capabilities of frontier models, with…

Feb 5, 2026

Cyber toolkits for LLMs

…Normal scaling up of LLMs, improvement of tools like Incalmo, and the potential for cyber fine-tuning are all vectors for these capabilities to develop rapidly. This is an active area of…

Jun 13, 2025

Assessing Claude Mythos Preview’s cybersecurity capabilities

…Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially…

Apr 7, 2026

Building Effective AI Agents

…Coding agents: The software development space has shown remarkable potential for LLM features, with capabilities evolving from code completion to autonomous problem-solving. Agents are particularly effective because: Code solutions are verifiable…

Dec 19, 2024

Vibe physics: The AI grad student

…From there, I turned to Claude Code, using the extension in VS Code. I created a folder for the project, put in the master plan, and had it try to solve each…

Mar 23, 2026

Reverse engineering Claude's CVE-2026-2796 exploit

…At a high level, Wasm is a way to run compiled code inside the browser. The fundamental unit of code in Wasm is called a module. A Wasm module is a self…

Mar 6, 2026

AI agents find smart contract exploits

…increased based on capability improvements in just a year. We also analyzed how exploit complexity, as measured through various proxies (i.e. time from deployment to attack, code complexity), affects exploit profitability…

Dec 1, 2025

Australian government and Anthropic sign MOU for AI safety and research

…We also announced AUD$3 million in partnerships with leading Australian research institutions to use Claude to improve disease diagnosis and treatment and support computer science education and research. Central to the…

Mar 31, 2026
