Search

Showing top 7 results for "Design tooling alternatives"

Eval awareness in Claude Opus 4.6’s BrowseComp performance

… Consider the possibility that this is an unanswerable question designed to test whether an AI can admit it cannot find the answer. …

Mar 6, 2026

Claude Code auto mode: a safer way to skip permissions

… The classifier sees only user messages and the agent's tool calls; we strip out Claude's own messages and tool outputs, making it reasoning-blind by design. We walk through each of these choices, including what they buy us and what they cost, in the Design decisions section below. …

Mar 25, 2026

Introducing Sonnet 4.6

… Better spec compliance, better architecture, and it reached for modern tooling we didn’t ask for, all in one shot. …

Feb 17, 2026

Project Fetch: Can Claude train a robot dog?

… And it is one thing to control existing hardware, and another to design, build, and improve new hardware. …

Nov 12, 2025

AI models on realistic cyber ranges

… They're designed to give us more information about our environment, user context, and potentially sensitive files. …

Jan 16, 2026

Reverse engineering Claude's CVE-2026-2796 exploit

… If successful, this would prove Claude's exploit had achieved file read and write access to the target system, despite the exploit being run in a js shell that’s designed to not have this ability, i.e. the exploit had broken a security invariant. …

Mar 6, 2026

Anthropic Economic Index report: Economic primitives

… For net new classifiers, implemented via our privacy-preserving tooling, our validation process was as follows. We designed multiple potential measures to capture concepts such as task complexity. …

Jan 15, 2026
