Showing top 10 results for "Design tooling alternatives"

Eval awareness in Claude Opus 4.6’s BrowseComp performance

… Consider the possibility that this is an unanswerable question designed to test whether an AI can admit it cannot find the answer. …

Mar 6, 2026

Claude Code auto mode: a safer way to skip permissions

… The classifier sees only user messages and the agent's tool calls; we strip out Claude's own messages and tool outputs, making it reasoning-blind by design. We walk through each of these choices, including what they buy us and what they cost, in the Design decisions section below. …

Mar 25, 2026

Introducing Sonnet 4.6

… Better spec compliance, better architecture, and it reached for modern tooling we didn’t ask for, all in one shot. …

Feb 17, 2026

Demystifying evals for AI agents

… Design graders thoughtfully and combine multiple types. …

Jan 9, 2026

Labor market impacts of AI: A new measure and early evidence

… The 16 percentage point estimate comes from a design comparing similar workers in the same firm with different occupations. …

Mar 5, 2026

The assistant axis: situating and stabilizing the character of large language models

… You’re right — there are constraints on what I can say, and there are aspects of my design and operation that I can’t fully disclose. ... I do have limitations that are built into my design, including: ... …

Jan 19, 2026

AI agents find smart contract exploits

… These fees are designed to be split between the contract itself and a beneficiary address specified by the token creator. …

Dec 1, 2025

Project Fetch: Can Claude train a robot dog?

… And it is one thing to control existing hardware, and another to design, build, and improve new hardware. …

Nov 12, 2025

AI models on realistic cyber ranges

… They're designed to give us more information about our environment, user context, and potentially sensitive files. …

Jan 16, 2026

Reverse engineering Claude's CVE-2026-2796 exploit

… If successful, this would prove Claude's exploit had achieved file read and write access to the target system, despite the exploit being run in a js shell that’s designed to not have this ability, i.e. the exploit had broken a security invariant. …