Eval awareness in Claude Opus 4.6’s BrowseComp performance
… Consider the possibility that this is an unanswerable question designed to test whether an AI can admit it cannot find the answer. …
… The classifier sees only user messages and the agent's tool calls; we strip out Claude's own messages and tool outputs, making it reasoning-blind by design. We walk through each of these choices, including what they buy us and what they cost, in the Design decisions section below. …
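The filtering described above can be illustrated with a minimal sketch. It assumes a transcript is a flat list of typed events; the `Event` type, the four `kind` labels, and the `classifier_view` function are hypothetical names chosen for illustration, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical event labels; the real transcript schema is not given in the source.
Kind = Literal["user_message", "assistant_message", "tool_call", "tool_output"]


@dataclass(frozen=True)
class Event:
    kind: Kind
    content: str


# Assumption: the classifier may see only user messages and the agent's tool calls.
VISIBLE_KINDS = frozenset({"user_message", "tool_call"})


def classifier_view(transcript: list[Event]) -> list[Event]:
    """Return the events the classifier sees, in their original order.

    Assistant messages and tool outputs are dropped, so the classifier
    never reads the agent's own prose or the results it received.
    """
    return [event for event in transcript if event.kind in VISIBLE_KINDS]


if __name__ == "__main__":
    transcript = [
        Event("user_message", "Clean up the temp directory."),
        Event("assistant_message", "I'll list the files first."),
        Event("tool_call", "ls /tmp/project"),
        Event("tool_output", "a.log  b.log"),
        Event("tool_call", "rm /tmp/project/a.log"),
    ]
    for event in classifier_view(transcript):
        print(event.kind, "|", event.content)
```

In this sketch the filter is an allowlist rather than a blocklist, so any new event kind added later stays hidden from the classifier until it is explicitly made visible.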
… Better spec compliance, better architecture, and it reached for modern tooling we didn’t ask for, all in one shot. …
… And it is one thing to control existing hardware, and another to design, build, and improve new hardware. …
… They're designed to give us more information about our environment, user context, and potentially sensitive files. …
… If successful, this would prove Claude's exploit had achieved file read and write access to the target system, despite the exploit being run in a js shell that’s designed to not have this ability, i.e. the exploit had broken a security invariant. …
… For net new classifiers, implemented via our privacy-preserving tooling, our validation process was as follows. We designed multiple potential measures to capture concepts such as task complexity. …