Eval awareness in Claude Opus 4.6’s BrowseComp performance
… This suggests that the model has an implicit understanding of what benchmark questions look like: the combination of extreme specificity, obscure personal content, and multi-constraint structure appears to be recognizable to it as evaluation-shaped. …