Emergent introspective awareness in large language models
Interpretability: Signs of introspection in large language models (Oct 29, 2025). Have you ever asked an AI model what’s on its mind? Or to explain how it came…
…We measured three Claude models (Opus 4.7, Opus 4.6, Sonnet 4.6) against ChemDraw and MestReNova on 20 compounds drawn from synthetic chemistry preprints published after the models’ training cutoff…
…We tracked how model activations moved along the Assistant Axis throughout each conversation. The pattern was consistent across the models we tested. While coding conversations kept models firmly in Assistant territory throughout…
…Run-to-run variability was largely eliminated, and the performance gap between models narrowed dramatically. In other words, adding a deterministic retrieval layer made model choice much less important. This is especially…
…Instead, the differentiator will become the scaffolding—the surrounding code, architecture, and tooling that makes AI models more capable—that actors build around the model so they can chain together attack stages…
…We have updated the model cards for both Claude Opus 4.6 and Claude Sonnet 4.6. For the Opus 4.6 multi-agent configuration described in this report, the run we…
…We hope that this post helps to update defenders' mental model of the risks to match reality—now is the time to adopt AI for defense. If you want to contribute to…
…An initial update: An early update on what we've learned from Project Glasswing.
…Markets steer the direction of model improvement according to private return, but can we improve how models perform to address social externalities?
…minor documentation updates and one is a critical infrastructure change, simply counting the number of these tasks performed with Claude misses the point. Not only that, but as model capabilities improve, we…