Project Glasswing: what Mythos showed us
…Context - Coding agents are tuned for one focused stream of work: building a feature, fixing a bug, writing a refactor. They ingest a lot of source code, hold a single hypothesis at…
Before we started this research, it was not clear where the misaligned behavior was coming from. Our two main hypotheses were: (1) Our post-training process was accidentally encouraging this behavior with misaligned rewards. (2) This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use. T…
…Finally, the model also shows significant improvement in agentic safety, meaning it's a lot better at recognizing and refusing prompt injection attacks when you're using it as an agent. Opus…
…With Claude Cowork and Managed Agents embedded inside it, KPMG professionals and their clients can build new AI capabilities directly in the platform—work that used to mean jumping between tools, chat…
…This is the reason this company was founded as a nonprofit focused on safety, and where things were being obscured in a way that credible people around this found it less than…
…built with respect for the unique goals, opportunities, and challenges of the region.” Our initial focus will be supporting our enterprise, startup, and research customers. Anthropic already works with some of Australia…
…However, keep in mind that this feature is currently in research preview, and Anthropic is still working on agent safety. The feature is also exclusive to Claude's paid plans for now…
Science Long-running Claude for scientific computing Mar 23, 2026 In this post, Siddharth Mishra-Sharma, a researcher on the Discovery team, explains how to apply multi-day agentic coding workflows—test…
…Similarly, the general cost of consumer electronics is increasing as chip manufacturers and production lines shift their focus to building more AI capacity. The largest consumer electronics manufacturer in the world, Apple…
…The second approach to capping the blast radius—and the focus of much of this post—is containment. Rather than supervising what the agent does, we supervise what it’s able…