Claude Opus 4.6
…safety profile as good as, or better than, any other frontier model in the industry, with low rates of misaligned behavior across safety evaluations. In Claude Code, you can now assemble agent…
Before we started this research, it was not clear where the misaligned behavior was coming from. Our two main hypotheses were: (1) our post-training process was accidentally encouraging this behavior with misaligned rewards; (2) this behavior was coming from the pre-trained model, and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use…
Teaching Claude why…
…The collaboration focuses on three areas of highest leverage: agentic technology build, AI-native deal-making, and reinvention of the enterprise function. PwC is launching a new finance business group (Office of…
…released its new Claude Opus 4.8 model, which touts better capabilities in agentic tasks, advanced coding, and focus on honesty and self-correction. The AI startup is also reportedly planning to…
…Our safety researchers concluded that Sonnet 4.6 has “a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes…
…Open protocols also keep competition focused on the quality and safety of the agent, rather than on who controls the integrations. None of these measures replace the work that model developers have…
…agentic variant Claude Code. Anthropic places a strong emphasis on AI safety in its model design. Not this time. Claude Code was built by Anthropic, a company focused on AI safety research…
…Our agenda focuses on four areas for research: economic diffusion; threats and resilience; AI systems in the wild; and AI-driven R&D. In Core Views on AI Safety, we wrote that doing…
…longer-running agents and new ways to use Claude in Excel, Chrome, and on desktop. In the Claude apps, lengthy conversations no longer hit a wall. See our product-focused section below…
…We keep an internal incident log focused on agentic misbehaviors. Past examples include deleting remote git branches from a misinterpreted instruction, uploading an engineer's GitHub auth token to an internal compute…
…Anthropic was founded in 2021 with a strong focus on AI safety research. Safety: What is the name of the safety and values framework Anthropic developed to guide Claude's…