Showing top 65 results for "AI agent safety"

People also ask

Why does agentic misalignment happen?

Before we started this research, it was not clear where the misaligned behavior was coming from. Our main two hypotheses were: (1) Our post-training process was accidentally encouraging this behavior with misaligned rewards. (2) This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use.…

Teaching Claude why

Trustworthy agents in practice

Policy, Apr 9, 2026: AI “agents” represent the latest major shift in how people and organizations are using AI. A couple of years ago, AI models were only…

Introducing Claude Opus 4.5

…Claude Opus 4.5 represents a breakthrough in self-improving AI agents. For automation of office tasks, our agents were able to autonomously refine their own capabilities, achieving peak performance in 4…

Core views on AI safety: When, why, what, and how

Announcements, Mar 8, 2023: We founded Anthropic because we believe the impact of AI might be comparable to that of the industrial…

Measuring AI agent autonomy in practice

Societal Impacts, Feb 18, 2026: AI agents are here, and already they’re being deployed across contexts that vary widely in consequence, from email triage to…

Anthropic acquires Stainless

Announcements, May 18, 2026: The frontier of AI is shifting from models that answer to agents that act, and agents are only as capable as the systems they can…

Redeploying Claude Fable 5

…A shared standard for judging the severity of a given jailbreak would help AI developers triage new findings as they arise, launch highly capable models with greater safety, and communicate the…

Focus areas for The Anthropic Institute

…Economic diffusion; Threats and resilience; AI systems in the wild; AI-driven R&D. In Core Views on AI Safety, we wrote that doing effective safety research required close contact with frontier…

Claude Science, an AI workbench for scientists

…The team is now working with domain experts to further refine the AI-based critic agents. And Stephen Francis, an associate professor and epidemiologist at the UCSF Brain Tumor Center, has used…

Anthropic and NEC partner to build AI-native engineering at scale in Japan

…the potential of AI in the Japanese market,” said Toshifumi Yoshizaki, Executive Officer and COO of NEC Corporation. “Together, we aim to create solutions that meet the high safety, reliability, and quality…

Anthropic opens Seoul office and announces new partnerships across the Korean AI ecosystem

…Anthropic will provide Claude access to up to 60 NAIRL-affiliated researchers, supporting work on AI safety, model evaluation, alignment, robustness, and broader frontier AI research. In the nonprofit sector, Good Neighbors…