Showing top 68 results for "AI agent safety"

People also ask

Why does agentic misalignment happen?

Before we started this research, it was not clear where the misaligned behavior was coming from. Our two main hypotheses were: (1) our post-training process was accidentally encouraging this behavior with misaligned rewards; and (2) this behavior was coming from the pre-trained model, and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use. …

Teaching Claude why

Paving the way for agents in biology

…What Karpathy’s lecture about web development tells us about doing biology with AI agents: This mismatch between agent needs and human-built tools is not unique to biology. The same friction…

Jun 8, 2026

Long-running Claude for scientific computing

…The premise: Most scientists currently using AI agents work in a conversational loop, managing each step of the process on a tight leash. As models have become significantly better at long-horizon…

Mar 23, 2026

2028: Two scenarios for global AI leadership

…Opportunities for engagement on AI safety: Anthropic supports international AI safety dialogue with AI experts in China, when possible. The world has a vested interest in safe AI, regardless of where it…

May 14, 2026

Introducing The Anthropic Institute

…Public Policy focuses on the areas where Anthropic has defined priorities and perspectives, including model safety and transparency, energy ratepayer protections, infrastructure investments, export controls, and democratic leadership in AI. Sarah Heck…

Mar 11, 2026

Sydney will become Anthropic’s fourth office in Asia-Pacific

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Mar 10, 2026

How we contain Claude across products

…For governance, observability, and the rest of the stack, see NIST's project on AI agent identity and authorization, the six-agency guidance on adopting agentic AI led by Australia's ACSC…

May 25, 2026

Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs

May 4, 2026

Anthropic’s Long-Term Benefit Trust appoints Vas Narasimhan to Board of Directors

Apr 14, 2026

Project Vend: Can Claude run a small shop? (And why does that matter?)

…Finally, in a world where larger fractions of economic activity are autonomously managed by AI agents, odd scenarios like this could have cascading effects, especially if multiple agents based on similar…

Jun 27, 2025

Announcing the Anthropic Economic Index Survey

…Two scenarios for global AI leadership: Our views on the AI competition between the US and China. Teaching Claude why: New research on how we've reduced agentic misalignment. Natural Language Autoencoders…

Apr 22, 2026