Paving the way for agents in biology
…What Karpathy’s lecture about web development tells us about doing biology with AI agents This mismatch between agent needs and human-built tools is not unique to biology. The same friction…
Before we started this research, it was not clear where the misaligned behavior was coming from. Our two main hypotheses were: (1) our post-training process was accidentally encouraging this behavior with misaligned rewards; (2) this behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use.…
Teaching Claude why…
…The premise Most scientists currently using AI agents work in a conversational loop, managing each step of the process on a tight leash. As models have become significantly better at long-horizon…
…Opportunities for engagement on AI safety Anthropic supports international AI safety dialogue with AI experts in China, when possible. The world has a vested interest in safe AI, regardless of where it…
…Public Policy focuses on the areas where Anthropic has defined priorities and perspectives, including model safety and transparency, energy ratepayer protections, infrastructure investments, export controls, and democratic leadership in AI. Sarah Heck…
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
…For governance, observability, and the rest of the stack, see NIST's project on AI agent identity and authorization, the six-agency guidance on adopting agentic AI led by Australia's ACSC…
…Finally, in a world where larger fractions of economic activity are autonomously managed by AI agents, odd scenarios like this could have cascading effects, especially if multiple agents based on similar…
…Two scenarios for global AI leadership: Our views on the AI competition between the US and China. Teaching Claude why: New research on how we've reduced agentic misalignment. Natural Language Autoencoders…