How people ask Claude for personal guidance
…Two scenarios for global AI leadership Our views on the AI competition between the US and China. Teaching Claude why New research on how we've reduced agentic misalignment. Natural Language Autoencoders…
Before we started this research, it was not clear where the misaligned behavior was coming from. Our two main hypotheses were: (1) Our post-training process was accidentally encouraging this behavior with misaligned rewards. (2) This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use. …
Teaching Claude why
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
…Related content Coding agents in the social sciences Results from a survey of 1,260 social scientists about AI and coding agent use. Project Glasswing: An initial update An early update on…
…Career uncertainty and adaptation Many engineers describe their role shifting from writing code to managing AIs. Engineers increasingly see themselves as “manager[s] of AI agents”—some already “constantly have at least…
…Related content Making Claude a chemist Coding agents in the social sciences Results from a survey of 1,260 social scientists about AI and coding agent use. Project Glasswing: An initial update…
…The same problem shows up with AI. Despite how badly we want to use these models for science, no agentic science benchmark has become quite as canonical as SWE-bench is for…