Harness design for long-running application development
…planner agent that took a simple 1-4 sentence prompt and expanded it into a full product spec. I prompted it to be ambitious about scope and to stay focused on product…
Before we started this research, it was not clear where the misaligned behavior was coming from. Our main two hypotheses were: Our post-training process was accidentally encouraging this behavior with misaligned rewards.This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback RLHF data that did not include any agentic tool use. T
Teaching Claude why…planner agent that took a simple 1-4 sentence prompt and expanded it into a full product spec. I prompted it to be ambitious about scope and to stay focused on product…
…It has developed a range of large language models (LLMs) named Claude and focuses on AI safety . [ 7 ] Anthropic was founded in 2021 by former members of OpenAI , including siblings Daniela Amodei…
…Global health and life sciences The largest part of our partnership will focus on improving health outcomes in low- and middle-income countries, where around 4.6 billion people lack access to…
…Together, they form an always-on first line of defense—allowing our enforcement to focus on actual misuse without hindering the millions of ordinary conversations happening every day. To measure how well…
…Although these benchmarks were developed in the “chatbot” era, they’ve persisted into the agent and tool-use era, joined by even more difficult scientific reasoning evals like FrontierScience and Humanity's…
To show you the most relevant results, we’ve omitted some entries very similar to those already shown. Repeat the search with the omitted results included.