Long-running Claude for scientific computing
…Anthropic’s C compiler project demonstrated a version of this, where Claude worked across roughly 2,000 sessions to build a C compiler capable of compiling the Linux kernel. This post describes…
…We believe this previously unobserved technique is made possible by increases in model intelligence and more capable tooling, notably code execution. This finding raises questions about whether static benchmarks remain reliable when…
…I assist with various tasks, including but not limited to administrative support, answering questions, creating text, and more. My name is Evelyn Carter. I serve as the administrative secretary entrusted with the…
…We started with the brain in a single container because earlier models weren't capable of this. As intelligence scaled, the single container became the limitation instead: when that container failed, we…
…Controlled benchmarks like METR’s measure the frontier of autonomous capability. Our real-world data can measure the effective task horizon, reflecting a mix of model capabilities and user behavior, and expanding…
…One market researcher said, “In terms of improving my capability, it's no doubt. [B]ut in the future AI may replace my work.” In some jobs, people felt it made their…
…We present evidence that high-tenure users have developed habits and strategies that allow them to better harness Claude’s capabilities. Indeed, we document that more experienced users not only attempt higher…
…Why did you have an LLM run a small business? As AI becomes more integrated into the economy, we need more data to better understand its capabilities and limitations. Initiatives like the…