Search: Tooling comparisons

Donating our open-source alignment tool

…We’ve now integrated Petri with our other open-source alignment tool, Bloom, which can perform much more in-depth assessments of specific chosen behaviors (in comparison to Petri’s wider-ranging…

May 7, 2026

Coding agents in the social sciences

…Claude Code is the most common coding agent tool reported, with 86% of users reporting Claude Code use (31% report using Codex, the next most common tool). Adoption is highly uneven. Figure…

May 27, 2026

Introducing Sonnet 4.6

… Better spec compliance, better architecture, and it reached for modern tooling we didn’t ask for, all in one shot. …

Feb 17, 2026

A “diff” tool for AI: Finding behavioral differences in new models

…By building a generic diff tool for AI models, we can stop searching for a needle in a haystack, and instead let the comparison automatically point us to potentially dangerous behavioral differences…

Mar 13, 2026

How AI Is Transforming Work at Anthropic

… This provides a baseline for team-specific comparisons. …

Dec 2, 2025

Building AI for cyber defenders

… These enable clear comparisons across models, measure the speed of AI progress, and—especially in the case of novel, externally developed evaluations—provide a good metric to ensure that we are not simply teaching to our own tests. …

Oct 3, 2025

Agentic coding and persistent returns to expertise

…Over those seven months, the value of the typical task, which we estimate through a comparison to freelance job postings, rose in almost every kind of work—about 25% on average. Introduction…

Jun 16, 2026

Measuring AI agent autonomy in practice

…an agent is an AI system equipped with tools that allow it to take actions, like running code, calling external APIs, and sending messages to other agents. Studying the tools that…

Feb 18, 2026

Demystifying evals for AI agents

…tool_calls required: - {tool: read_file, params: {path: "src/auth/*"}} - {tool: edit_file} - {tool: run_tests} tracked_metrics: - type: transcript metrics: - n_turns - n_toolcalls - n_total_tokens - type: latency metrics: - time…

Jan 9, 2026

Vibe physics: The AI grad student

…I’ve been working with modern machine learning tools for over a decade. My first modern ML paper, from 2016, was an early application of deep learning to particle physics. In a…

Mar 23, 2026

Followed topics