Vibe physics: The AI grad student
…Eventually, GPT solved the integral and Claude incorporated it. They needed each other, and I needed both of them.
Oversimplifying the code
When I gave Claude Code the implementation guide for NNLL…
…automation-friendly workflows like code generation and data processing. Office & Administrative tasks are also more prevalent in the API (15% vs. 8%), reflecting routine business operations suited to delegation. Claude.ai, by…
…They require reading papers, querying databases, running experiments, coding and analysis. Now that models can do many of these things, benchmarks have evolved to reflect these workflows. BLADE tasks a model with…
…This is a very challenging task for Claude, given that Claude receives only the title and description of the JIRA tickets, while the human developers have full context on the codebase and…
…For example, Claude Code is a flexible agent harness, and we used its core primitives through the Agent SDK to build our long-running agent harness. An evaluation suite is a collection…
…In line with other data showing that Claude is extensively used for coding, Computer Programmers are at the top, with 75% coverage, followed by Customer Service Representatives, whose main tasks we increasingly…