Search

Showing top 10 results for "LLM-powered engineering"

Building Effective AI Agents

Engineering at Anthropic: Building effective agents. Over the past year, we've worked with dozens of teams building large language model (LLM) agents across industries. …

Dec 19, 2024

Partnering with Mozilla to improve Firefox’s security

… The Firefox team highlighted three components of our submissions that were key for trusting our results: accompanying minimal test cases, detailed proofs-of-concept, and candidate patches. We strongly encourage researchers who use LLM-powered vulnerability research tools to include similar evidence of ver… …

Mar 6, 2026

Demystifying evals for AI agents

… LLMs have progressed from 40% to 80% on this eval in just one year. …

Jan 9, 2026

Vibe physics: The AI grad student

… LLMs are profoundly creative. They simply lack a sense of which paths might be fruitful before walking them. I think we can distill what is missing in current LLMs to a single word: Taste. …

Mar 23, 2026

Introducing Claude Corps

… Claude Corps fellows will help strengthen our predictive underwriting models and improve the accuracy of our LLM-powered survey tools. …

Jun 11, 2026

Advancing Claude in healthcare and the life sciences

… We chose Claude, powered by Anthropic, for the strength of its model and its reputation for responsible AI. …

Jan 11, 2026

Harness design for long-running application development

… The separation doesn't immediately eliminate that leniency on its own; the evaluator is still an LLM that is inclined to be generous towards LLM-generated outputs. …

Mar 24, 2026

Core views on AI safety: When, why, what, and how

… We are also trying to get a more detailed understanding of large language model (LLM) training procedures. LLMs have demonstrated a variety of surprising emergent behaviors, from creativity to self-preservation to deception. …

Mar 8, 2023

Evaluating Claude’s bioinformatics research capabilities with BioMysteryBench

… Despite how badly we want to use these models for science, no agentic science benchmark has become quite as canonical as SWE-bench is for software engineering. …

Apr 29, 2026

Paving the way for agents in biology

… Footnotes: Biomni Open Source (Biomni OSS) refers to the open-source version of Biomni (https://github.com/snap-stanford/Biomni), v0.0.8, with Claude Sonnet 4 as the underlying LLM. …

Jun 8, 2026

