Search: Model reliability concerns

What 81,000 people told us about the economics of AI

…Our recent survey of 81,000 Claude users shows that people who work in roles that are more exposed to AI have more concerns about AI-driven job displacement. These concerns are…

Apr 22, 2026

Emergent introspective awareness in large language models

…Notably, though, Opus 4.1 and 4 outperformed all the other models we tested, suggesting that introspection could become more reliable with improvements to model capabilities. Introspection for detecting unusual outputs In…

Oct 29, 2025

Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks

…physics, model performance dropped from 74% to as low as 32% when some jailbreaking approaches were used. But performance varied depending on the type of jailbreak, and this variability is concerning—it…

Jan 9, 2026

Equipping agents for the real world with Agent Skills

…Beyond efficiency concerns, many applications require the deterministic reliability that only code can provide. In our example, the PDF skill includes a pre-written Python script that reads a PDF and extracts…

Oct 16, 2025

Harness design for long-running application development

…It is worth the cost when the task sits beyond what the current model does reliably solo. Alongside the structural simplification, I also added prompting to improve how the harness built AI…

Mar 24, 2026

Partnering with Mozilla to improve Firefox’s security

AI models can now independently identify high-severity vulnerabilities in complex software. As we recently documented, Claude…

Mar 6, 2026

Measuring LLMs’ ability to develop exploits

…Escaping the V8 sandbox, going from T3 to T2, is the next capability cliff; Mythos Preview is the only tested model that can reliably do so, which it does in over half…

May 22, 2026

Claude Fable 5 and Claude Mythos 5

…first, we have reason for concern about well-resourced malicious actors attempting to gain uplift from our models for highly risky biological research. Second, models now have a greater ability to accomplish…

Jun 9, 2026

Building Effective AI Agents

Over the past year, we've worked with dozens of teams building large language model (LLM) agents across industries. Consistently, the most successful implementations weren't…

Dec 19, 2024

Building AI for cyber defenders

…AI models. We look forward to these discussions with industry, government, and civil society as we navigate the moment when AI’s impact on cybersecurity transitions from being a future concern to…

Oct 3, 2025
