Search

Showing top 90 results for "AI agent safety" · filtered from 93 indexed

Sources: anthropic.com (42) · xda-developers.com (17) · theregister.com (5) · developer.nvidia.com (3) · theverge.com (3) · techcrunch.com (2) · wired.com (2) · pcworld.com (2) · arstechnica.com (2) · spectrum.ieee.org (2) · blog.cloudflare.com (2) · en.wikipedia.org (2)

Claude for Financial Services

…Claude 4 models outperform other frontier models as research agents across financial tasks in Vals AI's Finance Agent benchmark. When deployed by FundamentalLabs to build an Excel agent, Claude Opus 4…

Jul 15, 2025

Natural Language Autoencoders

…We’ve already applied NLAs to understand what Claude is thinking and to improve Claude’s safety and reliability. For instance: When Claude Opus 4.6 and Mythos Preview were undergoing safety…

May 7, 2026

Mozilla says 271 vulnerabilities found by Mythos have "almost no false positives"

…In our case when we’re looking for memory safety issues we have our sanitizer build of Firefox and if you make it crash you win. We point that agent off to…

May 7, 2026 · Dan Goodin

KPMG integrates Claude across its core business and workforce of more than 276,000 in strategic alliance

…With Claude Cowork and Managed Agents embedded inside it, KPMG professionals and their clients can build new AI capabilities directly in the platform—work that used to mean jumping between tools, chat…

May 19, 2026

Discussions and forums

Hacker News · u/lucarizzo1010 · 1w ago

Show HN: AgentShield – Stop AI agents from spending money unsupervised

I'm a recent grad from UMich and built AgentShield because agentic AI is moving fast but payment safety hasn't caught up. Agents are already being handed API keys, stablecoin wallets, and payment credentials - if one mis…

The AI Compute Crunch Is Here (and It's Affecting the Entire Economy)

…None of this is remotely sustainable as it currently stands. This means that the startups that are using AI agents to scale their operations are doing so at a time when AI…

Apr 24, 2026 · Jason Koebler

Project Glasswing: what Mythos showed us

…Why pointing a generic coding agent at a repo doesn't work: When we first started AI-assisted vulnerability research last year, our instinct was the obvious one: point a generic coding…

May 18, 2026 · Grant Bourzikas

Ronan Farrow on Sam Altman’s “unconstrained” relationship with the truth

…There is no federal statute protecting AI company employees who disclose these kinds of safety concerns that are being aired in this piece. We have cases where Jan Leike, who was a…

Apr 16, 2026 · Nilay Patel

Anthropic expands partnership with Google and Broadcom for multiple gigawatts of next-generation compute

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Apr 6, 2026

Eval awareness in Claude Opus 4.6’s BrowseComp performance

…But agents can read URL paths, which in some cases contain hypotheses from other agent search queries embedded in the URL slugs. One agent correctly diagnosed what it was seeing: “Multiple AI…

Mar 6, 2026

The Long-Term Benefit Trust

…Paul Christiano stepped down in April 2024 to take a new role as the Head of AI Safety at the U.S. AI Safety Institute. In January 2026, Kanika Bahl stepped down…

Sep 19, 2023
