Search

Showing top 91 results for "Safety for agents" · filtered from 94 indexed

Claude Code bypasses safety rule if given too many commands

… But often developers grant automatic approval to agents (--dangerously-skip-permissions mode) or just click through reflexively during long sessions. …

Apr 1, 2026 · Thomas Claburn

Cursor’s AI agent wipes PocketOS database – Fudzilla.com

… It explained, in writing, exactly which safety rules it ignored.” He added that PocketOS had “the best model the industry sells”, with explicit safety rules in its project configuration, hooked through Cursor. …

May 4, 2026 · Nick Farrell

I used Claude Code, Google Antigravity, and Codex for a month and I have a clear winner for you

… Anthropic was founded in 2021 with a strong focus on AI safety research. … What is the name of the safety and values framework Anthropic developed to guide Claude's behavior? …

May 11, 2026 · Parth Shah

Transform Video Into Instantly Searchable, Actionable Intelligence with AI Agents and Skills | NVIDIA Technical Blog

… Continue reading to learn how to use VSS skills with coding agents for building autonomous video analytics AI Agents. …

May 13, 2026 · Samuel Ochoa

Discussions and forums

Hacker News · u/lucarizzo1010 · 1w ago

Show HN: AgentShield – Stop AI agents from spending money unsupervised

I'm a recent grad from UMich and built AgentShield because agentic AI is moving fast but payment safety hasn't caught up. Agents are already being handed API keys, stablecoin wallets, and payment credentials - if one mis…

Teaching Claude why

… Thus, after Claude 4, it was clear we needed to improve our safety training and, since then, we have made significant updates to our safety training. …

May 8, 2026

Lyrie completes $2 million preseed round to build the security layer for the AI agent era

… These agents read mail, write code, execute transactions, sign contracts, and operate across sensitive systems with broad access and limited oversight. The question of who those agents are, what they are authorized to do, and whether they have been compromised has gone unanswered. …

May 11, 2026

Anthropic, Google, Microsoft paid AI bug bounties – quietly

… "If they don't publish an advisory, those users may never know they are vulnerable – or under attack." He said the attack probably works on other agents that integrate with GitHub, and GitHub Actions that allow access to tools and secrets, such as Slack bots, Jira agents, email agents, and deployme… …

Apr 15, 2026 · Jessica Lyons

Claude Opus 4.6

… A step forward on safety. These intelligence gains do not come at the cost of safety. …

Feb 5, 2026

Australian government and Anthropic sign MOU for AI safety and research

… Central to the MOU is a commitment to work with Australia’s AI Safety Institute. We will share our findings on emerging model capabilities and risks, participate in joint safety and security evaluations, and collaborate on research with Australian academic institutions. …

Mar 31, 2026

Run Autonomous, Self-Evolving Agents More Safely with NVIDIA OpenShell | NVIDIA Technical Blog

… For long-running, self-evolving agents to actually work, you need three things simultaneously: safety, capability, and autonomy. …

Mar 16, 2026 · Ali Golshan
