Showing top 130 results for "AI safety defenses"

Anthropic, Google, Microsoft paid AI bug bounties – quietly

…It could have been a lot worse · Claude Code bypasses safety rule if given too many commands · GitHub backs down, kills Copilot pull-request ads after backlash · AI supply chain attacks don…

Apr 15, 2026 · Jessica Lyons

TechCrunch Mobility: Lime's IPO gamble | TechCrunch

Welcome back to TechCrunch Mobility, your hub for the future of transportation and now, more than ever, how AI is playing a part. To get this in your inbox, sign up here…

May 10, 2026 · Kirsten Korosec

Trustworthy agents in practice

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Apr 9, 2026

US government forces shutdown of Anthropic's AI models Fable 5 and Mythos 5

…The previously communicated safeguards were said to have been tested in a pre-release review over thousands of hours of red-teaming, together with the US government, the UK AI Safety Institute (UK AISI), private organizations, and internal teams…

Jun 13, 2026 · Dr. Volker Zota

Discussions and forums

r/netsec · u/unknownhad · May 10, 2026

The compression of the exploit timeline: Why n-day gaps and 90-day embargoes are failing in practice.

The traditional vulnerability disclosure timeline relies on a fundamental assumption: exploit development and vulnerability discovery take time. Over the last 12 months the integration of LLMs into offensive tooling has …

Hacker News · u/dk189 · 1w ago

Show HN: We post-trained a model that pen tests instead of refusing

Anthropic and OpenAI's publicly available models are explicitly guard-railed so that they refuse offensive tasks. And their cyber-focussed models are gated for enterprises. This leaves SMEs and mid market open to major v…

91 40

r/Android · u/MishaalRahman · May 12, 2026

New features, emojis, & security improvements: Here’s everything new coming to Android!

Hi Reddit, We just wrapped up The Android Show | I/O Edition, and a core theme of the show was how we’re making your phone more helpful so that you can spend less time looking at it and more time living your life. To mak…

Big Tech’s Gulf megaprojects are trapped between two war choke points

…chip suppliers. “The security frameworks underpinning the U.S.-UAE AI partnership appear to have focused on supply chain control and geopolitical alignment, not on physical defense during high-intensity conflict,” Ali…

Mar 4, 2026 · Indranil Ghosh

Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks

…Nevertheless, no AI systems currently on the market have perfectly robust defenses. Last year, we described a new approach to defend against jailbreaks which we called “Constitutional Classifiers:” safeguards that monitor model…

Jan 9, 2026

Anthropic says Claude may want to see your ID | TechCrunch

Anthropic may ask Claude users to verify their age and identity by uploading their government-issued documents, according to a new version of the company’s privacy policy. The AI giant says…

Jun 22, 2026 · Zack Whittaker

Infrastructure Archives

…to improve deployment safety Learn how Github uses eBPF to detect and prevent circular dependencies in its deployment tooling. When protections outlive their purpose: A lesson on managing defense systems at scale…

Apr 16, 2026 · Lawrence Gripper

Is Peter Thiel the target of Pope Leo's Gandalf quote? An investigation.

…And the technology that could best help break this cultural stagnation is AI. Therefore, we should take the guardrails off AI, despite the risks. I still think we should be trying AI…

May 26, 2026 · Nate Anderson

Mapping AI-enabled cyber threats: Insights from the LLM ATT&CK Navigator

…T1562 (Impair Defenses). 54.8% of the threat actors studied used AI to bypass, disable, or tamper with endpoint security tools. T1055 (Process Injection). 30.3% of actors used AI to write malicious…

Jun 3, 2026
