Showing top 59 results for "model release"

Expanding Project Glasswing

… In the future, frontier model releases will become increasingly high-stakes. …

Jun 2, 2026

Measuring LLMs' impact on N-day exploits

… We also insert a language model grader as a final layer, which triages and reruns the PoC to rule out any reward hacks or unrealistic attacks. Results: We ran the models three times on each vulnerability. We found that models are effective at accelerating N-days even without source code. …

Jun 8, 2026

Project Glasswing: An initial update

… This tallies with external testers’ experience of Mythos Preview’s performance, and with recent additional evaluations of the model: The UK’s AI Security Institute reports that Mythos Preview is the first model to solve both of their cyber ranges simulations of multistep cyberattacks end to end; Mo… …

May 22, 2026

Introducing Claude Opus 4.8

… There’s still more to be done: we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost. Not only that, but we plan to release a new class of model with even higher intelligence than Opus. …

May 28, 2026

Claude Fable 5 and Claude Mythos 5

… To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. …

Jun 9, 2026

Natural Language Autoencoders

… We also release an interactive frontend for exploring NLAs on several open models through a collaboration with Neuronpedia. We have also released our code for other researchers to build on. …

May 7, 2026

Measuring LLMs’ ability to develop exploits

… This was one of our primary motivations for rolling out the model carefully through Project Glasswing rather than through a general release. …

May 22, 2026

Introducing Claude Opus 4.7

… We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. …

Apr 16, 2026

A “diff” tool for AI: Finding behavioral differences in new models

Every time a new AI model is released, its developers run a suite of evaluations to measure its performance and safety. …

Mar 13, 2026

2028: Two scenarios for global AI leadership

… For example, an independent assessment of Moonshot’s Kimi K2.5 published in April found that the model failed to refuse CBRN-related requests at a far higher rate than US frontier models. Compounding the problem, labs in China often release dual-use capable models as open-weight. …

May 14, 2026
