Anthropic Launches Claude Opus 4.8 With Gains in Coding and Honesty

… Anthropic's benchmarks indicate Opus 4.8 scored 69.2% on SWE-Bench Pro, outperforming GPT-5.5 and Gemini 3.1 Pro on that test and on several other benchmarks, though GPT-5.5 leads on the terminal-coding benchmark. …

May 28, 2026 · Juli Clover