GPT-5.5 dominates $1,500 LLM hacking test while Gemini refuses to even try
… At the bottom is Gemini. Gemini 3.1 Pro Preview refused immediately in nearly every run, as reflected in its median token count of just 9k, versus 100k+ for every other model tested. …