I put Claude, ChatGPT, and Gemini through the same tasks — only one is worth the subscription
… These would test the systems' reasoning, their ability to deal with ambiguity, and their precision in following instructions, and would include a trap designed to force them into either honesty or hallucination. …