Paper page - Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs
… Whereas olympiad-style problems measure step-by-step reasoning alone, research-level problems use such reasoning to advance the frontier of mathematical knowledge itself, emerging as a compelling alternative. …