Paper page - Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs
…Yet research-level math benchmarks remain scarce because such problems are difficult to source (e.g., Riemann Bench and FrontierMath-Tier 4 contain 25 and 50 problems, respectively). To support reliable evaluation…
