Search: Reasoning research

Paper page - Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

… Whereas olympiad-style problems measure step-by-step reasoning alone, research-level problems use such reasoning to advance the frontier of mathematical knowledge itself, emerging as a compelling alternative. …

May 12, 2026

Paper page - LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

… However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. …

May 11, 2026

Paper page - Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?

… This question naturally arises when building deep research systems. We revisit it by pairing BM25 with frontier LLMs that have better reasoning and tool-use abilities. …

May 12, 2026

Paper page - WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors

… We evaluate generated videos with a human-aligned two-part methodology: Process-aware Reasoning Verification uses structured QA and reasoning-phase diagnostics to detect temporal and causal failures, while Multi-dimensional Quality Assessment scores reasoning quality, temporal consistency, and visu… …

May 12, 2026

Paper page - AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive

… To achieve this, we propose a systematic framework with two key components: (1) LLMConfig-Gym, a multi-fidelity environment encompassing four critical LLM experiment tasks, supported by over one million GPU hours of verifiable experiment outcomes; (2) a structured training pipeline that formulates con… …

May 13, 2026

Paper page - LychSim: A Controllable and Interactive Simulation Framework for Vision Research

… LychSim is a highly controllable, interactive simulation framework built on Unreal Engine 5, designed to lower the technical barrier of using a modern game engine for computer vision research. …

May 13, 2026

Paper page - PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning

… PlantMarkerBench provides a challenging and reproducible evaluation framework for literature-grounded biological evidence attribution and supports future research on trustworthy scientific information extraction and AI-assisted plant biology. …

May 12, 2026

Welcome GPT OSS, the new open-source model family from OpenAI!

… That’s exactly what developers and researchers need if these models are going to be experimented with, fine-tuned, and embedded into real products rather than living only in benchmarks. It’s also refreshing to see clear guidance around reasoning traces, chat templates, and evaluation pitfalls. …

May 1, 2026 · Vaibhav Srivastav

Paper page - Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR

arXiv:2605.10781 · Published on May 11 · Submitted by JeonghyeKim on May 12 · Microsoft Research · Abstract: RLRT enhances self-distillation by reinforcing successful student decisions that… …

May 12, 2026

Paper page - From Web to Pixels: Bringing Agentic Search into Visual Perception

arXiv:2605.12497 · Published on May 12 · Submitted by taesiri on May 13 · Abstract: Researchers introduce WebEye, a benchmark for object localization requiring external knowledge resolution, and Pixel-Searcher, an… …

May 13, 2026
