Search: Performance & optimization

Paper page - Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

…Following the recent achievement of gold-medal performance on the IMO by frontier LLMs, the community is searching for the next meaningful and challenging target for measuring LLM reasoning…

May 12, 2026

Paper page - Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding

…Many existing efficiency methods improve throughput by changing the rollout or optimization regime, for example, through off-policy execution, replay, or lower-precision generation. We study speculative decoding as a lossless acceleration…

Apr 30, 2026

Paper page - Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models

…We then propose Forcing-KV, a hybrid KV cache compression strategy that performs structured static pruning for static heads and dynamic pruning based on segment-wise similarity for dynamic heads. While maintaining…

May 15, 2026

Paper page - AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning

…Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance (2026); ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning (2026); T$^2$PO: Uncertainty-Guided Exploration Control for…

May 11, 2026

Paper page - Advancing Creative Physical Intelligence in Large Multimodal Models

…Using Direct Preference Optimization, we encourage models to prefer attribute-affordance reasoning grounded in visual evidence over hallucinated alternatives. In addition, we incorporate supervision derived from an affordance knowledge base to guide…

May 28, 2026

Paper page - AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems

…which skill protocol to invoke, which agent role should perform a subtask, which model to bind to each role, how roles should interact, when to use retrieval or verification, and when to…

May 28, 2026

Paper page - Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation

…Experiments demonstrate that fine-tuned Qwen3-VL-8B-Instruct achieves robust performance, significantly outperforming text-based baselines in scenarios requiring visual layout understanding, while establishing a retriever-agnostic solution for pixel-level…

May 6, 2026

Paper page - Audio-Visual Intelligence in Large Foundation Models

…We synthesize methodological foundations, including modality tokenization, cross-modal fusion, autoregressive and diffusion-based generation, large-scale pretraining, instruction alignment, and preference optimization. Furthermore, we curate representative datasets, benchmarks, and evaluation metrics…

May 8, 2026

Paper page - AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems

…Across AFTraj-2K and an external Who&When benchmark, AgentForesight-7B outperforms leading proprietary models, including GPT-4.1 and DeepSeek-V4-Pro, achieving up to +19.9% performance gain and 3×…