Search: Performance & optimization

Paper page - Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs

…To put it directly, RL fundamentally optimizes the recall of latent knowledge. 2️⃣ The unexpected contribution of 0/128 samples: Remarkably, ~83% of the performance jump is driven by training on the…

May 13, 2026

Paper page - GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

…These developments lead to strong performance in multimodal coding, visual tool use, and framework-based agentic tasks, while preserving competitive text-only coding capability. More importantly, our development process offers practical insights…

Apr 30, 2026

Paper page - When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

…Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning (2026); Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes (2026); Self-Induced Outcome Potential: Turn-Level Credit Assignment…

May 7, 2026

Paper page - Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

…The queries are optimized end-to-end with the video diffusion transformers (DiTs), forming an evolving memory that supports arbitrary compression ratios with constant computation independent of video length. They also act…

Jun 4, 2026

Paper page - iVGR: Internalizing Visually Grounded Reasoning for MLLMs with Reinforcement Learning

…In this work, we empirically find that mandating explicit object boxes in visually grounded CoT during inference often degrades performance compared to standard textual CoT, which reasons without explicit visual grounding. We…

Jun 1, 2026

Paper page - Confidence-Adaptive SwiGLU for Mixture-of-Experts

…Comparable Performance on End-Side Devices (2026); $\phi$-Balancing for Mixture-of-Experts Training (2026); Post-Trained MoE Can Skip Half Experts via Self-Distillation (2026); DOT-MoE: Differentiable Optimal Transport for…

Jun 2, 2026

Paper page - RewardHarness: Self-Evolving Agentic Post-Training

…We present RewardHarness, a self-evolving agentic reward framework that reframes reward modeling as context evolution rather than weight optimization. Instead of learning from large-scale annotations, RewardHarness aligns with human preferences…

May 14, 2026

Paper page - CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization

…Yu-Hsi Chen. Abstract: CPCANet is a domain generalization framework that uses Common Principal Component Analysis to discover structured domain-invariant subspaces through differentiable neural layers, achieving state-of-the-art performance…

May 11, 2026

Paper page - Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

…To prevent cheating, we propose CapReward, a reward design based on the CapCode principle to discourage optimization beyond the cap. Experiments across multiple datasets show that CapCode detects cheating while preserving performance…

Jun 10, 2026

Paper page - DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models

…Zhuoming Liu et al. Abstract: DRIFT is a framework that adapts pretrained vision-language models for continuous decoding tasks by combining coarse prediction with iterative refinement through flow matching, improving performance across perception and…

Jun 11, 2026
