Showing top 107 results for "model rollout"

Paper page - T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

…At the turn level, T^2PO identifies interactions with negligible exploration progress and dynamically resamples such turns to avoid wasted rollouts. We evaluate T^2PO in diverse environments, including WebShop, ALFWorld, and…

May 5, 2026

Paper page - REPOT: Recoverable Program-of-Thought via Checkpoint Repair

…it runs the emitted plan through a deterministic verifier, stops at the first invalid transition, and asks the model for one repair call from the verified prefix. No fine-tuning, no rollout…

May 29, 2026

Paper page - ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

…an Environment module (EN) for automatic reset and verification, a Policy Improvement module (PI) that launches policy refinement, a Rollout module (R) to evaluate policies with one or multiple physical robots operating…

Jun 20, 2026

Paper page - Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

…Each decode step streams model weights and the active KV cache, so latency should scale with peak HBM bandwidth. We show that this account is true but incomplete. We measure batch-1…

Jun 1, 2026

Paper page - HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

…At the micro level, we adapt On-Policy Distillation to inject dense token-level corrective signals from an external teacher on failed rollouts, mitigating the credit-assignment deficiency of sparse outcome rewards…

May 11, 2026

Paper page - Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

…In long and short video generation, Echo-Infinity achieves state-of-the-art performance, and, to our knowledge, demonstrates promising 24-hour (>1.3M frames) real-time rollouts for the first…

Jun 4, 2026

Paper page - APPO: Agentic Procedural Policy Optimization

…Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Recent advances in agentic Reinforcement Learning (RL) have substantially improved the multi-turn tool-use capabilities of large language model agents. However, most existing…

Jun 15, 2026

To show you the most relevant results, we’ve omitted some entries very similar to those already shown.