Search: agent cost control

Paper page - Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

…We train a lightweight sampling controller with reinforcement learning (RL) to jointly balance answer correctness, latency, and computation cost. At each round, the controller decides to stop sampling or to acquire additional…

Jun 3, 2026

Paper page - LLM Agents Can See Code Repositories

…Visualization is most useful during fault localization and when the agent autonomously controls exploration depth. These findings point to a practical hybrid text-and-vision design for next-generation coding agents. View…

Jun 15, 2026

Paper page - On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

…improving agent safety comes at the cost of degraded task performance. Such sparse and single-objective rewards severely limit real-world usability. To bridge this gap, we propose FATE, an on-policy…

Paper page - WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction

…a Long-Horizon Memory Environment for LLM Agents (2026); MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents (2026); When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent…

May 29, 2026

Paper page - ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

…This closed-loop system transforms real-world manipulation learning into a controllable optimization procedure, minimizing human effort while allowing fair ablations across training recipe and agent variants. Powered by ENPIRE, frontier coding…

Jun 20, 2026

Paper page - EgoCS-400K: An Egocentric Gameplay Dataset for World Models

…By connecting visual observations with human actions, camera motion, game states, and events at scale, EgoCS-400K serves as a practical bridge between passive web videos, controllable game simulation, and costly real…

Jun 17, 2026

Paper page - PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems

…A strong coding agent today can probably generate data by inspecting a schema, but PIPE-Cypher makes this scalable, cost-effective, and repeatable when the schema inevitably changes. By constraining this as…

Jun 9, 2026

Paper page - Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization

…Work focuses on improving the efficiency and robustness of distributed black box optimization in multi-agent systems. Potential applications include cooperative sensing, resource allocation, and distributed control, which may contribute to more…

May 4, 2026

Paper page - The Price of Anarchy in Disaggregated Inference

…controller to optimize routing and reduce latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Disaggregated inference architectures physically separate prefill and decode phases onto distinct GPU pools, creating competing "agents…