PREPING: Building Agent Memory without Tasks
LLM agents often need memory to solve tasks in new tool environments, but memory is usually built only after…
…Process-Reward Optimization for Computer Use Agents (2026)
UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization (2026)
OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis (2026…
…A Hierarchical Benchmark for Visual Website Development with Agent Verification (2026)
WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing (2026)
Test-Driven AI Agent Definition (TDAD): Compiling Tool…
…Recent alternatives include agentic reasoning through code or tool calls, and latent reasoning with learnable hidden embeddings. However, agentic methods incur context-switching latency from external execution, while latent methods lack task…
…Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents (2026)
ARIADNE: Agentic Reward-Informed Adaptive Decision Exploration via Blackboard-Driven MCTS for Competitive Program Generation (2026)
LLM as a Tool…
…Measuring Reward Hacking in Long-Horizon Coding Agents (2026)
Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use (2026)
Do Synthetic Trajectories Reflect Real Reward Hacking? A Systematic Study on…
…Which is, I think, why the interpretable traces are the most durable contribution here — not as the agent's own verdict, but as the surface an external check (a human, a tool…
…developing Claw-style personal agents with synthetic training data, verified workspaces, and benchmark evaluation.
AI-generated summary: Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states…
…trajectory-level rewards verify final correctness but provide limited guidance on which intermediate reasoning steps or tool interactions contribute to the outcome. The difficulty is especially pronounced in multi-turn search agents…
…While recent multimodal deep search agents attempt to address this issue by utilizing external tools, the visual-native search paradigm remains underexplored. Existing methods primarily rely on simple images with explicit semantics…