Search: coding improvements

Paper page - Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization

…Empirically, we demonstrate that our approach achieves consistent improvements across a range of challenging text-based and GUI-based agent benchmarks. Code is available at https://github.com/HansenHua/EAPO-ICML26 and…

May 14, 2026

Paper page - Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

…and diverse multimodal benchmarks, improving average accuracy by +4.4 and +6.0 points over the SFT-to-RLVR baseline on 4B and 8B, respectively. Our code, data, and model checkpoints are…

May 6, 2026

Paper page - LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

…Despite the strong performance gains, coding-agent-based methods have high latency costs. While AgentRunbook-C advances the accuracy-latency Pareto frontier, substantial room for improvement remains. Together, these results establish LME…

May 13, 2026

Paper page - EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration

…EverAnimate code, 480p LoRA checkpoints, minimal demo data, and training/inference scripts are now released: https://huggingface.co/epfl-vita/everanimate Resources: Code…

May 27, 2026

Paper page - A^2RD: Agentic Autoregressive Diffusion for Long Video Consistency

…A^2RD formulates long video synthesis as a closed-loop process that synthesizes and self-improves video segment-by-segment through a Retrieve–Synthesize–Refine–Update cycle. It comprises three core components…

May 11, 2026

Paper page - UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning

…Experiments show that leveraging coordination-path diversity improves performance over fixed coordination strategies while providing interpretable intermediate behaviors. The code is available at: https://github.com/AIFrontierLab/TorchUMM/tree/main/src/umm…

May 13, 2026

Paper page - SEIF: Self-Evolving Reinforcement Learning for Instruction Following

…Experiments across multiple model scales and architectures show that SEIF consistently improves instruction-following performance, suggesting strong generality. Further analyses reveal the sources of improvement and identify an effective training strategy for…

May 12, 2026
