Paper page - Policy and World Modeling Co-Training for Language Agents
… These results suggest that standard RL rollouts are a practical source of WM supervision for language-agent training. …
… From these signals, it builds runnable mock Android apps backed by read-only app content and mutable state, then derives executable tasks, rule-based verifiers, and training rollouts from the same environments. …
… This addresses the prefix mismatch of offline distillation, but early student rollouts can still be poor, placing teacher supervision on weak or low-quality prefixes. …