Learning Agentic Policy from Action Guidance
…Yuxiang Ji, …

Abstract

Agentic reinforcement learning for large language models leverages action data from human interactions as reference guidance to improve exploration and reduce dependence on costly supervised fine-tuning.
