T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning
…Despite advances in stabilization techniques such as fine-grained credit assignment and trajectory filtering, instability remains pervasive and often leads to training collapse. We argue that this instability stems from inefficient exploration…