Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration
…when all sampled rollouts for a query fail, the relative advantage collapses to zero. Consequently, the model loses effective training signals for these questions, wasting the training data and computational budget. While…
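
The collapse described in the excerpt can be illustrated with a minimal sketch. It assumes a group-relative advantage of the GRPO kind, where each rollout's reward is normalized by the mean and standard deviation of the rewards sampled for the same query. The excerpt does not give the exact formula, so the function name `group_relative_advantages`, the `eps` term, and the binary 0/1 rewards are illustrative assumptions rather than the paper's implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group (one query).

    Assumed form: (r - group mean) / (group std + eps).
    `eps` is a hypothetical stabilizer that avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Every rollout for the query fails: all rewards are identical.
all_failed = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# One rollout succeeds: rewards differ, so advantages are non-zero.
mixed = group_relative_advantages([0.0, 1.0, 0.0, 0.0])

print(all_failed)  # every advantage is exactly 0.0
print(mixed)       # positive for the success, negative for the failures
```

When every reward in the group is the same, each reward equals the group mean, so the numerator is zero for every rollout and a policy-gradient update weighted by these advantages contributes nothing for that query. This holds regardless of the value chosen for `eps`.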