The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes
…We find that OPD on mathematical reasoning is highly sensitive to teacher choice and loss formulation, whereas OPSD fails in our tested settings due to the test-time absence of instance-specific privileged…
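The excerpt names the loss formulation as one source of OPD's sensitivity but does not say which formulations were compared. As a minimal sketch, the example assumes two variants that are common in the on-policy distillation literature. These are not necessarily the ones the paper studies. Both compute a per-token reverse KL, KL(student || teacher), at positions the student itself generated. One computes it exactly over the full vocabulary. The other estimates it from the single token the student sampled. The function names (`full_reverse_kl`, `sampled_reverse_kl`, `sequence_loss`) are hypothetical. The code only computes loss values, with no gradients or training loop.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def full_reverse_kl(student_logits: Sequence[float],
                    teacher_logits: Sequence[float]) -> float:
    """Exact reverse KL(student || teacher) at one position, summed over the vocabulary.

    Assumption: this is one illustrative OPD loss variant, not the paper's definition.
    """
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return sum(math.exp(s) * (s - t) for s, t in zip(ls, lt))


def sampled_reverse_kl(student_logits: Sequence[float],
                       teacher_logits: Sequence[float],
                       sampled_token: int) -> float:
    """Single-sample estimate log p_s(y) - log p_t(y) for the token y the student sampled.

    Averaged over y drawn from the student, this equals the exact reverse KL,
    but any single value can be negative or noisy.
    """
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return ls[sampled_token] - lt[sampled_token]


def sequence_loss(student_logits_seq: Sequence[Sequence[float]],
                  teacher_logits_seq: Sequence[Sequence[float]],
                  sampled_tokens: Sequence[int],
                  exact: bool = True) -> float:
    """Mean per-token reverse KL over one student-generated sequence (hypothetical helper)."""
    if not sampled_tokens:
        raise ValueError("sequence must contain at least one token")
    total = 0.0
    for s, t, y in zip(student_logits_seq, teacher_logits_seq, sampled_tokens):
        total += full_reverse_kl(s, t) if exact else sampled_reverse_kl(s, t, y)
    return total / len(sampled_tokens)


if __name__ == "__main__":
    # Toy 3-token vocabulary, 2-step student rollout (made-up numbers for illustration).
    student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
    teacher = [[0.0, 1.0, 0.5], [0.3, 0.2, 0.1]]
    tokens = [0, 2]
    print("exact  :", sequence_loss(student, teacher, tokens, exact=True))
    print("sampled:", sequence_loss(student, teacher, tokens, exact=False))
```

The two variants share the same expected value but differ in variance and in which tokens receive signal. This is one concrete way that a choice of "loss formulation" can change OPD training behavior.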
