Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning
…AI-generated summary: On-policy self-distillation has become a strong recipe for LLM reasoning, in which a privileged teacher, conditioned on the reference solution, supervises the student's own rollouts. A…
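To make the setup in the summary concrete, here is a minimal sketch of a per-token distillation loss on a single student rollout. It is an illustration under stated assumptions, not the paper's objective: the summary does not specify the divergence, so KL(student || teacher) is assumed, and the function names (`log_softmax`, `token_kl`, `self_distillation_loss`) are hypothetical. The student logits are taken to come from scoring the rollout given only the prompt, and the teacher logits from scoring the same tokens with the reference solution added to the context.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position (assumed direction)."""
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def self_distillation_loss(student_logits_seq, teacher_logits_seq):
    """Mean per-token KL over one rollout sampled from the student.

    Both arguments hold one list of logits per rollout position. The teacher
    is assumed to have seen the reference solution; the student has not.
    """
    if len(student_logits_seq) != len(teacher_logits_seq):
        raise ValueError("student and teacher must score the same rollout")
    kls = [token_kl(s, t) for s, t in zip(student_logits_seq, teacher_logits_seq)]
    return sum(kls) / len(kls)
```

In this sketch the loss is zero when the teacher's extra context does not change its predictions, and grows as the privileged teacher disagrees with the student on the student's own tokens.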