Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning
…This motivates treating teacher exposure not as a fixed hyperparameter but as a learnable training-time control variable. We therefore propose Adaptive Teacher Exposure for Self-Distillation (ATESD). ATESD models the reveal…