Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR
…AI-generated summary: Self-distillation has emerged as a powerful framework for post-training LLMs, in which a teacher conditioned on extra information guides a student that lacks that information, with both roles played by the same model…
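A minimal sketch of the generic teacher/student setup described in the summary, assuming a toy next-token "model" and a hypothetical hint string. The names `next_token_probs` and `teacher_student_log_ratios` are illustrative assumptions, not taken from the paper. One function plays both roles; the only difference is whether the extra information is in the context. The per-token log-probability ratio on the student's own response is one common form of the distillation signal. The title indicates the paper's method reverses such teacher signals, and this sketch does not attempt to reproduce that scheme.

```python
import math

# Hypothetical tiny vocabulary for the toy model (assumption, not from the paper).
VOCAB = ["a", "b", "c"]


def next_token_probs(context: str) -> dict[str, float]:
    """Toy stand-in for a single LLM: maps a context to a next-token distribution.

    Assumed rule: if the context contains the hint "ANSWER=b", the model strongly
    prefers "b"; otherwise it is uniform over VOCAB.
    """
    if "ANSWER=b" in context:
        return {"a": 0.1, "b": 0.8, "c": 0.1}
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}


def teacher_student_log_ratios(prompt: str, hint: str, response: list[str]) -> list[float]:
    """Per-token log p_teacher(y_t) - log p_student(y_t) on the student's response.

    The teacher sees hint + prompt (extra information); the student sees only the
    prompt. Both call the same next_token_probs, i.e. the same model.
    """
    ratios = []
    for t, tok in enumerate(response):
        prefix = "".join(response[:t])  # tokens generated so far
        p_teacher = next_token_probs(hint + prompt + prefix)
        p_student = next_token_probs(prompt + prefix)
        ratios.append(math.log(p_teacher[tok]) - math.log(p_student[tok]))
    return ratios


ratios = teacher_student_log_ratios("Q: pick one. ", "ANSWER=b ", ["b", "a"])
# Positive where the informed teacher is more confident than the student ("b"),
# negative where it is less confident ("a").
print(ratios)
```

A positive ratio marks tokens the informed teacher endorses more strongly than the uninformed student; summing or averaging these values gives a dense, token-level signal, in contrast to a single verifiable reward per response.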
