Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
… AI-generated summary: Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. …