Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron | NVIDIA Technical Blog
…Muon training performance on NVIDIA GB300 NVL72

Table 1 summarizes the training throughput of the Kimi K2 and Qwen3 30B models with the Muon and AdamW optimizers on the NVIDIA GB300 NVL72 system…