Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core | NVIDIA Technical Blog
…It dynamically selects the CP size per microbatch to efficiently handle variable-length sequences, achieving up to 1.48x speedup on real-world datasets. In large-scale model training, an often-overlooked…