Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron | NVIDIA Technical Blog

… Reduce-scatter gradient: A reduce-scatter is performed over all gradients, and each GPU gets the portion of the gradients corresponding to the parameters it “owns.”
Local updates: Each GPU updates only the specific portion of the model parameters it “owns.”
AllGather parameters: After the update, GPUs …
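The three steps in this excerpt can be sketched as a single-process simulation. This is a minimal illustration, not Megatron's implementation: each "GPU" is a plain Python list, the function names (`reduce_scatter`, `local_update`, `all_gather`, `sharded_step`) are hypothetical, plain SGD stands in for whichever optimizer the post discusses, and gradients are summed rather than averaged. Because the excerpt is truncated at the AllGather step, the sketch assumes the usual meaning of an all-gather: every GPU ends up with the full set of updated parameters.

```python
def reduce_scatter(grads_per_rank):
    """Sum gradients element-wise across ranks; rank r keeps only shard r."""
    world = len(grads_per_rank)
    n = len(grads_per_rank[0])
    shard = n // world  # assumption: n is divisible by the number of ranks
    summed = [sum(g[i] for g in grads_per_rank) for i in range(n)]
    return [summed[r * shard:(r + 1) * shard] for r in range(world)]


def local_update(param_shard, grad_shard, lr):
    """Plain SGD on the shard this rank "owns" (stand-in optimizer)."""
    return [p - lr * g for p, g in zip(param_shard, grad_shard)]


def all_gather(shards):
    """Concatenate every rank's shard and give each rank a full copy."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]


def sharded_step(params, grads_per_rank, lr=0.1):
    """One optimizer step: reduce-scatter, local update, all-gather."""
    world = len(grads_per_rank)
    shard = len(params) // world
    grad_shards = reduce_scatter(grads_per_rank)
    new_shards = [
        local_update(params[r * shard:(r + 1) * shard], grad_shards[r], lr)
        for r in range(world)
    ]
    return all_gather(new_shards)


if __name__ == "__main__":
    params = [1.0, 2.0, 3.0, 4.0]
    grads = [[1.0, 1.0, 1.0, 1.0], [1.0, 3.0, 5.0, 7.0]]  # one list per rank
    print(sharded_step(params, grads, lr=0.5))
```

In a real multi-GPU job the two collectives would be communication calls rather than list operations. The point of the sketch is only that each rank does optimizer work on 1/N of the parameters between the two collectives.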

Apr 22, 2026 · Hao Wu