What is the impact of segment size on node availability?
To study the importance of setting --segment appropriately, one can use a simplified mathematical model of how segment size affects the cluster capacity that is effectively available to a given job. Administrators need to be aware of this effect, because unavailable nodes can strand healthy nodes that no longer add up to a whole segment. The impact is clear for --segment=9: the expected usable capacity degrades quickly as the node unavailability rate λ increases, since a single unavailable node leaves an 18-node domain with 17 healthy nodes, which fit only one 9-node segment, so the domain can contribute only nine nodes to jobs using --segment=9. Whereas for --segment=16, a domain will cont
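The arithmetic behind this kind of model can be sketched in a few lines of Python. This is a minimal illustration, not the model used for the results discussed here: it assumes an 18-node NVLink domain, that each node is independently unavailable with probability λ, and that a domain contributes only whole segments. The names `usable_nodes` and `expected_usable_nodes` are hypothetical.

```python
from math import comb

# Assumption: one GB200 NVL72 NVLink domain has 18 compute nodes (4 GPUs each).
NODES_PER_DOMAIN = 18


def usable_nodes(healthy: int, segment: int) -> int:
    """Nodes a domain can contribute when only whole segments may be placed."""
    return (healthy // segment) * segment


def expected_usable_nodes(segment: int, lam: float,
                          nodes: int = NODES_PER_DOMAIN) -> float:
    """Expected contribution of one domain, assuming independent node failures."""
    total = 0.0
    for healthy in range(nodes + 1):
        # Binomial probability that exactly `healthy` nodes are available.
        p = comb(nodes, healthy) * (1 - lam) ** healthy * lam ** (nodes - healthy)
        total += p * usable_nodes(healthy, segment)
    return total


if __name__ == "__main__":
    # Compare the two segment sizes discussed above at a few unavailability rates.
    for lam in (0.0, 0.01, 0.05):
        for segment in (9, 16):
            value = expected_usable_nodes(segment, lam)
            print(f"lambda={lam:.2f} segment={segment:2d} expected usable nodes={value:.2f}")
```

With all nodes healthy, --segment=9 uses the full 18 nodes while --segment=16 leaves two idle. Under this simplified model the ordering reverses as λ grows, because a 16-node segment tolerates up to two unavailable nodes in a domain.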
Topology-aware job scheduling allows a job scheduler such as Slurm to make resource allocation decisions based on the cluster’s physical network layout, such as the hierarchy of switches and racks. The scheduler should preserve locality, keeping workloads within the same NVLink domain whenever possible. In addition, because multiple training or inference jobs can fit in a group of NVL72 racks, the scheduler must provide efficient bin-packing to avoid resource fragmentation. The longstanding Slurm topology/tree plugin provides topology-aware scheduling for large clusters, but its best-effort ap
How do cluster segmentation and job scheduling work on GB200 NVL72?
As clusters grow in scale and complexity, managing GPU resources becomes critical for achieving both high utilization and predictable performance. The GB200 NVL72 system introduces larger AI job segment sizes and fine-grained scheduling control, enabling operators to align segment configurations with workload needs. Together with GB200 NVL72-aware scheduling extensions in the Slurm workload manager, this approach balances large and small jobs to maximize efficiency even in the presence of hardware faults.
What is the best job scheduling approach for GB200 NVL72?
Based on our simulation results and performance testing, we recommend a scheduling approach for NVIDIA GB200 NVL72 clusters that prioritizes large job performance while maintaining high utilization. Large jobs of 64 GPUs or more should be given access to the maximum number of NVLink domains, using segment sizing to ensure proportional GPU allocation across domains. Segment-based scheduling is essential for aligning resources with workload patterns. For jobs of 32 nodes or more, a segment size of 16 is recommended if the application can benefit from it, while smaller jobs are better suited to s
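To make the relationship between job size, segment size, and NVLink domains concrete, the following sketch computes how many segments a job splits into and how many domains it could span. It is an illustration only: `segment_layout` is a hypothetical helper, the 18-node domain size is an assumption, and the sketch assumes the job's node count is a multiple of the segment size.

```python
from math import ceil


def segment_layout(job_nodes: int, segment: int, nodes_per_domain: int = 18) -> dict:
    """Summarize how a job divides into segments across NVLink domains.

    Assumptions (not taken from Slurm itself): each segment must fit inside
    one domain, and job_nodes is a multiple of the segment size.
    """
    if not 1 <= segment <= nodes_per_domain:
        raise ValueError("segment must be between 1 and the domain size")
    if job_nodes % segment != 0:
        raise ValueError("job_nodes must be a multiple of segment in this sketch")
    segments = job_nodes // segment
    # Whole segments that fit in a single fully healthy domain.
    segments_per_domain = nodes_per_domain // segment
    return {
        "segments": segments,
        "min_domains": ceil(segments / segments_per_domain),
        "max_domains": segments,  # every segment placed in a different domain
    }


if __name__ == "__main__":
    # A 32-node job with the recommended segment size of 16.
    print(segment_layout(32, 16))
    # The same job with a smaller, hypothetical segment size of 8.
    print(segment_layout(32, 8))
```

Under these assumptions, a 32-node job with a segment size of 16 always occupies exactly two domains, while smaller segments give the scheduler more placement freedom at the cost of a potentially wider spread.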