Search

Showing top 34 results for "HPC/cluster hardware"

Filtered by topic: NVIDIA Clear ✕

People also ask

How do cluster segmentation and job scheduling work on GB200 NVL72?

As clusters grow in scale and complexity, managing GPU resources becomes critical for achieving both high utilization and predictable performance. The GB200 NVL72 system introduces larger AI job segment sizes and fine-grained scheduling control, enabling operators to align segment configurations with workload needs. Together with GB200 NVL72-aware scheduling extensions in the Slurm workload manager, this approach balances large and small jobs to maximize efficiency even in the presence of hardware faults.

Unlock Exascale Performance on NVIDIA GB200 NVL72 with Slurm Topology-Aware Job Scheduling | NVIDIA Technical Blog
How does NVIDIA GB200 NVL72 deliver exascale compute? 

NVIDIA GB200 NVL72 is an exascale computer in a single rack. With 72 NVIDIA Blackwell GPUs interconnected by the largest production scale-up compute fabric, NVIDIA NVLink provides 130 terabytes per second (TB/s) of low-latency GPU communication bandwidth for AI and high-performance computing (HPC) workloads. Multiple GB200 NVL72 systems combined in a cluster create hierarchical network topology with large domains of very high networking bandwidth.  An AI training job can greatly benefit from the abundant networking bandwidth offered by GB200 NVL72, when scheduled to maximize the use of NVLink

Unlock Exascale Performance on NVIDIA GB200 NVL72 with Slurm Topology-Aware Job Scheduling | NVIDIA Technical Blog