Get Real-Time Visibility into GPU Usage Across Kubernetes Clusters | NVIDIA Technical Blog
… How to get started with the GPU Usage Monitor: The GPU Usage Monitor is open source under the Apache 2.0 license and available now on GitHub. …
Enterprise deployments have shown a consistent pattern: when organizations move from static GPU allocation to dynamic scheduling, cluster usage becomes far more dynamic. Over-quota resources (the shared pool beyond guaranteed quotas) become one of the most heavily utilized resource types. Teams regularly exceed their guaranteed allocations, resulting in higher GPU utilization and more compute time for researchers. This makes over-quota fairness critical. When a significant portion of cluster value comes from this shared pool, that pool needs to be divided fairly over time. The classical statel
Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare | NVIDIA Technical Blog
The GPU Usage Monitor is an open-source project that deploys a fully integrated GPU observability stack for Kubernetes. Rather than requiring SRE and platform teams to assemble and configure individual components, the GPU Usage Monitor combines DCGM Exporter, kube-state-metrics, Prometheus, and Grafana into a single deployment, complete with pre-built dashboards designed specifically for GPU-accelerated workloads. The design principle is operational simplicity. A single helm install command results in actionable GPU visibility within minutes, with no custom dashboard authoring or scrape configurat
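As a rough illustration of what the Prometheus side of such a stack exposes, the sketch below averages GPU utilization per host from a Prometheus instant-query response. It is not taken from the GPU Usage Monitor itself: the metric name `DCGM_FI_DEV_GPU_UTIL` and the `Hostname` label are assumed to match a default DCGM Exporter setup, the sample payload is made up, and `mean_util_by_host` is a hypothetical helper.

```python
from collections import defaultdict

# Assumption: DCGM Exporter publishes per-GPU utilization under this metric name.
GPU_UTIL_QUERY = "DCGM_FI_DEV_GPU_UTIL"


def mean_util_by_host(response: dict) -> dict:
    """Average GPU utilization (percent) per host from an instant-query vector."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    samples = defaultdict(list)
    for series in response["data"]["result"]:
        # Assumption: the exporter labels each series with a "Hostname" label.
        host = series["metric"].get("Hostname", "unknown")
        _timestamp, value = series["value"]  # Prometheus returns the value as a string
        samples[host].append(float(value))
    return {host: sum(vals) / len(vals) for host, vals in samples.items()}


# Made-up payload in the shape of a Prometheus /api/v1/query response.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"Hostname": "node-a", "gpu": "0"}, "value": [1700000000, "80"]},
            {"metric": {"Hostname": "node-a", "gpu": "1"}, "value": [1700000000, "20"]},
            {"metric": {"Hostname": "node-b", "gpu": "0"}, "value": [1700000000, "10"]},
        ],
    },
}

util = mean_util_by_host(sample_response)
```

In a live cluster the payload would come from querying Prometheus with `GPU_UTIL_QUERY` rather than from a literal dictionary; the pre-built Grafana dashboards make this kind of manual aggregation unnecessary for routine use.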
… The scheduler won’t allocate beyond fair share while the Vision team still has pending jobs claiming their portion. The LLM team must wait until Vision team’s over-quota usage drops. LLM team’s post-training job waits…and waits…and waits. …
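To make the contrast between stateless and time-aware sharing concrete, here is a minimal sketch of dividing an over-quota pool between teams. It is an illustration only and not the scheduler's actual algorithm: the function `split_over_quota`, the weighting rule `1 / (1 + recent_usage)`, and the numbers are all assumptions chosen for clarity.

```python
def split_over_quota(pool_gpus, demand, recent_usage=None):
    """Split a shared over-quota pool among teams that have pending demand.

    With recent_usage=None every active team gets equal weight (stateless).
    Otherwise a team's weight shrinks as its recent over-quota usage grows,
    so a team that has waited receives a larger share (time-aware).
    Capacity left over when a team's share exceeds its demand is not
    redistributed in this simplified sketch.
    """
    recent_usage = recent_usage or {}
    active = {team: d for team, d in demand.items() if d > 0}
    # Hypothetical weighting rule: inverse of (1 + recent usage).
    weights = {team: 1.0 / (1.0 + recent_usage.get(team, 0.0)) for team in active}
    total = sum(weights.values())
    return {
        team: min(active[team], pool_gpus * weight / total)
        for team, weight in weights.items()
    }


demand = {"vision": 8, "llm": 8}  # pending GPU requests per team (made up)

# Stateless: both teams always split the pool evenly, regardless of history.
stateless = split_over_quota(8, demand)

# Time-aware: the Vision team has recently consumed over-quota capacity,
# so the LLM team's share of the pool grows.
time_aware = split_over_quota(8, demand, recent_usage={"vision": 3.0})
```

Under these assumed inputs the stateless split gives each team half of the pool, while the time-aware split shifts most of it to the LLM team, which is the behavior the waiting post-training job needs.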
… Unlike cloud environments, edge devices operate under strict memory limits, with CPU and GPU sharing constrained resources. Inefficient memory use can lead to bottlenecks, latency spikes, or system failure. …
… To get started, check out these resources: VC-6 samples Examples for VC-6 encoding and selective decoding Benchmark suite to reproduce our results with Hugging Face datasets VC-6 AI Blueprint Demo showcasing VC-6 selective decoding in vision AI pipelines Reference integration patterns for multiple … …
Data Center / Cloud Building Token‑Metered AI Services on Telco AI Factories May 21, 2026 By Waleed Badr and Amogh Dendukuri Telcos are building sovereign AI factories based on the NVIDIA Cloud Partner reference architecture to provide… …
Networking / Communications Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel Feb 02, 2026 By Fan Yu, Tong Liu and Kai Sun Hybrid-EP, an efficient Expert Parallel (EP) communication library, leverages … …
… Previously she worked on privacy implications of federated learning, focused on distributed training techniques and got fascinated by inefficiencies in GPU usage in research and industry settings. She established the AI Infrastructure Club and is based in Munich, Germany. …
… GTC Event: Join us at NVIDIA GTC Taipei in June where developers, researchers, and industry leaders come together to explore the future of AI, from agentic and reasoning AI to physical AI, robotics, and beyond. Get details. …
… This keeps integration straightforward and limits the bandwidth required between sensor and compute platform. …
Simulation / Modeling / Design Job Statistics with NVIDIA Data Center GPU Manager and SLURM May 13, 2019 By Scott McMillan and Michael Knox NVIDIA Data Center GPU Manager (DCGM) can be integrated with SLURM to provide job-level GPU usage… …