Why is inference optimization important for AI factories?
Inference drives revenue, so it is the key workload to optimize. When operators increase inference throughput per watt, they directly increase the number of tokens they can sell, or insights they can create, within a fixed power budget, which means more revenue per unit of time. At the hundred-megawatt to gigawatt scale, even a few percentage points of throughput improvement per megawatt can translate into meaningful gains in profit. Model architecture is also important. Mixture-of-experts (MoE) models are typically more energy efficient per unit of intelligence than dense models with similar total
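The throughput-per-megawatt arithmetic in this answer can be made concrete with a rough sketch. The function name, its parameters, and every number in the usage example are hypothetical placeholders, not measured or published figures, and the sketch assumes the facility runs at full utilization all year.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def annual_gain(power_mw, tokens_per_sec_per_mw, improvement, price_per_million_tokens):
    """Estimate extra tokens and revenue per year from a throughput-per-MW gain.

    improvement is a fraction (0.03 means a 3% gain). All inputs are
    illustrative assumptions supplied by the caller.
    """
    # Tokens served per year at the baseline efficiency and full utilization.
    baseline_tokens = power_mw * tokens_per_sec_per_mw * SECONDS_PER_YEAR
    # The same power budget now yields proportionally more tokens.
    extra_tokens = baseline_tokens * improvement
    # Convert extra tokens to revenue at a flat price per million tokens.
    extra_revenue = extra_tokens / 1_000_000 * price_per_million_tokens
    return extra_tokens, extra_revenue


if __name__ == "__main__":
    # Hypothetical inputs: a 100 MW site, 3% efficiency gain, made-up price.
    tokens, revenue = annual_gain(
        power_mw=100,
        tokens_per_sec_per_mw=1_000_000,
        improvement=0.03,
        price_per_million_tokens=1.00,
    )
    print(f"extra tokens per year: {tokens:.3e}")
    print(f"extra revenue per year: ${revenue:,.0f}")
```

Because the extra output scales linearly with facility power, the same percentage gain is worth roughly ten times more at 1 GW than at 100 MW under these assumptions.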
How does NVIDIA DSX optimize AI factory performance?
The ML.ENERGY Initiative has developed a leaderboard and benchmark for sharing observations from its measurements, along with a reasoning framework that explains why certain energy behaviors occur. These benchmarks can be tied into energy-aware operations: telemetry-driven systems that show how to run an AI factory under real deployment constraints, including power cost, carbon intensity, thermals, cooling capacity, and grid limits. NVIDIA DSX provides these energy-aware operations. The platform delivers a coordinated view across compute, racks, cooling, facility power, and workload schedulin