Maximizing GPU Utilization with NVIDIA Run:ai and NVIDIA NIM | NVIDIA Technical Blog
… When a NIM operates its request, the unused headroom between the request and limit remains available to co-located workloads. When concurrent traffic spikes occur, the NIM bursts toward its limit, claiming that memory and converting it into active throughput. …