Maximizing GPU Utilization with NVIDIA Run:ai and NVIDIA NIM | NVIDIA Technical Blog
…When a request targets an idle model, NVIDIA Run:ai’s GPU memory swap moves the currently loaded model’s weights to CPU RAM and loads the requested model into GPU memory…
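The swap behavior described above can be pictured with a small toy simulation. This is only a sketch under stated assumptions: the `SwapSimulator` class, its fields, and the model names are hypothetical illustrations, not part of any NVIDIA Run:ai or NVIDIA NIM API, and it assumes a single GPU that holds one model's weights at a time.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SwapSimulator:
    """Toy model of one GPU shared by several models (hypothetical, not a Run:ai API)."""

    gpu_resident: Optional[str] = None  # model whose weights are in GPU memory
    cpu_resident: Set[str] = field(default_factory=set)  # models parked in CPU RAM
    log: List[str] = field(default_factory=list)

    def handle_request(self, model: str) -> str:
        # The requested model is already loaded: serve it without swapping.
        if self.gpu_resident == model:
            self.log.append(f"serve {model} (already on GPU)")
            return "hit"
        # Swap out: the currently loaded model's weights move to CPU RAM.
        if self.gpu_resident is not None:
            self.cpu_resident.add(self.gpu_resident)
            self.log.append(f"swap out {self.gpu_resident} -> CPU RAM")
        # Swap in: the requested (idle) model is loaded into GPU memory.
        self.cpu_resident.discard(model)
        self.gpu_resident = model
        self.log.append(f"swap in {model} -> GPU")
        return "swap"


# Assumed starting state: model-a on the GPU, model-b idle in CPU RAM.
sim = SwapSimulator(gpu_resident="model-a", cpu_resident={"model-b"})
results = [sim.handle_request(m) for m in ["model-a", "model-b", "model-b", "model-a"]]
print(results)
print("\n".join(sim.log))
```

In this sketch, only a request for a model that is not currently on the GPU triggers a swap; repeated requests to the loaded model are served directly.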