Bringing AI Closer to the Edge and On-Device with Gemma 4 | NVIDIA Technical Blog
… Using vLLM high-throughput LLM serving on DGX Spark provides a high-performance platform for the largest Gemma 4 models; the vLLM for Inference DGX Spark playbook provides the details to get vLLM running with Gemma 4 on your DGX Spark. …