Deploying Disaggregated LLM Inference Workloads on Kubernetes | NVIDIA Technical Blog
…nvidia.com/gpu: "1"

Router (a standard deployment; no leader-worker topology needed):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: router
spec:
  replicas: 2
  selector:
    matchLabels:
      app: router
  template:
    metadata:
      labels:
        app…