NVIDIA DGX Spark Cluster Review: Distributed Inference on Dell, GIGABYTE, and HP
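… The 200 GbE fabric cannot keep up with TP’s per-token all-reduce traffic, so compute sits idle waiting on the network. PP, by contrast, pays its pipeline-bubble cost mainly while the pipeline fills, and once the batch size reaches 4 or 8 that cost is amortized across the steady-state stream. …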
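A rough Python model makes this trade-off concrete. It is a back-of-envelope sketch under loudly stated assumptions, not a description of how any of the reviewed systems or serving stacks actually schedule work. It assumes Megatron-style tensor parallelism with two ring all-reduces per transformer layer. It also assumes equal-time pipeline stages with free inter-stage transfers. The function names, the model shape (hidden size 8192, 80 layers), and the 10 µs per-all-reduce latency are hypothetical placeholders, not measurements from this review.

```python
"""Back-of-envelope model of TP vs PP overhead on a two-node cluster.

Every numeric input below is a hypothetical placeholder chosen for
illustration; none is a measurement from the systems in this review.
"""


def tp_comm_seconds_per_step(hidden: int, layers: int, tp: int, batch: int,
                             link_gbps: float, latency_us: float,
                             bytes_per_elem: int = 2) -> float:
    """Estimated time spent in TP all-reduces for one decode step.

    Assumes Megatron-style TP: two all-reduces per layer (after attention
    and after the MLP), each over a [batch, hidden] activation, using a
    ring all-reduce that sends 2 * (tp - 1) / tp of the buffer per rank.
    Each all-reduce also pays a fixed (hypothetical) latency, and they run
    back to back because each layer needs the previous reduced output.
    """
    if tp == 1:
        return 0.0  # no tensor parallelism, no all-reduce traffic
    buffer_bytes = batch * hidden * bytes_per_elem
    sent_per_allreduce = 2 * (tp - 1) / tp * buffer_bytes
    n_allreduce = 2 * layers
    bandwidth_s = n_allreduce * sent_per_allreduce * 8 / (link_gbps * 1e9)
    latency_s = n_allreduce * latency_us * 1e-6
    return bandwidth_s + latency_s


def pp_steady_state_utilization(stages: int, micro_batches_in_flight: int) -> float:
    """Fraction of time each stage is busy during steady-state decoding.

    Assumes equal stage times and free inter-stage transfers. A micro-batch
    cannot start its next token until its current token has left the last
    stage, so with fewer micro-batches in flight than stages, some stages
    sit idle; with at least as many, every stage always has work.
    """
    return min(1.0, micro_batches_in_flight / stages)


if __name__ == "__main__":
    # Hypothetical shape, roughly a 70B-class dense model split across 2 nodes.
    for batch in (1, 4, 8):
        t = tp_comm_seconds_per_step(hidden=8192, layers=80, tp=2, batch=batch,
                                     link_gbps=200, latency_us=10)
        print(f"TP=2, batch {batch}: ~{t * 1e3:.2f} ms of all-reduce per step")
    for m in (1, 2, 4):
        u = pp_steady_state_utilization(stages=2, micro_batches_in_flight=m)
        print(f"PP=2, {m} micro-batch(es) in flight: {u:.0%} stage utilization")
```

Under these assumptions, TP's per-step cost is dominated by 160 sequential, latency-bound all-reduces rather than by the bytes moved. A two-stage pipeline, by contrast, stays fully busy once at least two micro-batches are in flight, which a batch of 4 or 8 easily provides. Real numbers depend on the serving stack, the overlap of communication with compute, and the actual fabric latency.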