Win on TCO: How AMD Instinct™ MI355X Achieves Cost-Competitive Distributed Inference Through SGLang with MoRI
… Two-Batch Overlap (TBO) hides this latency by interleaving communication and compute across two micro-batches. MicroBatch A dispatch sends quantized tokens over the network on a dedicated communication stream. While the network transfer is in flight, MicroBatch B attention computes on the main compute str… …
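The overlap described above can be illustrated with a small, CPU-only sketch. It is not SGLang or MoRI code: a single-worker thread pool stands in for the dedicated communication stream, the calling thread stands in for the main compute stream, and `run_tbo_step`, `fake_dispatch`, and `fake_attention` are hypothetical names introduced only for this illustration. Real implementations would use GPU streams and network transfers rather than Python threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_tbo_step(dispatch, attention, batch_a, batch_b):
    """Overlap batch A's dispatch with batch B's attention (illustrative only)."""
    # A one-worker pool plays the role of the dedicated communication stream.
    with ThreadPoolExecutor(max_workers=1) as comm_stream:
        # Start MicroBatch A's dispatch; it runs in the background.
        in_flight = comm_stream.submit(dispatch, batch_a)
        # While the transfer is in flight, compute MicroBatch B's attention
        # on the calling thread (the stand-in for the main compute stream).
        attn_b = attention(batch_b)
        # Join point: wait for the dispatch before anything consumes its result.
        dispatched_a = in_flight.result()
    return dispatched_a, attn_b


# Hypothetical stand-ins so the sketch is runnable without a GPU or network.
attention_started = threading.Event()


def fake_dispatch(tokens):
    # Pretend to send tokens. The dispatch only completes once attention has
    # begun, so a successful wait shows the two stages really overlapped.
    overlapped = attention_started.wait(timeout=5.0)
    return {"sent": list(tokens), "overlapped": overlapped}


def fake_attention(tokens):
    # Signal that compute has started, then do some placeholder work.
    attention_started.set()
    return [t * 2 for t in tokens]


dispatched_a, attn_b = run_tbo_step(
    fake_dispatch, fake_attention, [1, 2, 3], [4, 5, 6]
)
```

The design point the sketch tries to capture is that the dispatch is launched without blocking, and the only synchronization is the join just before its result is needed; everything between launch and join is time in which communication latency is hidden behind compute.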