The Many Aspects of Inference Performance
… To illustrate the impact of software optimization on cost per token: since February, MI355X GPU cost per token has dropped significantly, while GB300 NVL72 remains higher and unchanged (Figure 2).

Figure 2: Cost per million tokens over time, at interactivity 100 TPS/user (DeepSeek R1, FP8, no MTP). …
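
As a rough illustration of how a cost-per-million-tokens figure relates to throughput, the sketch below divides an hourly GPU cost by the tokens a GPU produces per second. The formula is a common back-of-the-envelope model, not necessarily the method behind Figure 2. The function name and every number are hypothetical, and the idea that software optimization acts by raising per-GPU throughput at a fixed interactivity target is an assumption made for the example.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float,
                            tokens_per_second_per_gpu: float) -> float:
    """Back-of-the-envelope cost (USD) to generate one million tokens on one GPU.

    Assumes the GPU is billed at a flat hourly rate and sustains the given
    throughput for the whole hour (a simplification).
    """
    if tokens_per_second_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    cost_per_second = gpu_hourly_cost_usd / 3600.0
    return cost_per_second / tokens_per_second_per_gpu * 1_000_000


# Hypothetical inputs, NOT measurements from the article:
# the same hardware price before and after a software update
# that doubles per-GPU throughput at the same interactivity target.
before = cost_per_million_tokens(3.60, 1000.0)
after = cost_per_million_tokens(3.60, 2000.0)

print(f"before: ${before:.2f} per million tokens")
print(f"after:  ${after:.2f} per million tokens")
```

Under this model, cost per token is inversely proportional to throughput, so with the hardware price held fixed, any software change that raises throughput lowers cost per token by the same factor.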