DynoSim: Simulating the Pareto Frontier | NVIDIA Technical Blog
… G2 offload is disabled, so the difference comes from routing and cache placement. KVBM manages KV blocks across the serving memory hierarchy: local HBM, host memory, SSD, and distributed or remote cache. …
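
To make the memory hierarchy concrete, here is a minimal toy sketch of a tiered lookup over the four levels named above. It is not KVBM's API: the class `TieredKVCache`, the tier names, the block-hash keys, and the fastest-first search order are all hypothetical, chosen only for illustration. Disabling the host tier in the usage lines is assumed to be loosely analogous to the disabled offload mentioned in the text.

```python
from dataclasses import dataclass, field

# Tier names mirror the hierarchy listed in the text. The fastest-first
# ordering and everything else in this file are illustrative assumptions.
TIERS = ["hbm", "host", "ssd", "remote"]


@dataclass
class TieredKVCache:
    """Toy model: each tier maps a block hash to an opaque payload."""

    enabled: set = field(default_factory=lambda: set(TIERS))
    store: dict = field(default_factory=lambda: {t: {} for t in TIERS})

    def put(self, tier, block_hash, payload):
        """Place a block in one tier; refuse tiers that are switched off."""
        if tier not in self.enabled:
            raise ValueError(f"tier {tier!r} is disabled")
        self.store[tier][block_hash] = payload

    def lookup(self, block_hash):
        """Return (tier, payload) from the fastest enabled tier, else None."""
        for tier in TIERS:
            if tier in self.enabled and block_hash in self.store[tier]:
                return tier, self.store[tier][block_hash]
        return None


# Usage: run with the host-memory tier disabled (an assumed configuration).
cache = TieredKVCache(enabled={"hbm", "ssd", "remote"})
cache.put("ssd", "blk-1", b"kv")
cache.put("hbm", "blk-2", b"kv2")
hit = cache.lookup("blk-1")
```

With one tier removed, a block can only be found in the remaining tiers, so where blocks are placed and which worker a request is routed to decide how often the fast tier is hit.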