Removing the Guesswork from Disaggregated Serving | NVIDIA Technical Blog
…HiSim also aids HiCache architecture exploration and cost/performance optimization through three-level KV cache design (e.g., L2 size, prefetch/eviction policy, L3 bandwidth needs, write-through vs write-back) to…
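To make the write-through vs write-back trade-off concrete, the following is a minimal toy sketch of an LRU L2 tier in front of an L3 store. It is not HiSim or HiCache code: `Stats`, `simulate`, `l2_blocks`, `write_back`, and the access trace are all hypothetical names and values chosen for illustration, and the model ignores latency, bandwidth, and prefetching entirely.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Stats:
    l2_hits: int = 0
    l3_reads: int = 0
    l3_writes: int = 0


def simulate(trace, l2_blocks, write_back):
    """Replay (op, block_id) accesses against an LRU L2 in front of L3.

    Toy model (an assumption, not HiSim): op is "w" when a KV block is
    produced and "r" when an existing block is reused.
    """
    l2 = OrderedDict()  # block_id -> dirty flag, ordered oldest to newest
    l3 = set()          # block ids currently persisted in L3
    stats = Stats()

    def insert(block, dirty):
        # Keep an existing dirty flag if the block is already resident.
        if block in l2:
            dirty = dirty or l2[block]
        l2[block] = dirty
        l2.move_to_end(block)
        if len(l2) > l2_blocks:
            victim, victim_dirty = l2.popitem(last=False)  # evict LRU block
            if victim_dirty:  # write-back flushes dirty blocks on eviction
                l3.add(victim)
                stats.l3_writes += 1

    for op, block in trace:
        if op == "w":
            if write_back:
                insert(block, dirty=True)
            else:  # write-through persists to L3 immediately
                l3.add(block)
                stats.l3_writes += 1
                insert(block, dirty=False)
        elif block in l2:
            stats.l2_hits += 1
            l2.move_to_end(block)
        elif block in l3:
            stats.l3_reads += 1
            insert(block, dirty=False)
        # Otherwise the block is in neither tier and would be recomputed,
        # which this sketch does not model.
    return stats


# Hypothetical access trace: three blocks produced, then reused and rewritten.
trace = [("w", 0), ("w", 1), ("w", 2), ("r", 0), ("r", 2), ("w", 2), ("r", 1)]

for l2_blocks in (1, 2, 4):
    for write_back in (False, True):
        policy = "write-back" if write_back else "write-through"
        print(l2_blocks, policy, simulate(trace, l2_blocks, write_back))
```

Sweeping `l2_blocks` and the write policy over a recorded trace is the simplest form of the design-space exploration described here. In this toy model, write-back saves L3 writes for blocks that are rewritten while resident in L2, at the cost of losing dirty blocks that are never evicted; a real study would also need to account for bandwidth and durability.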