Removing the Guesswork from Disaggregated Serving | NVIDIA Technical Blog
… HiSim also aids HiCache architecture exploration and cost/performance optimization by exploring the three-level KV cache design space (e.g., L2 size, prefetch/eviction policy, L3 bandwidth needs, write-through vs. write-back) to find the best cost–performance point. …
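To make the idea of a design-space sweep concrete, here is a minimal sketch in Python. It does not use HiSim or HiCache APIs: `CacheConfig`, `toy_hit_rate`, `toy_cost`, and `cheapest_meeting_target` are hypothetical names, and the hit-rate and cost formulas are made-up placeholders standing in for whatever a simulator would report. It shows only the general pattern of enumerating configurations and picking the cheapest one that meets a performance target.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, Optional


@dataclass(frozen=True)
class CacheConfig:
    """One hypothetical point in the design space."""
    l2_size_gb: int
    eviction: str      # "lru" or "lfu"
    write_policy: str  # "write_through" or "write_back"


def toy_hit_rate(cfg: CacheConfig) -> float:
    # Placeholder model (assumption): diminishing returns as L2 grows,
    # plus a small fixed bonus for LFU. A real study would take this
    # number from simulation results instead.
    base = cfg.l2_size_gb / (cfg.l2_size_gb + 64)
    bonus = 0.05 if cfg.eviction == "lfu" else 0.0
    return min(base + bonus, 1.0)


def toy_cost(cfg: CacheConfig) -> float:
    # Placeholder model (assumption): cost grows linearly with L2 size,
    # and write-through pays a flat surcharge for extra L3 bandwidth.
    cost = float(cfg.l2_size_gb)
    if cfg.write_policy == "write_through":
        cost += 20.0
    return cost


def cheapest_meeting_target(
    configs: Iterable[CacheConfig], target_hit_rate: float
) -> Optional[CacheConfig]:
    """Return the lowest-cost config whose hit rate meets the target.

    Ties on cost are broken in favor of the higher hit rate.
    Returns None when no config meets the target.
    """
    feasible = [c for c in configs if toy_hit_rate(c) >= target_hit_rate]
    if not feasible:
        return None
    return min(feasible, key=lambda c: (toy_cost(c), -toy_hit_rate(c)))


# Enumerate the full grid of candidate designs.
grid = [
    CacheConfig(size, eviction, write_policy)
    for size, eviction, write_policy in product(
        [32, 64, 128, 256],
        ["lru", "lfu"],
        ["write_through", "write_back"],
    )
]

best = cheapest_meeting_target(grid, target_hit_rate=0.6)
print(best)
```

Framing the search as "cheapest configuration that meets a target" avoids the degenerate answers that a single performance-per-cost ratio can produce; other objectives, such as a Pareto front over cost and latency, would fit the same loop.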