Inside NVIDIA Groq 3 LPX: The Low-Latency Inference Accelerator for the NVIDIA Vera Rubin Platform | NVIDIA Technical Blog
… In the Vera Rubin platform architecture with LPX, decode is best thought of as a two-engine loop. GPUs handle decode work that benefits most from throughput and large memory capacity, such as full-context attention over the accumulated KV cache. …
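To make the two-engine loop concrete, here is a minimal toy sketch in plain Python. It is not NVIDIA's implementation. The excerpt only states that GPUs take the full-context attention over the accumulated KV cache, so `gpu_attention` models that half. The excerpt does not say what runs on the other engine, so `second_engine_step` is a hypothetical placeholder. All function names, the `tanh` transform, and the key-equals-value cache layout are assumptions for illustration.

```python
import math


def gpu_attention(query, kv_cache):
    """Stand-in for the GPU side: softmax attention over every cached (key, value) pair."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key, _ in kv_cache]
    peak = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(kv_cache[0][1])
    return [
        sum(w * value[i] for w, (_, value) in zip(weights, kv_cache)) / total
        for i in range(dim)
    ]


def second_engine_step(context):
    """Hypothetical placeholder for the other engine's share of the decode step.

    Assumption: the excerpt does not describe this work, so a fixed
    element-wise transform stands in for it here.
    """
    return [math.tanh(x) for x in context]


def decode(prompt_states, steps):
    """Alternate between the two stand-in engines once per generated token."""
    # Toy cache layout (assumption): each state serves as both key and value.
    kv_cache = [(state, state) for state in prompt_states]
    state = prompt_states[-1]
    outputs = []
    for _ in range(steps):
        context = gpu_attention(state, kv_cache)  # engine 1: full-context attention
        state = second_engine_step(context)       # engine 2: rest of the step (placeholder)
        kv_cache.append((state, state))           # the cache grows by one entry per token
        outputs.append(state)
    return outputs, kv_cache
```

Calling `decode([[1.0, 0.0], [0.0, 1.0]], steps=3)` returns three generated states and a cache of five entries. The point of the sketch is the shape of the loop: the attention cost grows with the cache on every step, which is the part the excerpt assigns to the throughput- and capacity-oriented engine.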