Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision | NVIDIA Technical Blog
…Extending FP8 for KV cache and attention

With a transformer model, linear layers are not the only bottleneck. KV cache growth and attention computation often dominate the end-to-end rollout time…
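
To make the KV cache growth concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the post: the model shape in `CONFIG` (32 layers, 8 KV heads, head dimension 128) and the helper names `kv_cache_bytes` and `report` are hypothetical, chosen only for illustration. It uses the standard size estimate for a KV cache (keys plus values, per layer, per token) and ignores the small extra storage that FP8 scaling factors would need.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem):
    """Bytes needed to hold keys and values for every layer and every token."""
    # Factor 2: one tensor for keys and one for values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Hypothetical model shape (an assumption, not a figure from the post).
CONFIG = dict(num_layers=32, num_kv_heads=8, head_dim=128)


def report(seq_len, batch_size):
    """Return (bf16_bytes, fp8_bytes) for the hypothetical CONFIG."""
    bf16 = kv_cache_bytes(seq_len=seq_len, batch_size=batch_size, bytes_per_elem=2, **CONFIG)
    fp8 = kv_cache_bytes(seq_len=seq_len, batch_size=batch_size, bytes_per_elem=1, **CONFIG)
    return bf16, fp8


if __name__ == "__main__":
    # The cache grows linearly with sequence length and batch size.
    for seq_len in (1024, 8192, 32768):
        bf16, fp8 = report(seq_len, batch_size=16)
        print(f"seq_len={seq_len:>6}: BF16 {bf16 / 2**30:.2f} GiB, FP8 {fp8 / 2**30:.2f} GiB")
```

Under these assumptions the cache size scales linearly with both sequence length and batch size, and storing it in a 1-byte format halves the footprint relative to a 2-byte format such as BF16. That suggests why long rollouts can become limited by the KV cache rather than by the linear layers.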
