Using NVFP4 Low-Precision Model Training for Higher Throughput Without Losing Accuracy | NVIDIA Technical Blog
…Stable training requires keeping some layers in BF16, particularly near the end of the network, to mitigate NVFP4 quantization error. In these experiments, maintaining the final four transformer layers in BF16 proved…
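
The layer-selection rule in the excerpt (most layers in NVFP4, the last few in BF16) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not NVIDIA's implementation: the function name `layer_precision_plan`, the labels `"nvfp4"` and `"bf16"`, and the 12-layer example model are hypothetical. Only the choice of four trailing BF16 layers comes from the text.

```python
from typing import List


def layer_precision_plan(num_layers: int, num_bf16_tail: int = 4) -> List[str]:
    """Return one precision label per transformer layer.

    Hypothetical helper: the last `num_bf16_tail` layers are labelled
    "bf16" and every earlier layer is labelled "nvfp4". The default of 4
    mirrors the excerpt; the labels themselves are illustrative strings,
    not identifiers from any real library.
    """
    if num_layers < 0 or num_bf16_tail < 0:
        raise ValueError("layer counts must be non-negative")
    # If the model has fewer layers than the requested tail, keep all in BF16.
    tail = min(num_bf16_tail, num_layers)
    return ["nvfp4"] * (num_layers - tail) + ["bf16"] * tail


if __name__ == "__main__":
    # Assumed 12-layer model, purely for illustration.
    for index, precision in enumerate(layer_precision_plan(num_layers=12)):
        print(f"layer {index:2d}: {precision}")
```

In a real training setup, a plan like this would decide which layers are wrapped in the framework's low-precision context and which are left in BF16. How that wrapping is done depends on the library in use and is not shown here.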