Model Quantization: Post-Training Quantization Using NVIDIA Model Optimizer | NVIDIA Technical Blog
… Export and deploy: Once the accuracy is acceptable, the fake-quantized weights are compressed into their true low-precision form and exported as a checkpoint for downstream engines. In our case, we export the PyTorch checkpoint to ONNX and run inference with TensorRT. …
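
To make the difference between fake-quantized weights and their true low-precision form concrete, here is a minimal pure-Python sketch. It is not the Model Optimizer API: it assumes symmetric per-tensor INT8 quantization, and the helper names (`compute_scale`, `quantize`, `fake_quantize`, `compress`) and the sample weights are hypothetical, chosen only for illustration.

```python
from array import array


def compute_scale(weights, num_bits=8):
    """Symmetric per-tensor scale: map max |w| to the largest signed code."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    amax = max(abs(w) for w in weights)
    return amax / qmax if amax > 0 else 1.0


def quantize(weights, scale, num_bits=8):
    """Round each weight to the nearest integer code and clamp to the range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(w / scale))) for w in weights]


def fake_quantize(weights, scale, num_bits=8):
    """Quantize then dequantize: values stay floats but sit on the INT8 grid."""
    return [q * scale for q in quantize(weights, scale, num_bits)]


def compress(weights, scale, num_bits=8):
    """Store the real integer codes, one signed byte each; scale is kept apart."""
    return array("b", quantize(weights, scale, num_bits))


# Hypothetical weights, for illustration only.
weights = [0.5, -1.27, 0.003, 1.0]
scale = compute_scale(weights)

fake = fake_quantize(weights, scale)      # what calibration and evaluation see
packed = compress(weights, scale)         # what an exported checkpoint would hold
restored = [q * scale for q in packed]    # what a downstream engine reconstructs
```

Under these assumptions, `restored` matches `fake` exactly, which is why accuracy measured on the fake-quantized model carries over to the compressed checkpoint, while `packed` needs only one byte per weight plus a single scale.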