Post-Training Quantization of LLMs with NVIDIA NeMo and NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog
… Calibrating the model to obtain scaling factors for lower-precision GEMMs and exporting the quantized model to the TensorRT-LLM checkpoint. …
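To make the calibration step concrete, here is a minimal sketch of one common approach, absolute-max ("amax") calibration for symmetric INT8 quantization. It is an illustration only, not the NeMo or TensorRT Model Optimizer API: the function names `calibrate_amax`, `scale_from_amax`, `quantize`, and `dequantize`, the per-tensor granularity, and the INT8 range of 127 are all assumptions chosen for the example.

```python
# Hypothetical, library-free sketch of max calibration (not the NeMo / Model Optimizer API).

def calibrate_amax(batches):
    """Return the largest absolute value observed across all calibration batches."""
    amax = 0.0
    for batch in batches:
        for x in batch:
            amax = max(amax, abs(x))
    return amax


def scale_from_amax(amax, qmax=127):
    """Map the observed range [-amax, amax] onto the integer range [-qmax, qmax]."""
    # Fall back to a scale of 1.0 if the calibration data was all zeros.
    return amax / qmax if amax > 0 else 1.0


def quantize(values, scale, qmax=127):
    """Round each value to the nearest integer step and clamp to the representable range."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Recover approximate floating-point values from the quantized integers."""
    return [q * scale for q in qvalues]


# Assumed toy calibration data standing in for activations collected from real prompts.
calibration_batches = [[0.5, -1.27, 0.3], [2.54, -0.1, 1.0]]

amax = calibrate_amax(calibration_batches)
scale = scale_from_amax(amax)
quantized = quantize([2.54, -1.0, 0.0, 10.0], scale)  # 10.0 lies outside the calibrated range and is clamped
restored = dequantize(quantized, scale)
```

In a real workflow the scaling factors would be collected per layer (and often per channel) while running representative data through the model, then stored alongside the quantized weights in the exported checkpoint.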