Post-Training Quantization of LLMs with NVIDIA NeMo and NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog
…8B and 70B

We evaluated TensorRT-LLM engine performance and accuracy using the benchmark.py and mmlu.py scripts, respectively. The following results were obtained for NVIDIA H100 80GB GPUs with TensorRT…