Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog
…This post explains model pruning and knowledge distillation, how they work, and how you can easily apply them to your own models to achieve optimal performance using NVIDIA TensorRT Model Optimizer. What…