Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy | NVIDIA Technical Blog
…On a single NVIDIA Blackwell B200 GPU in a DGX B200 system, AutoDeploy performed on par with the manually optimized baseline in TensorRT LLM (Figure 4). It delivered up to 350 tokens per second per user…
