Showing top 15 results for "scale speculation"

Model Quantization: Post-Training Quantization (PTQ) with NVIDIA Model Optimizer

…quantization, distillation, pruning, speculative decoding, and sparsity are among the core techniques. ModelOpt accepts models in Hugging Face, PyTorch, and ONNX formats as input, and lets you freely combine optimization techniques to produce an optimized checkpoint…

May 20, 2026 · Ruixiang Wang

Build Personal AI Agents on Windows PCs with New Tools from Microsoft and NVIDIA | NVIDIA Technical Blog

…An advanced speculative decoding technique, where a smaller draft model proposes several tokens ahead that the target model verifies in a single forward pass, delivering faster throughput at identical output quality. MTP…

Jun 2, 2026 · Annamalai Chockalingam

Build Accelerated, Differentiable Computational Physics Code for AI with NVIDIA Warp | NVIDIA Technical Blog

…large-scale multibody dynamics. The Warp backend reaches up to 252x (locomotion) and 475x (manipulation) speedups over JAX on comparable hardware. MJWarp gets there by exploiting sparse matrix operations and speculative execution…

Mar 12, 2026 · Sheel Nidhan

NVIDIA Groq 3 LPX: A Complete Analysis of the Low-Latency Inference Accelerator for the Vera Rubin Platform

…Seven Chips, Five Rack-Scale Systems, One AI Supercomputer Technical blog: Announcing NVIDIA Dynamo 1.0: Scaling MultiNode Inference in Production Video: The Future of AI Inference – Explainer on Attention-FFN Disaggregation…

Apr 3, 2026 · Kyle Aubrey

Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog

…TensorRT Model Optimizer streamlines applying these techniques at scale, turning state-of-the-art LLMs into deployable, cost-effective solutions. How to prune a model using TensorRT Model Optimizer This section walks…

Oct 7, 2025 · Max Xu
