NVIDIA Dynamo
…Independent benchmarks show that GB300 NVL72 combined with NVIDIA Dynamo improves mixture-of-experts (MoE) model throughput by up to 50x compared to NVIDIA Hopper™-based systems. The GB300 NVL72 connects 72…
SLMs are well-positioned for the agentic era because they use a narrow slice of LLM functionality for any single language model errand. LLMs are built to be powerful generalists, but most agents use only a very narrow subset of their capabilities. They typically parse commands, generate structured outputs such as JSON for tool calls, or produce summaries and answer contextualized questions. These tasks are repetitive (up to the differences in prompt payloads), predictable, and highly specialized—well within the scope of specialized SLMs. An LLM trained to handle open-domain conversations is o
How Small Language Models Are Key to Scalable Agentic AI | NVIDIA Technical Blog
…We provide a privileged DaemonSet, snapshot-agent, installable through a Helm chart. An agent runs on every node and handles checkpoint and restore for runc-managed containers without requiring modifications to runc…
…Across six architecture generations, NVIDIA has improved inference throughput per megawatt by 1,000,000x (Figure 1). To put this in perspective, if the average fuel efficiency of a car had improved…
…Whether you’re dealing with retrieval-augmented generation (RAG) pipelines, agentic AI workflows, or long-form content generation, the \(O(N^2)\) complexity of attention remains a primary bottleneck. This post explains…
…How mixed prefill and decode scheduling improve GPU utilization
While kernel-level optimizations improve individual operation latency, significant efficiency gains can be achieved at the scheduler level by optimizing aggregated serving (prefill…
…Stability and correctness improvements to OMM, which accelerate ray-traced alpha-tested geometry such as foliage and vegetation
Substrate material improvements: Compatibility updates between Unreal Engine 5 Substrate material framework and NvRTX…
…In the style of Karpathy’s autoresearch, an agentic harness can propose a nontrivial code change, rebuild Dynamo, rerun the same trace, and keep only changes that improve the objective. That turns…
…For example, throughput comparisons across various models show improvements when using TensorRT for RTX versus DirectML, as measured on an NVIDIA GeForce RTX 5090 GPU. TensorRT for RTX is only compatible with…
…As millions of users, agents, and devices demand access to intelligence, the challenge is shifting from peak training throughput to delivering deterministic inference at scale—predictable latency, jitter, and sustainable token economics…
…These applications include AI assistants, customer support agents, coding co-pilots, and “deep research” assistants. Recent advances in algorithmic and model efficiency have reduced the cost of training and inference, as demonstrated…