Search

Showing top 116 results for "agentic improvements"

NVIDIA Dynamo

…Independent benchmarks show that GB300 NVL72 combined with NVIDIA Dynamo improves mixture-of-experts (MoE) model throughput by up to 50x compared to NVIDIA Hopper™-based systems. The GB300 NVL72 connects 72…

NVIDIA Dynamo Snapshot: Fast Startup for Inference Workloads on Kubernetes | NVIDIA Technical Blog

…We provide a privileged DaemonSet, snapshot-agent, installable through a Helm chart. An agent runs on every node and handles checkpoint and restore for runc-managed containers without requiring modifications to runc…

May 27, 2026 · Schwinn Saereesitthipitak

Scaling Token Factory Revenue and AI Efficiency by Maximizing Performance per Watt | NVIDIA Technical Blog

…Across six architecture generations, NVIDIA has improved inference throughput per megawatt by 1,000,000x (Figure 1). To put this in perspective, if the average fuel efficiency of a car had improved…

Mar 25, 2026 · Kibibi Moseley

Accelerating Long-Context Inference with Skip Softmax in NVIDIA TensorRT LLM | NVIDIA Technical Blog

…Whether you’re dealing with retrieval-augmented generation (RAG) pipelines, agentic AI workflows, or long-form content generation, the O(N^2) complexity of attention remains a primary bottleneck. This post explains…

Dec 16, 2025 · Laikh Tewari

How NVIDIA Extreme Hardware-Software Co-Design Delivered a Large Inference Boost for Sarvam AI’s Sovereign Models | NVIDIA Technical Blog

…How mixed prefill and decode scheduling improve GPU utilization. While kernel-level optimizations improve individual operation latency, significant efficiency gains can be achieved at the scheduler level by optimizing aggregated serving (prefill…

Feb 18, 2026 · Utkarsh Uppal

What’s New for Game Developers in NVIDIA RTX: DLSS 4.5 for UE5 and Multilingual AI Characters | NVIDIA Technical Blog

…Stability and correctness improvements to OMM, which accelerate ray-traced alpha-tested geometry such as foliage and vegetation. Substrate material improvements: Compatibility updates between Unreal Engine 5 Substrate material framework and NvRTX…

May 27, 2026 · Phillip Singh

DynoSim: Simulating the Pareto Frontier | NVIDIA Technical Blog

…In the style of Karpathy’s autoresearch, an agentic harness can propose a nontrivial code change, rebuild Dynamo, rerun the same trace, and keep only changes that improve the objective. That turns…

May 29, 2026 · Yongming Ding

Speed Up Unreal Engine NNE Inference with NVIDIA TensorRT for RTX Runtime | NVIDIA Technical Blog

…For example, throughput comparisons across various models show improvements when using TensorRT for RTX versus DirectML, as measured on an NVIDIA GeForce RTX 5090 GPU. TensorRT for RTX is only compatible with…

Apr 30, 2026 · Homam Bahnassi

Building the AI Grid with NVIDIA: Orchestrating Intelligence Everywhere | NVIDIA Technical Blog

…As millions of users, agents, and devices demand access to intelligence, the challenge is shifting from peak training throughput to delivering deterministic inference at scale—predictable latency, jitter, and sustainable token economics…

Mar 17, 2026 · Sree Sankar

LLM Inference Benchmarking: How Much Does Your LLM Inference Cost? | NVIDIA Technical Blog

…These applications include AI assistants, customer support agents, coding co-pilots, and “deep research” assistants. Recent advances in algorithmic and model efficiency have reduced the cost of training and inference, as demonstrated…

Jun 18, 2025 · Vinh Nguyen