
Showing top 53 results for "agent cost control"

People also ask

Why are SLMs beneficial to agentic AI tasks?

SLMs are well-positioned for the agentic era because any single language-model task within an agent uses only a narrow slice of LLM functionality. LLMs are built to be powerful generalists, but most agents use only a very narrow subset of their capabilities. They typically parse commands, generate structured outputs such as JSON for tool calls, or produce summaries and answer contextualized questions. These tasks are repetitive (apart from differences in prompt payloads), predictable, and highly specialized, well within the scope of specialized SLMs. An LLM trained to handle open-domain conversations is o…

How Small Language Models Are Key to Scalable Agentic AI | NVIDIA Technical Blog
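The division of labor described above can be sketched as a small router: narrow, repetitive jobs go to an SLM, open-ended ones go to an LLM, and a malformed SLM tool call is escalated. This is a minimal sketch under stated assumptions, not the blog's method: the task labels, the "slm"/"llm" keys, the JSON-validity fallback, and the stub models are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical task labels mirroring the narrow jobs listed in the snippet.
NARROW_TASKS = {"parse_command", "tool_call_json", "summarize", "contextual_qa"}


@dataclass
class Route:
    model: str   # "slm" or "llm": hypothetical keys, not real model IDs
    reason: str


def route(task_kind: str) -> Route:
    """Send narrow, predictable tasks to a small model; everything else to a generalist LLM."""
    if task_kind in NARROW_TASKS:
        return Route("slm", f"{task_kind} is narrow and specialized")
    return Route("llm", f"{task_kind} needs generalist capability")


def call_with_fallback(task_kind: str, prompt: str,
                       models: Dict[str, Callable[[str], str]]) -> Tuple[str, str]:
    """Call the routed model; if an SLM tool call is not valid JSON, retry on the LLM."""
    chosen = route(task_kind).model
    output = models[chosen](prompt)
    if chosen == "slm" and task_kind == "tool_call_json":
        try:
            json.loads(output)
        except json.JSONDecodeError:
            chosen = "llm"                  # escalate rather than pass bad JSON to a tool
            output = models["llm"](prompt)
    return chosen, output


if __name__ == "__main__":
    # Stub "models" so the sketch runs without any inference server.
    stubs = {
        "slm": lambda p: json.dumps({"tool": "search", "args": {"q": p}}),
        "llm": lambda p: f"open-ended answer to: {p}",
    }
    print(route("summarize"))
    print(call_with_fallback("tool_call_json", "GPU prices", stubs))
```

Escalating only on failure keeps the common case on the cheaper model while bounding the damage from an occasional malformed output.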

Why aren’t enterprises using SLMs more broadly?

If SLMs have clear advantages, why do most agents still rely so heavily on LLMs? We hypothesize that the barriers stem from perception or organizational culture rather than technical limitations. Shifting to SLM-enabled architectures requires an intentional change in mindset. SLM research relies on generalist benchmarks, even though agentic workloads demand different evaluation metrics. Plus, LLMs often dominate the headlines. As the cost savings and reliability of SLM-enabled systems become undeniable, momentum will shift. The transition could mirror past shifts in computing, such as the mo…

How Small Language Models Are Key to Scalable Agentic AI | NVIDIA Technical Blog
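One example of a workload-specific metric, as opposed to a generalist benchmark, is the share of outputs that are usable as tool calls. The sketch below assumes a tool call is a JSON object that must contain a given set of keys; the metric name and the key names are hypothetical illustrations, not a metric defined in the source.

```python
import json
from typing import List, Set


def tool_call_validity(outputs: List[str], required_keys: Set[str]) -> float:
    """Fraction of model outputs that parse as a JSON object containing every required key."""
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue                        # unparseable output counts as a failure
        if isinstance(obj, dict) and required_keys.issubset(obj.keys()):
            valid += 1
    return valid / len(outputs)


if __name__ == "__main__":
    # Hypothetical sample outputs from a model under evaluation.
    samples = ['{"tool": "search", "args": {}}', "not json", '{"tool": "x"}', "[1, 2]"]
    print(tool_call_validity(samples, {"tool", "args"}))
```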

Full-Stack Optimizations for Agentic Inference with NVIDIA Dynamo | NVIDIA Technical Blog

… Please reach out to us if you have any ideas or feedback. { "model": "MiniMaxAI/MiniMax-M2.5", "messages": ... , "tools": ... , "nvext": { "agent hints": { "osl": 256, "speculative prefill": true, "priority": 10 }, "cache control": { "type": "ephemeral", "ttl": "1h" } } } The agent hints fields: pr… …

Apr 17, 2026 · Ishan Dhanani

Building for the Rising Complexity of Agentic Systems with Extreme Co-Design | NVIDIA Technical Blog

… The trace makes it clear that agent token consumption is shaped as much by agentic system behavior as by the nature of the tasks. The primary agent accumulates input context quickly when it is not delegating or compacting context, which causes cache-read input token costs to recur every turn. …

May 5, 2026 · Eduardo Alvarez
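The recurring cache-read cost described in this snippet follows from simple arithmetic: if every turn re-reads the whole accumulated context, the total grows roughly quadratically with the number of turns. Assuming a starting context of S tokens that grows by d tokens per turn with no compaction, T turns re-read T·S + d·T(T−1)/2 tokens; with S = 2,000, d = 1,000, and T = 50 that is 1,325,000 tokens. The sketch below models this, plus a hypothetical compaction policy (compact_at, summary_tokens); the numbers are illustrative, not taken from the trace.

```python
from typing import Optional


def cache_read_tokens(turns: int, system_tokens: int, tokens_per_turn: int,
                      compact_at: Optional[int] = None, summary_tokens: int = 0) -> int:
    """Total input tokens re-read over a session when each turn resends the full context.

    Each turn reads the context accumulated so far, then appends tokens_per_turn new
    tokens (messages, tool results). If compact_at is set, a hypothetical compaction
    step replaces the history with a summary once the context grows past compact_at.
    """
    context = system_tokens
    total = 0
    for _ in range(turns):
        total += context                    # the whole context is read again this turn
        context += tokens_per_turn          # then it grows by this turn's new tokens
        if compact_at is not None and context > compact_at:
            context = system_tokens + summary_tokens
    return total


if __name__ == "__main__":
    # Illustrative numbers: 50 turns, 2k-token starting context, 1k new tokens per turn.
    plain = cache_read_tokens(50, 2_000, 1_000)
    compacted = cache_read_tokens(50, 2_000, 1_000, compact_at=20_000, summary_tokens=3_000)
    print(plain, compacted)
```

Multiplying either total by a per-token cache-read price gives the session cost; compaction or delegation caps the per-turn context, which turns quadratic growth into roughly linear growth.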

How Small Language Models Are Key to Scalable Agentic AI | NVIDIA Technical Blog

… Enterprises looking to control costs, improve efficiency, and scale responsibly can begin experimenting with heterogeneous systems today. Conclusion: the heterogeneous system advantage. The demand for agentic AI systems is rapidly evolving. …

Aug 29, 2025 · Peter Belcak

NVIDIA Nemotron AI Models

… Nemotron 3 Nano 30B A3B: Nemotron 3 Nano offers 4x faster throughput compared to Nemotron 2 Nano. Leading accuracy for coding, reasoning, math, and long-context tasks. Perfect for agents that need to deliver the highest accuracy and efficiency for targeted tasks. Nemotron 3 Nano Omni 30B A3B: Single model fo… …

Reliable AI Coding for Unreal Engine: Improving Accuracy and Reducing Token Costs | NVIDIA Technical Blog

… From reliable retrieval to production-ready AI agents Once retrieval is stabilized, AI agents become more reliable because they operate on grounded context instead of improvisation. …

Mar 10, 2026 · Paul Logan

Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI | NVIDIA Technical Blog

… These systems rely on agentic long‑term memory for context that persists across turns, tools, and sessions so agents can build on prior reasoning instead of starting from scratch on every request. …

Mar 16, 2026 · Moshe Anschel

Run Autonomous, Self-Evolving Agents More Safely with NVIDIA OpenShell | NVIDIA Technical Blog

… It governs how the agent executes, what the agent can see and do, and where inference goes. OpenShell enables claws to run in isolated sandboxes, giving you fine-grained control over your privacy and security while letting you benefit from the agents’ productivity. …

Mar 16, 2026 · Ali Golshan

Top stories

Deploy Agentic-Ready AI at the Edge with Memory Efficiency in NVIDIA JetPack 7.2 | NVIDIA Technical Blog

Run Local AI Agents with Faster Models and Multi-Node Clustering on NVIDIA DGX Spark | NVIDIA Technical Blog

NVIDIA Vera CPU Sets a New Standard for Agentic Workloads in AI Factories | NVIDIA Technical Blog

Add a Specialized Deep Research Skill to Agent Harnesses | NVIDIA Technical Blog

Reliable AI Coding for Unreal Engine: Improving Accuracy and Reducing Token Costs | NVIDIA Technical Blog

Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI | NVIDIA Technical Blog

Run Autonomous, Self-Evolving Agents More Safely with NVIDIA OpenShell | NVIDIA Technical Blog

NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model | NVIDIA Technical Blog

Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog

Building Token‑Metered AI Services on Telco AI Factories | NVIDIA Technical Blog