NVIDIA NeMo Agent Toolkit
…Tech Blog Improving AI Code Generation NVIDIA NeMo Agent Toolkit, USD, Cosmos Learn how to leverage AI code generation with the toolkit to build a test-driven coding agent. Documentation NeMo Agent…
SLMs are well-positioned for the agentic era because agents use only a narrow slice of LLM functionality for any single language model errand. LLMs are built to be powerful generalists, but most agents draw on only a small part of that range: they typically parse commands, generate structured outputs such as JSON for tool calls, or produce summaries and answer contextualized questions. These tasks are repetitive (up to the differences in prompt payloads), predictable, and highly specialized, which puts them well within the scope of specialized SLMs. An LLM trained to handle open-domain conversations is o…
How Small Language Models Are Key to Scalable Agentic AI | NVIDIA Technical Blog
With agents running 24 hours a day, seven days a week on increasingly complex tasks, efficient local compute matters even more. NVIDIA has collaborated with the open-source community to enhance the top inference backends for agents, llama.cpp and vLLM. llama.cpp now delivers 2x performance on Qwen 3.5 and 3.6 27B dense models, and 1.6x performance on Qwen 3.5 and 3.6 35B mixture-of-experts (MoE) models. The following two techniques make this possible: Multi-Token Prediction (MTP): An advanced speculative decoding technique, where a smaller draft model proposes several tokens ahead that the targ…
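The draft-and-verify idea behind speculative decoding, mentioned in the snippet above, can be illustrated with a minimal greedy sketch. This is not llama.cpp's or vLLM's MTP implementation: the two "models" are hypothetical toy functions that map a token list to the next token, and the names `speculative_decode`, `greedy_decode`, `target_model`, `draft_model`, and the parameter `k` are assumptions made for illustration only. A real system verifies all drafted positions in one batched forward pass of the target model; here each position is checked in a loop for clarity.

```python
from typing import Callable, List

# A toy "model": maps the tokens seen so far to the next token (greedy choice).
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens,
    the target keeps the longest prefix it agrees with."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        n = min(k, limit - len(tokens))

        # 1. The cheap draft model proposes n tokens ahead.
        proposal: List[int] = []
        for _ in range(n):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks each proposed position.
        for i, proposed in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if expected != proposed:
                # First disagreement: keep the agreed prefix plus the
                # target's own token, and discard the rest of the draft.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # The target agreed with every drafted token.
            tokens.extend(proposal)
    return tokens


# Hypothetical toy models: the target counts upward modulo 10;
# the draft matches it except right after the token 4.
def target_model(tokens: List[int]) -> int:
    return (tokens[-1] + 1) % 10


def draft_model(tokens: List[int]) -> int:
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10
```

Because every accepted token is one the target would have chosen itself, this greedy variant produces the same sequence as decoding with the target alone; any speedup comes from how often the draft guesses correctly.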
Build Personal AI Agents on Windows PCs with New Tools from Microsoft and NVIDIA | NVIDIA Technical Blog
…An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning Mar 11, 2026 By Chris Alexiuk and Chintan Patel Agentic AI systems need models…
Agentic AI / Generative AI Develop Native Multimodal Agents with Qwen3.5 VLM Using NVIDIA GPU-Accelerated Endpoints Feb 27, 2026 By Anu Srivastava…
…Audio transcription NVIDIA empowered blueprint-generated visual agents with the ability to hear, leading to improved contextual understanding and unlocking information not captured by video. This feature greatly improves the accuracy of…
…Use the low-latency path where predictable token generation improves experience, such as coding assistants, agentic workflows with tight tool-calling loops, voice interactions, and real-time translation. Keep throughput-first workloads…
…DeepStream coding agent that enables the generation of complete video ingestion pipelines from natural language prompts Support for accelerating the development of generative AI applications, dramatically reducing development time from eight weeks…
…Support for an improved long-context window of up to 256K input tokens, allowing edge agents to ingest extensive environmental and historical data. By supporting Cosmos Reason 2, TensorRT Edge-LLM ensures…
…Around them, NeMo tools add retrieval, tool‑calling, evaluation, and judge models so agents can score their own work and improve. Efficiency is the hidden requirement that makes production viable. Real agents…
…If a signal has suboptimal performance in backtesting, the evaluation agent generates optimization suggestions that are fed back into the Signal Agent’s next iteration. This creates a self-improving loop where…
…NVIDIA DSX’s AI factory ecosystem to adopt the latest in agentic AI infrastructure software across the full stack, improving tokens per watt and lowering token cost, accelerating deployment, and strengthening operational…