NVIDIA Holoscan
…STT and LLMs Go to GitHub Repo Run Instructions SAM2: Segment Anything in Images and Videos This application demonstrates how to run SAM2 models on a live video feed with the possibility…
SLMs are well-positioned for the agentic era because they use a narrow slice of LLM functionality for any single language model errand. LLMs are built to be powerful generalists, but most agents use only a very narrow subset of their capabilities. They typically parse commands, generate structured outputs such as JSON for tool calls, or produce summaries and answer contextualized questions. These tasks are repetitive (up to the differences in prompt payloads), predictable, and highly specialized—well within the scope of specialized SLMs. An LLM trained to handle open-domain conversations is o
How Small Language Models Are Key to Scalable Agentic AI | NVIDIA Technical Blog
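The small language model excerpt above mentions that agents mostly parse commands and emit structured outputs such as JSON for tool calls. As a rough illustration of why that task is narrow and predictable, here is a minimal Python sketch of checking such an output before acting on it. The tool names, the `name`/`arguments` layout, and the `parse_tool_call` helper are all hypothetical and do not come from the article or from any NVIDIA library.

```python
import json

# Hypothetical tool registry (an assumption for illustration only):
# tool name -> required argument names and their accepted types.
TOOLS = {
    "get_weather": {"city": str},
    "add": {"a": (int, float), "b": (int, float)},
}


def parse_tool_call(raw: str) -> dict:
    """Parse a model's JSON tool call and validate it against TOOLS.

    Raises ValueError if the text is not valid JSON, names an unknown
    tool, or carries missing, extra, or wrongly typed arguments.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    schema = TOOLS[name]
    if set(args) != set(schema):
        raise ValueError(f"expected arguments {sorted(schema)}, got {sorted(args)}")

    for key, expected in schema.items():
        value = args[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"argument {key!r} has the wrong type")

    return {"name": name, "arguments": args}
```

A caller would pass the raw model output to `parse_tool_call` and dispatch to the real tool only if no `ValueError` is raised. Because the accepted shape is this constrained, the excerpt's point follows naturally: a model that only ever has to produce such objects needs a small part of what a general-purpose LLM offers.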
…LLM‑as‑a‑judge setups to evaluate reasoning traces for correctness, completeness, and safety LLM‑as‑a‑judge to assess final conclusions and remediation plans Tool‑calling benchmarks such as BFCLv3 to…
…signal_optimizer signal_generator_llm: signal_generator code_generator_llm: code_generator optimization_advisor_llm: optimization_advisor ic_threshold: 0.02 p_value_threshold: 0.05 max_iterations: 3 num_signals: 2…
…Performance gains from NVSHMEM scale with sequence length and are most pronounced in multinode deployments and hybrid parallelism configurations, making NVSHMEM essential for production long-context LLM training using JAX and XLA…
…delivering large language model (LLM) performance that meets real-world latency and cost requirements. Running models with tens of billions of parameters in production, especially for conversational or voice-based AI agents…
…Enabled thread-safe execution for multiple GPUs with different compute capabilities, up to one network per thread. Performance improvements were made for LLMs and convolution-based models. Supports CUDA contexts created in…
…This capability extends CompileIQ’s applicability well beyond LLM inference. Anywhere NVIDIA compilers are used—scientific computing, autonomous vehicles, image processing, recommendation systems—CompileIQ can explore the optimization space and surface configurations…
…NVIDIA TensorRT LLM Cookbook : Fully optimized TensorRT LLM engines with latent MoE kernels for production-grade, low-latency deployment. Dynamo deployment recipes: Disaggregated serving, intelligent routing, multi-tier KV caching, and automatic…
…The capabilities baseline Evaluating a model focuses on the foundation model (an LLM , or VLM , for example) in isolation. It measures raw cognitive and linguistic potential using static datasets where the input…
…If capable and safe but gated on constant approvals, then you’re babysitting it. If capable and autonomous with full access, you’ve got a long-running process policing itself—guardrails living…