Showing top 20 results for "Local voice assistant"

People also ask

How are NVIDIA and the OSS community accelerating inference for local agentic AI?

With agents running 24 hours a day, seven days a week on increasingly complex tasks, efficient local compute matters even more. NVIDIA has collaborated with the open source community to enhance the top inference backends for agents, llama.cpp and vLLM. llama.cpp now delivers 2x performance on Qwen 3.5 and 3.6 27B dense models, and 1.6x performance on Qwen 3.5 and 3.6 35B mixture-of-experts (MoE) models. The following two techniques make this possible:

Multi-Token Prediction (MTP): an advanced speculative decoding technique, where a smaller draft model proposes several tokens ahead that the target…

Build Personal AI Agents on Windows PCs with New Tools from Microsoft and NVIDIA | NVIDIA Technical Blog

Developer Tools & Techniques – NVIDIA Technical Blog developer.nvidia.com
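The draft-and-verify idea behind the speculative decoding described in the answer above can be sketched in a few lines. This is a minimal, generic greedy version built on stated assumptions. It is not NVIDIA's MTP work and not llama.cpp's actual code. `toy_target` and `toy_draft` are hypothetical stand-ins for real models, where each maps a token sequence to its greedy next token. The sketch queries the target one position at a time, whereas a real engine scores all drafted positions in a single batched forward pass. It also does not model how MTP produces its draft tokens.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens, the target
    keeps the longest prefix it agrees with, then adds one token of its own."""
    seq = list(prompt)
    produced = 0
    while produced < max_new:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(seq)
        for _ in range(min(k, max_new - produced)):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks each drafted position and stops at the first
        #    disagreement (a real engine does this in one batched pass).
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(seq + proposal[:i]) != tok:
                break
            accepted += 1
        seq.extend(proposal[:accepted])
        produced += accepted
        # 3. The target always contributes one token: the correction at the
        #    first mismatch, or a bonus token after a fully accepted proposal.
        if produced < max_new:
            seq.append(target(seq))
            produced += 1
    return seq


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Reference: plain one-token-per-pass greedy decoding with the target."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq


# Hypothetical toy models over token ids 0-9 (not real LLMs).
def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 10


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except after token 3, where it guesses 9.
    return 9 if seq[-1] == 3 else toy_target(seq)


if __name__ == "__main__":
    out = speculative_decode(toy_target, toy_draft, [1], max_new=6)
    print(out)  # [1, 4, 3, 0, 1, 4, 3]
    assert out == greedy_decode(toy_target, [1], max_new=6)
```

Every token the sketch keeps is one the target would have chosen itself, so the output matches plain greedy decoding with the target. In a real backend, the speedup comes from verifying several drafted tokens per target pass instead of generating one token per pass.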
