Showing top 23 results for "AI cost and memory"

People also ask

How Did Healthcare Platform Sully.ai Cut Inference Costs by 10x With Baseten, Open Source Models and Blackwell?

In healthcare, tedious, time-consuming tasks like medical coding, documentation and managing insurance forms cut into the time doctors can spend with patients. Sully.ai helps solve this problem by developing “AI employees” that can handle routine tasks like medical coding and note-taking. As the company’s platform scaled, its proprietary, closed source models created three bottlenecks: unpredictable latency in real-time clinical workflows, inference costs that scaled faster than revenue and insufficient control over model quality and updates. To overcome these bottlenecks, Sully.ai uses Basete

Leading Inference Providers Achieve Lowest Token Cost With Open Source Models on NVIDIA Blackwell

How Did Fireworks AI and Sentient Foundation Lower AI Costs for Agentic Chat by up to 50%?

Sentient Labs is focused on bringing AI developers together to build powerful reasoning AI systems that are all open source. The goal is to accelerate AI toward solving harder reasoning problems through research in secure autonomy, agentic architecture and continual learning. Its first app, Sentient Chat, orchestrates complex multi-agent workflows and integrates more than a dozen specialized AI agents from the community. As a result, Sentient Chat has massive compute demands: a single user query can trigger a cascade of autonomous interactions that typically lead to costly infrastruct

Leading Inference Providers Achieve Lowest Token Cost With Open Source Models on NVIDIA Blackwell

How Did Together AI and Decagon Drive Down AI Costs for Customer Service by 6x?

Customer service calls with voice AI often end in frustration because even a slight delay can lead users to talk over the agent, hang up or lose trust. Decagon builds AI agents for enterprise customer support, with AI-powered voice being its most demanding channel. Decagon needed infrastructure that could deliver sub-second responses under unpredictable traffic loads with tokenomics that supported 24/7 voice deployments. Together AI runs production inference for Decagon’s multimodel voice stack on NVIDIA Blackwell GPUs. The companies collaborated on several key optimizations: speculative decod

Leading Inference Providers Achieve Lowest Token Cost With Open Source Models on NVIDIA Blackwell

What Are the Factors That Lower Token Cost?

Understanding how to optimize token cost requires looking at the equation for calculating cost per million tokens. In this equation, many enterprises evaluating AI infrastructure focus on the numerator: the cost per GPU per hour. For cloud deployments, this is the hourly rate paid to a cloud provider; for on-premises deployments, it’s the effective hourly cost derived from amortizing owned infrastructure. The real key to reducing token cost, however, lies in the denominator: maximizing the delivered token output. That denominator carries two business implications. Minimize token cost: When thi
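The relationship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the excerpt names the numerator (cost per GPU per hour) and the denominator (delivered token output) but does not give units for the latter, so the sketch assumes throughput is measured in tokens per second per GPU. The function name and the dollar and throughput figures are hypothetical.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float,
                            tokens_per_gpu_per_second: float) -> float:
    """Return the cost in dollars of producing one million tokens.

    Assumption: delivered token output is expressed as tokens per
    second per GPU; the source does not specify the unit.
    """
    if tokens_per_gpu_per_second <= 0:
        raise ValueError("throughput must be positive")
    # Denominator: tokens delivered by one GPU in one hour.
    tokens_per_gpu_per_hour = tokens_per_gpu_per_second * 3600
    # Numerator over denominator, scaled to one million tokens.
    return gpu_cost_per_hour / tokens_per_gpu_per_hour * 1_000_000


# Hypothetical figures, for illustration only.
baseline = cost_per_million_tokens(gpu_cost_per_hour=3.00,
                                   tokens_per_gpu_per_second=1000)
doubled = cost_per_million_tokens(gpu_cost_per_hour=3.00,
                                  tokens_per_gpu_per_second=2000)
print(f"baseline: ${baseline:.4f} per million tokens")
print(f"doubled throughput: ${doubled:.4f} per million tokens")
```

With the hourly rate held fixed, doubling the delivered throughput halves the cost per million tokens, which is why the denominator matters as much as the hourly price.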

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters

… Accurately evaluating AI infrastructure starts with asking what lies beneath. Surface-level inquiry: What is the cost per GPU hour? What are the peak petaflops and high-bandwidth memory capacity? What are the FLOPS per dollar? In-depth cost analysis: What is the cost per million tokens? …

Apr 15, 2026 · Shruti Koparkar

New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI

… Nemotron 3 Super has a 1‑million‑token context window, allowing agents to retain full workflow state in memory and preventing goal drift. …

Mar 11, 2026 · Kari Briski

Leading Inference Providers Achieve Lowest Token Cost With Open Source Models on NVIDIA Blackwell

… Moving to Blackwell’s native NVFP4 format, an ultralow precision floating-point data format reducing memory bandwidth and model size while maintaining inference accuracy, further cut that cost to just 5 cents — for a total 4x improvement in cost per token — while maintaining the accuracy that custo… …

Feb 12, 2026 · Shruti Koparkar

Embedded AI Archives

… Memory shortages have driven up costs across the industry. Jetson brings compute and memory together in a system-on-module, accelerating customer hardware design and making sourcing and validation easier than with discrete component approaches. …

May 7, 2026

NVIDIA Isaac GR00T Archives

May 7, 2026
