Search

Showing top 107 results for "AI cost and tokens"

People also ask

What Are the Factors That Lower Token Cost?

Understanding how to optimize token cost requires looking at the equation for calculating cost per million tokens. In this equation, many enterprises evaluating AI infrastructure focus on the numerator: the cost per GPU per hour. For cloud deployments, this is the hourly rate paid to a cloud provider; for on-premises deployments, it’s the effective hourly cost derived from amortizing owned infrastructure. The real key to reducing token cost, however, lies in the denominator: maximizing the delivered token output. That denominator carries two business implications. Minimize token cost: When thi

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters
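The numerator/denominator relationship described in the answer above can be sketched in a few lines. This is a minimal illustration, assuming the equation is hourly GPU cost divided by tokens delivered per GPU per hour, scaled to one million tokens; the function name, parameter names, and the dollar and throughput figures are hypothetical and do not come from the cited article.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float,
                            tokens_per_gpu_per_second: float) -> float:
    """Dollars needed to deliver one million tokens on a single GPU.

    Assumed form of the equation: numerator = cost per GPU per hour,
    denominator = tokens delivered per GPU per hour.
    """
    if tokens_per_gpu_per_second <= 0:
        raise ValueError("delivered throughput must be positive")
    tokens_per_hour = tokens_per_gpu_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical inputs: a $2.00/hour GPU at two delivered throughputs.
baseline = cost_per_million_tokens(2.00, 1_000)
doubled_output = cost_per_million_tokens(2.00, 2_000)

print(f"baseline:       ${baseline:.4f} per million tokens")
print(f"doubled output: ${doubled_output:.4f} per million tokens")
```

With the hourly rate held fixed, doubling delivered throughput halves the cost per million tokens, which is the sense in which the denominator dominates.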

How Does Blackwell Achieve 15x Lower Cost Per Token and 10x Higher Efficiency?

Metrics like tokens per watt, cost per million tokens and TPS/user matter as much as throughput. In fact, for power-limited AI factories, Blackwell delivers 10x throughput per megawatt for mixture-of-experts models compared with the previous generation, which translates into higher token revenue. The cost per token is crucial for evaluating AI model efficiency, directly impacting operational expenses. The NVIDIA Blackwell architecture lowered cost per million tokens by 15x versus the previous generation, leading to substantial savings and fostering wider AI deployment and innovation.

Telecommunications Archives

How Is AI Shifting from Pilots to AI Factories and What’s Next?

AI is moving from pilots to AI factories — infrastructure that manufactures intelligence by turning data into tokens and decisions in real time. Open, frequently updated benchmarks help teams make informed platform choices, tune for cost per token, latency service-level agreements and utilization across changing workloads. Learn more about how to calculate lowest cost per token and how the NVIDIA Think SMART framework drives cost efficient inference.

Telecommunications Archives

Why Does Cost per Token Matter Much More Than FLOPS per Dollar?

The following data for the DeepSeek-R1 AI model demonstrates the difference between theoretical and actual business outcomes. Looking at compute cost alone, the NVIDIA Blackwell platform appears to cost roughly 2x more than NVIDIA Hopper — but compute cost says nothing about the output that investment buys. An analysis of mere FLOPS per dollar suggests a 2x NVIDIA Blackwell advantage compared with the NVIDIA Hopper architecture. However, the actual outcome is orders of magnitude different: Blackwell delivers more than 50x greater token output per watt than Hopper, resulting in nearly 35x lower

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters

Videos

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters

… In-depth cost analysis: What is the cost per million tokens? Specifically, what is the cost per million tokens for large-scale mixture-of-experts (MoE) reasoning models, which represent the most widely deployed type of AI models? …

Apr 15, 2026 · Shruti Koparkar

NVIDIA Wants Everyone To Rethink AI TCO, & Explains Why "Cost Per Token" Is The Only Metric That Matters wccftech.com

Building Token‑Metered AI Services on Telco AI Factories | NVIDIA Technical Blog

… NVIDIA GB200 NVL72 delivers order‑of‑magnitude improvements in tokens‑per‑second and cost‑per‑million‑tokens versus the previous generation, and leading inference providers report up to 10x lower cost‑per‑token on real workloads when they pair Blackwell with optimized stacks. …

May 21, 2026 · Waleed Badr

OpenClaw creator burned through $1.3 million in OpenAI API tokens in a single month — bill covered 603 billion tokens across 7.6 million requests and 100 coding agents

… The bill covered 603 billion tokens across 7.6 million requests, all generated by roughly 100 Codex instances operated by a team of three people working on the open-source OpenClaw project. OpenAI, which employs Steinberger, covers the cost. …

May 17, 2026 · Luke James
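The figures in this snippet imply a blended rate that can be worked out directly. The sketch below is only arithmetic on the rounded numbers quoted above ($1.3 million, 603 billion tokens, 7.6 million requests); the variable names are illustrative, and because the snippet does not say how the tokens split between input, output, and cached tokens, the results are averages, not any provider's list price.

```python
# Rounded figures as quoted in the snippet.
bill_usd = 1_300_000
total_tokens = 603_000_000_000
total_requests = 7_600_000

# Blended averages over the whole month.
usd_per_million_tokens = bill_usd / (total_tokens / 1_000_000)
tokens_per_request = total_tokens / total_requests
usd_per_request = bill_usd / total_requests

print(f"blended cost:       ${usd_per_million_tokens:.2f} per million tokens")
print(f"average size:       {tokens_per_request:,.0f} tokens per request")
print(f"average cost:       ${usd_per_request:.3f} per request")
```

On these inputs the blended rate is a little over $2 per million tokens, at roughly 79,000 tokens and about 17 cents per request.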

Telecommunications Archives

… The cost per token is crucial for evaluating AI model efficiency, directly impacting operational expenses. The NVIDIA Blackwell architecture lowered cost per million tokens by 15x versus the previous generation, leading to substantial savings and fostering wider AI deployment and innovation. …

May 7, 2026

AI quota inflation is no token effort. It's baked in

… All that can be said about the evolutionary driver that will move things on is that it has yet to be invented, despite fifty years of looking. The AI industry builds out in gigawatts and charges in tokens. It sets the cost and scents a future where profound lock-in lets it set the rules forever. …

Apr 20, 2026 · Rupert Goodwins

Paper page - Agentic AI Systems Should Be Designed as Marginal Token Allocators

arXiv:2605.01214. Published on May 2. Submitted by siqi zhu on May 5. University of Illinois at Urbana-Champaign. Authors: Siqi Zhu. Abstract: Agentic AI systems should be evaluated as marginal token allocation economies rather tha… …

May 5, 2026

Discussions and forums

Hacker News · u/tinyopsstudio · 6d ago

Show HN: AI agent token cost calculator for Codex and Claude Code loops

Hacker News · u/Robelkidin · 3w ago

Show HN: Token Usage Meter 12 Providers and Coding Agent

Here once again: a token usage meter for 12+ AI providers, including Anthropic, OpenAI, Google, Alibaba Qwen, Moonshot Kimi, MiniMax, ElevenLabs, Deepgram, and Perplexity. Qlaud.ai provides a token usage meter / AI billing layer. Also Ql…

r/LocalLLaMA · u/Scared-Biscotti2287 · 4d ago

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild

Been following the infrastructure side of AI more lately and stumbled on this from Zai. They upgraded the network architecture on a thousand-GPU cluster running GLM-5.1 coding inference from the standard ROFT setup to so…

Hacker News · u/AdarshRao23 · 2w ago

Show HN: Torrix, self hosted, LLM Observability,(no Postgres, no Redis)

I work as a SAP Integration consultant and built this as a side project. Friction point: Most self hosted LLM observability tools require Postgres, Redis and non trivial infrastructure. Teams just want to see what their …

Hacker News · u/rem_cam · 3d ago

Hybrid local and cloud LLM stack for regulated financial document processing?

I'm scoping a hybrid AI pipeline for a consulting client in a regulated industry (GLBA-covered, NPI involved). Trying to validate the architecture before bringing on an engineer to build it. The workflow: ingest financial…

Groq's Inference Chips Are Beating NVIDIA's Blackwell by 5x on Cost - And Doing It Twice as Fast

… Cost-Per-Million Tokens: NVIDIA Blackwell vs. Groq Breakdown The alternate cost structure is now seeing firms charge their users by the token, or by the million tokens. As per the details, Groq's chips are significantly budget-friendly as they cost between five and 10 cents per million tokens. …

Apr 23, 2026 · Ramish Zafar

You’re about to feel the AI money squeeze

… That may look like “thinking through” a lot of different potential paths, launching sub-agents to do portions of a task, or verifying the accuracy of different steps of the process. “You put in your one-sentence prompt… and it’ll talk out loud to itself for thousands and thousands of tokens, thousa… …

Apr 23, 2026 · Hayden Field

The Many Aspects of Inference Performance

… To illustrate the impact of software optimization on cost per token: since February, MI355X GPU cost per token has dropped significantly, while GB300 NVL72 remains higher and unchanged (Figure 2). Figure 2: Cost per million tokens over time, at interactivity 100 TPS/user -- DeepSeek R1, FP8, no MTP. …

May 11, 2026 · AMD AI Group

Top stories

ASUS Takes the Lead in Hybrid Agentic AI Infrastructure- Maximizing Performance While Reducing Inference Costs

AI cost crisis hits tech giants as employee 'tokenmaxxing' backfires, sparking corporate pullback at Microsoft, Meta, and Amazon — agentic AI eats up to 1000x more tokens than standard AI

Solving the Agentic AI Trilemma – Cost, Scale, and Data Security

Dell Launches Local ‘Deskside Agentic AI’ Workstations to Slash Cloud Token Costs