Showing top 107 results for "AI cost and tokens"

People also ask

What Are the Factors That Lower Token Cost?

Understanding how to optimize token cost requires looking at the equation for calculating cost per million tokens. In this equation, many enterprises evaluating AI infrastructure focus on the numerator: the cost per GPU per hour. For cloud deployments, this is the hourly rate paid to a cloud provider; for on-premises deployments, it’s the effective hourly cost derived from amortizing owned infrastructure. The real key to reducing token cost, however, lies in the denominator: maximizing the delivered token output. That denominator carries two business implications. Minimize token cost: When thi
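A minimal sketch of the equation the snippet describes, assuming the numerator is the cost per GPU per hour and the denominator is the tokens one GPU delivers in that hour. The function name, the per-second throughput parameter, and the dollar and throughput figures are hypothetical and chosen only for illustration; they do not come from the source.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float,
                            tokens_per_gpu_per_second: float) -> float:
    """Cost (in the same currency as the hourly rate) per one million delivered tokens."""
    if tokens_per_gpu_per_second <= 0:
        raise ValueError("delivered throughput must be positive")
    # Denominator: tokens one GPU actually delivers in an hour.
    tokens_per_gpu_per_hour = tokens_per_gpu_per_second * 3600
    # Numerator: hourly GPU cost, scaled to a per-million-token figure.
    return gpu_cost_per_hour / tokens_per_gpu_per_hour * 1_000_000


# Hypothetical inputs: the same hourly rate at two delivered throughputs.
baseline = cost_per_million_tokens(gpu_cost_per_hour=3.0,
                                   tokens_per_gpu_per_second=1000.0)
doubled = cost_per_million_tokens(gpu_cost_per_hour=3.0,
                                  tokens_per_gpu_per_second=2000.0)
print(f"baseline: {baseline:.4f} per million tokens")
print(f"doubled throughput: {doubled:.4f} per million tokens")
```

With the hourly rate held fixed, doubling the delivered throughput halves the cost per million tokens, which is the sense in which the denominator drives the result.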

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters
How Does Blackwell Achieve 15x Lower Cost Per Token and 10x Higher Efficiency?

Metrics like tokens per watt, cost per million tokens and TPS/user matter as much as throughput. In fact, for power-limited AI factories, Blackwell delivers 10x throughput per megawatt for mixture-of-experts models compared with the previous generation, which translates into higher token revenue. The cost per token is crucial for evaluating AI model efficiency, directly impacting operational expenses. The NVIDIA Blackwell architecture lowered cost per million tokens by 15x versus the previous generation, leading to substantial savings and fostering wider AI deployment and innovation.

Telecommunications Archives
How Is AI Shifting from Pilots to AI Factories and What’s Next?

AI is moving from pilots to AI factories — infrastructure that manufactures intelligence by turning data into tokens and decisions in real time. Open, frequently updated benchmarks help teams make informed platform choices, tune for cost per token, latency service-level agreements and utilization across changing workloads. Learn more about how to calculate lowest cost per token and how the NVIDIA Think SMART framework drives cost efficient inference.

Telecommunications Archives
Why Does Cost per Token Matter Much More Than FLOPS per Dollar?

The following data for the DeepSeek-R1 AI model demonstrates the difference between theoretical and actual business outcomes. Looking at compute cost alone, the NVIDIA Blackwell platform appears to cost roughly 2x more than NVIDIA Hopper — but compute cost says nothing about the output that investment buys. An analysis of mere FLOPS per dollar suggests a 2x NVIDIA Blackwell advantage compared with the NVIDIA Hopper architecture. However, the actual outcome is orders of magnitude different: Blackwell delivers more than 50x greater token output per watt than Hopper, resulting in nearly 35x lower

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters
tomshardware.com › tech-industry › artificial-intelligence

OpenClaw creator burned through $1.3 million in OpenAI API tokens in a single month — bill covered 603 billion tokens across 7.6 million requests and 100 coding agents

… The bill covered 603 billion tokens across 7.6 million requests, all generated by roughly 100 Codex instances operated by a team of three people working on the open-source OpenClaw project. OpenAI, which employs Steinberger, covers the cost. …
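As a rough worked example, the reported figures can be turned into blended averages. This sketch assumes the rounded numbers in the snippet ($1.3 million, 603 billion tokens, 7.6 million requests, roughly 100 instances) and ignores any difference between input, output, and cached token pricing, so the results are approximations only. The variable names are illustrative.

```python
# Figures as reported in the snippet (rounded by the source).
total_cost_usd = 1.3e6      # monthly bill
total_tokens = 603e9        # tokens covered by the bill
total_requests = 7.6e6      # API requests
agent_count = 100           # "roughly 100" Codex instances

# Blended averages; these mix all token types together.
usd_per_million_tokens = total_cost_usd / total_tokens * 1_000_000
tokens_per_request = total_tokens / total_requests
usd_per_request = total_cost_usd / total_requests
tokens_per_agent = total_tokens / agent_count

print(f"blended cost per million tokens: ${usd_per_million_tokens:.2f}")
print(f"average tokens per request: {tokens_per_request:,.0f}")
print(f"average cost per request: ${usd_per_request:.3f}")
print(f"average tokens per agent: {tokens_per_agent:,.0f}")
```

On these inputs the blended rate comes out at a little over $2 per million tokens, with an average request size of roughly 79,000 tokens.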

May 17, 2026 · Luke James

Top stories

Discussions and forums

Hacker News · u/tinyopsstudio · 6d ago

Show HN: AI agent token cost calculator for Codex and Claude Code loops

2
Hacker News · u/Robelkidin · 3w ago

Show HN: Token Usage Meter 12 Providers and Coding Agent

Here once again: a token usage meter for 12+ AI providers, including Anthropic, OpenAI, Google, Alibaba Qwen, Moonshot Kimi, MiniMax, ElevenLabs, Deepgram, and Perplexity. Qlaud.ai provides a token usage meter / AI billing layer. Also Ql…

2
r/LocalLLaMA · u/Scared-Biscotti2287 · 4d ago

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild

Been following the infrastructure side of AI more lately and stumbled on this from Zai. They upgraded the network architecture on a thousand-GPU cluster running GLM-5.1 coding inference from the standard ROFT setup to so…

Hacker News · u/AdarshRao23 · 2w ago

Show HN: Torrix, self-hosted LLM observability (no Postgres, no Redis)

I work as an SAP integration consultant and built this as a side project. Friction point: most self-hosted LLM observability tools require Postgres, Redis, and nontrivial infrastructure. Teams just want to see what their …

72 4
Hacker News · u/rem_cam · 3d ago

Hybrid local and cloud LLM stack for regulated financial document processing?

I'm scoping a hybrid AI pipeline for a consulting client in a regulated industry (GLBA-covered, NPI involved). Trying to validate the architecture before bringing on an engineer to build it. The workflow: ingest financial…

2 2