Search

Showing top 112 results for "AI cost and performance"

People also ask

How Did NVIDIA Double Blackwell Performance Through Continuous Software Optimizations to Lower Token Cost?

NVIDIA doubled Blackwell performance through continuous software optimization, refining kernels, compiler paths, and inference runtimes so the same hardware delivers significantly more useful AI throughput over time. Initial gpt-oss-120b performance on an NVIDIA DGX Blackwell B200 system with the NVIDIA TensorRT LLM library was market-leading, but NVIDIA’s teams and the community have significantly optimized TensorRT LLM for open-source large language models. The TensorRT LLM v1.0 release is a major breakthrough in making large AI models faster and more responsive for everyone. Through advance

Telecommunications Archives

What Is InferenceMAX v1 and Why Does It Matter for AI Economics?

InferenceMAX v1, a new benchmark from SemiAnalysis released Monday, is the latest to highlight Blackwell’s inference leadership. It runs popular models across leading platforms, measures performance for a wide range of use cases and publishes results anyone can verify. Why do benchmarks like this matter? Because modern AI isn’t just about raw speed — it’s about efficiency and economics at scale. As models shift from one-shot replies to multistep reasoning and tool use, they generate far more tokens per query, dramatically increasing compute demands. NVIDIA’s open-source collaborations with Ope

Telecommunications Archives

How Does Blackwell Achieve 15x Lower Cost Per Token and 10x Higher Efficiency?

Metrics like tokens per watt, cost per million tokens and TPS/user matter as much as throughput. In fact, for power-limited AI factories, Blackwell delivers 10x throughput per megawatt for mixture-of-experts models compared with the previous generation, which translates into higher token revenue. The cost per token is crucial for evaluating AI model efficiency, directly impacting operational expenses. The NVIDIA Blackwell architecture lowered cost per million tokens by 15x versus the previous generation, leading to substantial savings and fostering wider AI deployment and innovation.

Telecommunications Archives

How Does Blackwell Balance Cost, Throughput, Efficiency and Responsiveness?

InferenceMAX uses the Pareto frontier — a curve that shows the best trade-offs between different factors, such as data center throughput and responsiveness — to map performance. But it’s more than a chart. It reflects how NVIDIA Blackwell balances the full spectrum of production priorities: cost, energy efficiency, throughput and responsiveness. That balance enables the highest ROI across real-world workloads. Systems that optimize for just one mode or scenario may show peak performance in isolation, but the economics of that doesn’t scale. Blackwell’s full-stack design delivers efficiency and

Telecommunications Archives

Videos

Enabling Agent Computers with AMD Ryzen™ AI Max

…They’re cost-efficient. They’re tireless. They’re always-on AI agents. AI Agent Computers run on AMD Ryzen™ AI Max processors, which are designed for high-performance laptops and compact…

May 11, 2026 · AMD News

Postcard from Embedded World: Meet Intel® Core™ Series 2 processor with P-cores

…Intel Launches Core Series 2 Processor with Real-Time Performance and Expands Edge AI Portfolio - Intel Newsroom Download pre-formatted images (29 MB) Related Posts Client Computing Intel at Computex 2026: Advancing…

Mar 12, 2026 · Daniela Morescalchi

Intel Core Ultra Series 3: The New Standard for Edge AI Robotics Compute

…By combining the CPU, the GPU and the always-on vision AI engine NPU onto one piece of silicon, Intel has reduced the heat and cost of the machine's 'brain.' This…

May 20, 2026 · Daniela Morescalchi

Intel and SambaNova Advance Agentic AI with Xeon 6

…host and action CPUs—addressing performance, efficiency, and software compatibility challenges facing enterprises and cloud providers. The heterogeneous design reflects a broader industry shift toward pairing each phase of AI inference with…

Apr 8, 2026 · Matt Hyatt

Fast, Low-Cost Inference Offers Key to Profitable AI

…Full-stack software optimization offers the key to improving AI inference performance and achieving this goal. Optimizing AI Inference for Cost-Effective User Throughput Businesses are often challenged with balancing the performance…

Jan 23, 2025 · Dave Salvator

Alibaba has made 470,000 AI chips, admits they’re inferior

…MORE CONTEXT Alibaba Cloud hikes prices by up to 34%, blames hardware costs and AI demand Alibaba Cloud can’t deploy servers fast enough to satisfy demand for AI Alibaba reveals 82…

Mar 20, 2026 · Simon Sharwood

Inference Performance for Data Center Deep Learning

…Metrics such as tokens per watt, cost per million tokens, and tokens per second per user are crucial alongside throughput. For power-limited AI factories, NVIDIA's continuous software improvements translate into…

Discussions and forums

r/GooglePixel · u/trust_me_im_human · 1w ago

What exactly makes Pixel worth the price now?

Google’s USP for the Pixel lineup has been AI for the last few years. But recently I got a chance to use my friend’s OnePlus 13 and honestly it made me question the value of Pixels. Almost every AI feature that Google ma…

r/LocalLLaMA · u/Scared-Biscotti2287 · 4d ago

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild

Been following the infrastructure side of AI more lately and stumbled on this from Zai. They upgraded the network architecture on a thousand-GPU cluster running GLM-5.1 coding inference from the standard ROFT setup to so…

Hacker News · u/vbutsomesayw · 5d ago

Bill Gates AI on AI (one month later)

The Agentic Tidal Wave. To: Executive Staff and Direct Reports. From: Bill Gates. Date: April 26, 2026. Our vision for the last 20 years can be summarized in a succinct way. We saw that exponential improvements in clou…

Hacker News · u/k-thimmaraju · 1w ago

Show HN: How to analyze your LLM output – A behavioural health monitor for LLMs

Hey HN! We're Dr. Kashyap Thimmaraju and Giuseppe Canale from Silicon Psyche. We've built Posture Sequence Analysis (PSA), a behavioural health monitor for LLMs and AI Agents. Why we built PSA: We built PSA because we wante…

r/sysadmin · u/Relaxation_Time · 4w ago

Top stories

ASUS Takes the Lead in Hybrid Agentic AI Infrastructure- Maximizing Performance While Reducing Inference Costs

Solving the Agentic AI Trilemma – Cost, Scale, and Data Security

The New AMD EPYC™ 8005 Server CPUs: Big Performance. Low Power. Small Footprint.

SPEC CPU 2026 and the Value of Open, Trusted Performance Measurement

Discussions and forums

Reality check from the Microsoft AI Tour: "Agents" hype, the enterprise disconnect, and peak AI Fatigue

Smarter 5G Today, Seamless Path to 6G: Intel’s AI-Ready Network Vision

Lenovo Enables One-Week Deployment of Production-Ready Agentic AI to Transform Enterprise Workflows - Lenovo StoryHub

AMD Data Center Insights: EPYC CPUs, AI & Cloud Trends