
Showing top 8 results for "AI cost and memory"

People also ask

What happened to Rubin CPX?

You may be scratching your head, wondering "wasn't there supposed to be some kind of special Rubin chip optimized for large-context prefill processing?" You're not hallucinating. Back at Computex last northern spring, Nvidia unveiled the Rubin CPX, a version of Rubin that used slower, less expensive GDDR7 memory to speed up the time to first token – how long users or agents have to wait for the model to start generating an output – when working with large inputs. The idea was that Rubin CPX could cut down on wait times for applications that might involve processing large quantities of document…

A closer look at Nvidia's Groq-powered LPX rack systems
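The trade-off behind Rubin CPX rests on a common rule of thumb: prefill over a long prompt is mostly limited by compute, not memory bandwidth, so cheaper, slower memory costs relatively little there. Below is a back-of-envelope sketch. It assumes roughly 2 FLOPs per parameter per input token and ignores attention's extra cost at long context. The model size, prompt length, peak FLOP/s and utilization are hypothetical.

```python
# Back-of-envelope time-to-first-token (prefill) estimate.
# Assumption: ~2 FLOPs per parameter per prompt token (dense model);
# the extra attention cost at long context is ignored.

def prefill_flops(params: float, prompt_tokens: int) -> float:
    """Approximate FLOPs needed to process the whole prompt once."""
    return 2.0 * params * prompt_tokens

def estimate_ttft_seconds(params: float, prompt_tokens: int,
                          peak_flops: float, utilization: float) -> float:
    """Treat prefill as compute-bound: TTFT ~ FLOPs / sustained FLOP/s."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return prefill_flops(params, prompt_tokens) / (peak_flops * utilization)

# Hypothetical: 70B-parameter model, 100k-token prompt,
# accelerator with 1e15 FLOP/s peak running at 50% utilization.
ttft = estimate_ttft_seconds(70e9, 100_000, 1e15, 0.5)
print(f"{ttft:.1f} s")  # 28.0 s
```

With these made-up numbers the estimate is about 28 seconds. Doubling usable compute halves it, and that is the kind of lever a prefill-oriented chip targets.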

Storage vendors orbit the Nvidia sun at GTC

… Seagate and Supermicro say a smart AI stack separates short-term memory on flash from long-term memory on disk and uses each tier for what it does best. Real-time access tiers (GPU HBM, CPU DRAM, and local and network NVMe SSDs) handle the “right now” context: active tokens, hot embeddings, and frequ… …

Mar 18, 2026 · Chris Mellor
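The tiering idea can be illustrated with a simple recency-based placement rule. This is a minimal sketch of one common technique, not Seagate's or Supermicro's software. The tier names and the 1-second, 60-second and 1-hour thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Item:
    """A piece of AI context data and how long since it was last read."""
    name: str
    seconds_since_access: float

def choose_tier(item: Item, hot_s: float = 1.0, warm_s: float = 60.0,
                cool_s: float = 3600.0) -> str:
    """Place data by access recency: the hotter it is, the faster the tier.

    Tier names and thresholds are illustrative assumptions.
    """
    age = item.seconds_since_access
    if age < hot_s:
        return "gpu_hbm"   # active tokens being attended to right now
    if age < warm_s:
        return "cpu_dram"  # recently used context, quick to page back in
    if age < cool_s:
        return "nvme_ssd"  # warm data still needed at low latency
    return "hdd"           # long-term memory: cheap, high capacity

items = [
    Item("active KV cache", 0.01),
    Item("hot embeddings", 10),
    Item("previous session", 600),
    Item("archived corpus", 86_400),
]
for it in items:
    print(f"{it.name} -> {choose_tier(it)}")
```

A real system would also weigh capacity and access frequency, but recency alone shows how each tier ends up holding the data it serves best.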

The agentic AI boom is here; operations will decide who wins

… At the same time, platform teams have to stand up shared AI infrastructure at scale while maintaining high performance, security, and cost controls. The Nutanix Agentic AI solution addresses this new reality with a full-stack cloud operating model for AI factories. …

Mar 18, 2026 · Tuhina Goel, director product marketing, AI at Nutanix

Guide to GPU virtualization: passthrough, vGPU, and MIG

… It is the right choice when a single workload must have the full card: large model training runs, high-fidelity physics simulations, or rendering pipelines that saturate GPU memory. …

Apr 16, 2026 · VergeIO

A closer look at Nvidia's Groq-powered LPX rack systems

… Each LPU only has enough die space for 500 MB of on-chip memory. For comparison, just one of the eight HBM4 modules on Nvidia's Rubin GPUs contains 36GB of memory. …

Mar 19, 2026 · Tobias Mann
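The snippet's figures support some quick capacity arithmetic. This sketch assumes decimal units (1 GB = 1,000 MB) and a hypothetical 70-billion-parameter model stored at one byte per parameter. It counts weights only and ignores the KV cache and activations.

```python
import math

HBM4_STACK_GB = 36    # from the snippet: one HBM4 module holds 36 GB
STACKS_PER_RUBIN = 8  # from the snippet: eight HBM4 modules per Rubin GPU
LPU_SRAM_GB = 0.5     # from the snippet: 500 MB of on-chip memory per LPU

def rubin_hbm_gb() -> int:
    """Total HBM capacity of one Rubin GPU."""
    return HBM4_STACK_GB * STACKS_PER_RUBIN

def lpus_needed(params: float, bytes_per_param: float) -> int:
    """Minimum LPUs whose SRAM could hold the weights alone."""
    weights_gb = params * bytes_per_param / 1e9
    return math.ceil(weights_gb / LPU_SRAM_GB)

print(rubin_hbm_gb())          # 288
print(lpus_needed(70e9, 1.0))  # 140
```

Eight 36 GB stacks give one Rubin GPU 288 GB. Holding the same hypothetical 70 GB of weights in 500 MB slices of SRAM takes at least 140 LPUs, which suggests why SRAM-based designs spread a model across many chips.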

Unpacking the deceptively simple science of tokenomics

… But, as interactivity increases, Nvidia and AMD's smaller systems become more cost effective. Again, a lot of this depends on which software levers you pull. And it might explain why Nvidia burned $20 billion on Groq's intellectual property and talent. …

Mar 7, 2026 · Tobias Mann
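The basic tokenomics arithmetic is amortized hardware cost divided by token throughput. The sketch below uses hypothetical prices and throughputs. In practice, raising per-user interactivity tends to shrink batch sizes and aggregate throughput, and that is what pushes cost per token up on a large system.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars per million output tokens for a system billed by the hour."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical large system ($100/hour): high batch vs. high interactivity.
big_batched = cost_per_million_tokens(100.0, 50_000)
big_interactive = cost_per_million_tokens(100.0, 5_000)
# Hypothetical smaller system ($20/hour) serving the interactive case.
small_interactive = cost_per_million_tokens(20.0, 2_000)

for label, cost in [("big, batched", big_batched),
                    ("big, interactive", big_interactive),
                    ("small, interactive", small_interactive)]:
    print(f"{label}: ${cost:.2f} per 1M tokens")
```

With these invented numbers, the large system costs about $0.56 per million tokens when heavily batched but about $5.56 when tuned for interactivity. The smaller system comes in at about $2.78, mirroring the pattern the article describes.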

Nvidia GTC 2026: What to expect at AI Burning Man

… By combining its GPU tech and CUDA software libraries with Groq's dataflow architecture, Nvidia has the opportunity to raise the Pareto curve dramatically, reducing the cost per token, while at the same time bolstering output speeds. …

Mar 13, 2026 · Tobias Mann
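A Pareto curve here is the set of configurations that no other configuration beats on both speed and cost. The following is a minimal sketch of computing one. The (tokens per second per user, dollars per million tokens) points are hypothetical.

```python
from typing import List, Tuple

# Each point: (tokens_per_second_per_user, cost_per_million_tokens_usd).
Point = Tuple[float, float]

def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep points that no other point dominates.

    q dominates p if q is at least as fast and at least as cheap as p,
    and differs from p (so it is strictly better in at least one).
    """
    frontier = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

configs = [(20, 0.5), (50, 1.2), (50, 2.0), (100, 3.0), (80, 4.0)]
print(pareto_frontier(configs))  # [(20, 0.5), (50, 1.2), (100, 3.0)]
```

In these terms, raising the curve means adding configurations that dominate today's frontier: faster output at the same or lower cost per token.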

The AI divide putting open weights models in spotlight

… "I think there's a spectrum of solutions available, everything from fully private on-prem to sort of dedicated at the point of use in colocation datacenters, dedicated in the public cloud, to a shared environment for cost savings if your workload or prompts are not sensitive," Buss said. …

Apr 12, 2026 · Tobias Mann

Nvidia embraces optical scale-up as copper reaches limits

… Copper was the natural choice for this, Gilad Shainer, senior VP of networking at Nvidia, told El Reg. "Copper is the best connectivity, if you can use it," he said. "It's very cost effective, very cheap, and consumes zero power. …

Apr 5, 2026 · Tobias Mann
