Nvidia slaps Groq into new LPX racks for faster AI response
… To put that in perspective, OpenAI currently charges about $15 per million output tokens for API access to its top GPT-5.4 model. …
One of the defining characteristics of SRAM-heavy architectures from Groq and its rival Cerebras is that they are very fast when running LLM inferencing workloads, routinely achieving generation rates exceeding 500 and even 1,000 tokens a second. The faster Nvidia can generate tokens, the faster code assistants and AI agents can act. But this kind of speed also opens the door to what Huang describes as test-time scaling. The idea is that by letting "reasoning" models generate more "thinking" tokens, they can produce smarter, more accurate results. So, the faster you can generate tokens, the les
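The link between token rate and the wait imposed by extra "thinking" tokens comes down to simple division. The sketch below is illustrative only: the 10,000-token reasoning budget and the 50 tokens-a-second baseline are assumptions chosen for the example, not figures from Nvidia or Groq, and it ignores prefill, queueing, and network time.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady decode rate.

    Simplification: ignores prefill, queueing, and network overhead.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical reasoning budget, chosen only for illustration.
THINKING_TOKENS = 10_000

# 500 and 1,000 tok/s are the rates quoted in the article;
# 50 tok/s is an assumed slower baseline for comparison.
for rate in (50, 500, 1_000):
    secs = generation_seconds(THINKING_TOKENS, rate)
    print(f"{rate:>5} tok/s -> {secs:.0f} s before the answer starts")
```

Under these assumptions, the same reasoning budget that takes more than three minutes at 50 tokens a second finishes in 10 to 20 seconds at the rates quoted above.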
A closer look at Nvidia's Groq-powered LPX rack systems
If you're not a hyperscaler, neocloud, or model dev, LPX is probably not for you. The sheer number of LPUs required to serve large open models will likely put Nvidia's LPX platform out of reach for most enterprises. Speaking to press ahead of this week's keynote, Buck said Nvidia is focusing primarily on model builders and service providers that need to serve trillion-plus-parameter models with token rates exceeding 500 to 1,000 a second. Having said that, in a technical blog, Nvidia presented another use case for the LPUs as a speculative decode accelerator, something we suggested the company mi
You may be scratching your head, wondering "wasn't there supposed to be some kind of special Rubin chip optimized for large-context prefill processing?" You're not hallucinating. Back at Computex last northern spring, Nvidia unveiled the Rubin CPX, a version of Rubin that used slower, less expensive GDDR7 memory to speed up the time to first token – how long users or agents have to wait for the model to start generating an output – when working with large inputs. The idea was that Rubin CPX could cut down on wait times for applications that might involve processing large quantities of document
… Thomas Cornely, EVP of Product Management at Nutanix, said in a statement: "Nutanix Agentic AI extends our AHV hypervisor, Flow Virtual Networking, Nutanix Kubernetes Platform, and Nutanix Enterprise AI to deliver a cloud operating model to enterprise AI factories, enabling infrastructure and platf… …
… Tuesday's event was all about Arm’s newly announced AGI CPU products, which will free the company from the shackles of its IP licensing model by enabling the company to sell directly to end customers. Haas has high hopes for agentic AI to accelerate the British chip designer's datacenter business. …
… It said its work would allow orgs to "scale AI initiatives" while "adhering to regional data sovereignty and compliance requirements." Dr Bastian Koller, Managing Director of the High Performance Computing Center at Stuttgart University and lead coordinator of HammerHAI said of the partnership: "Ha… …
… Each PE includes a pair of RISC-V vector cores. The chip is in production. MTIA 400: An evolution of the MTIA 300 that can support generative AI models and R&R workloads. …
… Since announcing the $500 billion Stargate initiative just over a year ago, OpenAI has tapped Oracle to deploy 4.5 gigawatts of compute capacity to fuel model development and services. …
… The Trump administration rescinded those rules before they took effect, with the Commerce Department saying they would have "stifled American innovation and saddled companies with burdensome new regulatory requirements." The department posted on X that it is "committed to promoting secure exports o… …
… In fact, this capability is how Cerebras won OpenAI's business earlier this year to power its Codex model. …
… Even AI generators like Sora 2 have some delay while you wait for them to process. "We combined 3D graphics, structured data, with generative AI, probabilistic computing," Huang said. …