Another Giant Leap: The Rubin CPX Specialized Accelerator & Rack
… This sparsity scheme is unlike the 2:4 structured sparsity used in Hopper and Ampere, and it isn’t like Blackwell’s 4:8 pairwise structured sparsity either. …
Many readers are doubtless salivating at the idea of spending less on HBM and are thinking: why not curtail the amount of memory in a system even further? If a typical prefill sequence length means memory utilization in the low double digits or even single digits, why not reduce memory capacity to 1/10th the size? Does this mean doom for HBM demand and memory demand in general? However, things are not so simple. What Rubin CPX does is reduce the cost of prefill and of tokens. A lower cost per token increases demand for tokens, which in turn increases demand for decode as well. Like many other
… Total Cost of Ownership for AI Clusters: We calculate the total cost of ownership for an AI Cluster, including capital costs such as the AI server, networking, storage, installation, and service, as well as operating costs such as colocation rental, power costs, remote hands and support engineers a… …