Search

Showing top 14 results for "GPU memory bandwidth"

People also ask

How many LPUs do you need?

The whole system requires a lot of LPUs. The exact ratio of GPUs to LPUs depends on the workload. Tasks requiring extremely large contexts, batch sizes, or concurrency may need a larger pool of GPUs. A general-purpose chatbot might run well on a single rack. This is because longer context windows require more memory for the key-value (KV) caches that store model state (think short-term memory) and attention operations. By keeping these on the GPU, Nvidia is able to get by with fewer LPUs. The actual number of required LPUs is directly proportional to the size of a model. For a trillion-paramet

A closer look at Nvidia's Groq-powered LPX rack systems