Search: memory cost pressure

Analyzing Nvidia GB10's GPU

… FAHBench makes good use of local memory to keep data on-chip, reducing pressure on the global memory hierarchy. Global memory accesses that do happen have reasonable locality, even on the heavier NAV workload. …

Mar 14, 2026 · Chester Lam

Nvidia’s B200: Keeping the CUDA Juggernaut Rolling ft. Verda (formerly DataCrunch)

… FluidX3D can also use 16-bit floating point formats for storage, reducing memory capacity and bandwidth requirements. Computations still use FP32 and format conversion costs extra compute, so the FP16 formats result in a higher compute to bandwidth ratio. …

Dec 15, 2025 · Chester Lam

Embracing AI with Claude's C Compiler

… On Zen 5, that means retirement is blocked by an instruction that doesn’t read from memory. That indicates performance is limited by how fast the core can execute instructions, rather than how fast the memory subsystem can feed the core with data. …

Apr 1, 2026 · Chester Lam

Arm's Cortex X925: Reaching Desktop Performance

… Load/Store Memory accesses are among the most complicated and performance critical operations on a modern CPU. For each memory access, the load/store unit has to translate program-visible virtual addresses into physical addresses. …

Mar 3, 2026 · Chester Lam

Followed topics

Analyzing Nvidia GB10's GPU

Nvidia’s B200: Keeping the CUDA Juggernaut Rolling ft. Verda (formerly DataCrunch)

Embracing AI with Claude's C Compiler

Arm's Cortex X925: Reaching Desktop Performance