My RTX 5090 can't keep up with Apple Silicon on the biggest local LLMs, and I hate to admit it
… Unified memory wins on cost-per-GB at the top of the model-size range, not in the middle. For the 400GB-plus class, the Nvidia alternative is not an ordinary stack of consumer cards but a multi-accelerator server with enough A100/H100/H200-class memory to keep the model resident. …
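To put rough numbers on the "keep the model resident" point, here is a back-of-the-envelope sketch in Python. It only counts how many devices it takes to hold a given number of gigabytes. The capacities are the published per-card figures (the A100 and H100 also ship in other sizes), while the `usable_fraction` headroom parameter and the function names are my own assumptions for illustration. It ignores KV cache growth, interconnect limits, and price entirely.

```python
import math

# Published per-device memory in GB. The A100 and H100 also exist in
# other capacities; the 80 GB variants are used here for illustration.
DEVICE_MEMORY_GB = {
    "RTX 5090": 32,
    "A100 80GB": 80,
    "H100 80GB": 80,
    "H200": 141,
}


def devices_needed(model_gb, device_gb, usable_fraction=0.9):
    """Smallest device count whose usable memory holds the model.

    usable_fraction is an assumed headroom factor (hypothetical, not a
    measured value) for runtime overhead on each device.
    """
    return math.ceil(model_gb / (device_gb * usable_fraction))


def fit_table(model_gb, usable_fraction=0.9):
    """Map each device name to the count needed for a model of model_gb."""
    return {
        name: devices_needed(model_gb, gb, usable_fraction)
        for name, gb in DEVICE_MEMORY_GB.items()
    }


if __name__ == "__main__":
    for name, count in fit_table(400).items():
        print(f"{name}: {count} device(s) for a 400 GB model")
```

With the assumed 90% usable fraction, a 400 GB model comes out at 14 RTX 5090s versus 4 H200s, which is why the comparison at this size is a server rather than a desktop full of consumer cards.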