Ollama is still the easiest way to start local LLMs, but it's the worst way to keep running them
…
A. Hard disk drive (HDD) read speed
B. CPU clock speed in GHz
C. Available VRAM on the GPU
D. Internet bandwidth

Spot on! VRAM is the key bottleneck for local LLM inference. If a model fits entirely in your GPU's VRAM, it runs dramatically faster than when it falls back to system RAM or CPU processing. …
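To make the "fits entirely in VRAM" check concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from Ollama or any other tool: the function names are hypothetical, and the 20% overhead for the KV cache and runtime buffers is an assumed placeholder that varies in practice with context length and backend.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def estimate_model_bytes(n_params: float, bits_per_weight: float,
                         overhead_fraction: float = 0.2) -> float:
    """Rough memory estimate for a model's weights plus runtime overhead.

    overhead_fraction is an ASSUMED allowance for the KV cache and
    runtime buffers; the real figure depends on context length and backend.
    """
    weight_bytes = n_params * bits_per_weight / 8  # bits -> bytes
    return weight_bytes * (1 + overhead_fraction)


def fits_in_vram(n_params: float, bits_per_weight: float, vram_gib: float,
                 overhead_fraction: float = 0.2) -> bool:
    """Return True if the estimated footprint is within the given VRAM."""
    needed = estimate_model_bytes(n_params, bits_per_weight, overhead_fraction)
    return needed <= vram_gib * GIB


if __name__ == "__main__":
    # Illustrative cases: (label, parameter count, bits per weight, VRAM in GiB)
    cases = [
        ("7B at 4-bit on 8 GiB", 7e9, 4, 8),
        ("7B at 16-bit on 8 GiB", 7e9, 16, 8),
        ("70B at 4-bit on 24 GiB", 70e9, 4, 24),
    ]
    for label, params, bits, vram in cases:
        needed_gib = estimate_model_bytes(params, bits) / GIB
        verdict = "fits" if fits_in_vram(params, bits, vram) else "does not fit"
        print(f"{label}: ~{needed_gib:.1f} GiB needed, {verdict}")
```

Under these assumptions, a 7B model quantized to 4 bits fits comfortably in 8 GiB, while the same model at 16 bits does not and would spill into system RAM or CPU processing. Treat the result as a first filter only; actual usage should be confirmed on the target machine.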