I stopped trying to replace my cloud LLMs, and local models finally made sense
… A lightweight Qwen model works great for this, since the latency is low. …
… While CPU offloading works, the memory bandwidth bottleneck between the GPU and RAM makes the latency hard to ignore. …
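To see why offloading hurts so much, a common rule of thumb helps: generating each token requires reading roughly every weight once, so decode speed tends to be limited by memory bandwidth rather than raw compute. The sketch below is a back-of-the-envelope model under that assumption, not a benchmark. It also assumes that the GPU-resident and RAM-resident layers run one after the other, so their read times add. The function name and all bandwidth and model-size figures are hypothetical.

```python
# Hypothetical helper: a back-of-the-envelope model, not a benchmark.
GB = 1e9  # bytes per gigabyte (decimal)


def estimate_decode_tokens_per_sec(
    model_bytes: float,
    gpu_fraction: float,
    gpu_bandwidth: float,
    ram_bandwidth: float,
) -> float:
    """Rough ceiling on tokens/s when part of the model lives in system RAM.

    Assumptions: every weight is read once per generated token, compute is
    never the limit, and the GPU and CPU portions run one after the other,
    so their read times add up.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    gpu_bytes = model_bytes * gpu_fraction
    ram_bytes = model_bytes - gpu_bytes
    # Time per token = time to stream the VRAM share + time to stream the RAM share.
    seconds_per_token = gpu_bytes / gpu_bandwidth + ram_bytes / ram_bandwidth
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Hypothetical numbers: an 8 GB quantized model, ~448 GB/s of VRAM
    # bandwidth, and ~50 GB/s of system RAM bandwidth.
    for fraction in (1.0, 0.75, 0.5):
        tps = estimate_decode_tokens_per_sec(8 * GB, fraction, 448 * GB, 50 * GB)
        print(f"{fraction:.0%} on GPU: ~{tps:.1f} tokens/s")
```

With those made-up figures, keeping the whole model in VRAM gives a ceiling of about 56 tokens/s, while pushing just a quarter of the weights into system RAM drops it below 19 tokens/s, because the slow RAM share dominates the time per token.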
… Building a llama.cpp cluster wasn’t that hard. Starting with the SBCs that I wanted to use as the guinea pigs partic… …
… This was my first experience with Claude Code, and it knocked it out of the park. Claude Code righted the wrongs of ChatGPT and helped me turn a script into a real app with a GUI. The original script came out of ChatGPT in an attempt to see whether what I was asking for was even reasonable in the first place. …
… Since the models live on my hardware, there’s zero latency from internet round-trips, and I have total privacy. …
… Claude is made by Anthropic, not to be confused with OpenAI, which makes ChatGPT. …
… That's what would end up saving you the subscription costs for Claude and ChatGPT. With your own GPU, you won't depend on the cloud or its network latency, and, most importantly, all your data stays on your own PC, not on a server halfway across the world. …