I replaced cloud LLMs with local models running off a Proxmox LXC, and the performance trade-off was worth it
…Sure, 4B, 7B, and even 9B models could handle simple inference requests, but anything requiring detailed troubleshooting or complex reasoning would be beyond them – and in some cases…