I replaced cloud LLMs with local models running off a Proxmox LXC, and the performance trade-off was worth it
… On my aged system, I ran ls -l /dev/nvidia* to get the device IDs (the major numbers 195, 235, and 237) for my GPU, pasted the following syntax into the LXC’s config file, and installed the graphics card drivers inside the LXC to configure GPU passthrough. After that, I compiled llama.cpp’s Vulkan variant… …
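
The config snippet itself isn't reproduced in this excerpt, so here is only a rough sketch of what such entries commonly look like, not my exact file. It assumes a cgroup v2 host (older setups use `lxc.cgroup.devices.allow` instead) and the usual NVIDIA device node names. The helper `lxc_gpu_config` is a hypothetical name made up for illustration, and the numbers should come from your own `ls -l` output.

```python
def lxc_gpu_config(majors, device_nodes=()):
    """Build LXC config lines for GPU passthrough (illustrative sketch).

    majors: character-device major numbers read from `ls -l` on the host.
    device_nodes: host device paths to bind-mount into the container.
    """
    # Allow the container to read, write, and mknod every minor of each major.
    lines = [f"lxc.cgroup2.devices.allow: c {major}:* rwm" for major in majors]
    for node in device_nodes:
        # The mount target is relative to the container's root filesystem,
        # so the leading slash is dropped.
        target = node.lstrip("/")
        lines.append(
            f"lxc.mount.entry: {node} {target} none bind,optional,create=file"
        )
    return lines


if __name__ == "__main__":
    # Assumed node names; check which ones actually exist on your host.
    nodes = ["/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm"]
    for line in lxc_gpu_config([195, 235, 237], nodes):
        print(line)
```

On a Proxmox host, lines like these would typically go into the container's file under /etc/pve/lxc/, and the container needs a restart before the devices show up inside it.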