$200 'socketed' Nvidia AI GPU for servers hacked into a PCIe card with custom PCB and 3D-printed cooling: modded Tesla V100 SXM data center GPU runs AI LLMs and is more efficient than many modern midrange offerings in AI inference
…Running Google's gemma4:e4b, the V100 topped out at 108 tokens per second, while the 3060 12 GB only managed about 76 tokens per second, but it did so consuming less…
