Two old GPUs I salvaged are doing more AI work than a brand-new $2,000 card, and I won't be upgrading anytime soon
…You see, the router sends each token to a small subset of experts, and these independent neural networks each specialize in specific workloads. So, I can add the --n-cpu-moe flag to my llama.cpp…
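
To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is a toy illustration, not how llama.cpp implements it: the names `route_top_k` and `moe_forward`, the scalar input, and the four lambda "experts" are all hypothetical stand-ins for a real router and real feed-forward blocks.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are renormalized over the chosen experts so they sum to 1.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, router_logits, experts, k=2):
    """Run only the selected experts and blend their outputs by weight."""
    output = 0.0
    for index, weight in route_top_k(router_logits, k):
        output += weight * experts[index](x)
    return output


# Hypothetical toy experts; a real model uses feed-forward sub-networks.
experts = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: -x,
    lambda x: x ** 2,
]

# Hypothetical router scores for one token; experts 1 and 3 score highest.
logits = [0.1, 2.0, -1.0, 1.5]
selected = route_top_k(logits, k=2)
result = moe_forward(3.0, logits, experts, k=2)
```

The point of the sketch is that only two of the four experts run for this token. As I understand it, that sparsity is why the expert weights are a reasonable thing to leave in system RAM while the rest of the model stays on the GPU.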
