Accelerating GPT-OSS-20B on AMD Ryzen™ AI NPUs: Efficient MoE Inference on Strix and Halo
…
Memory Allocation Strategy
Even with INT4 quantization, GPT-OSS-20B has a large memory footprint because of its 20B parameters and QMoE layers. To run the model across a variety of memory-constrained setups, a dynamic memory allocation scheme is used. …
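To give a rough sense of why the footprint stays large after quantization, the sketch below estimates weight storage alone. It assumes a round count of 20 billion parameters (the text says only "20B") and a uniform 4 bits per weight. It ignores quantization scales and zero points, any layers kept at higher precision, activations, and the KV cache, so the real footprint will differ. The helper name `weight_footprint_bytes` is hypothetical and is not part of any Ryzen AI API.

```python
def weight_footprint_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store num_params weights at bits_per_weight each, rounded up."""
    return (num_params * bits_per_weight + 7) // 8


# Assumption: a round 20 billion parameters, taken from "20B" in the text.
NUM_PARAMS = 20_000_000_000

int4_bytes = weight_footprint_bytes(NUM_PARAMS, 4)   # uniform INT4 weights
fp16_bytes = weight_footprint_bytes(NUM_PARAMS, 16)  # 16-bit baseline for comparison

print(f"INT4 weights: {int4_bytes / 1e9:.1f} GB ({int4_bytes / 2**30:.2f} GiB)")
print(f"FP16 weights: {fp16_bytes / 1e9:.1f} GB ({fp16_bytes / 2**30:.2f} GiB)")
```

Under these assumptions the INT4 weights alone come to about 10 GB, a quarter of the 16-bit figure but still a substantial share of system memory on a machine where the NPU shares RAM with the rest of the system.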