Unweight: how we compressed an LLM 22% without sacrificing quality
… The "gate" and "up" projections have different dimensions than the "down" projection. That changes the order of operations performed within the matmul, which in turn calls for different performance tradeoffs. …
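To make the shape difference concrete, here is a minimal sketch. It assumes the common gated-MLP layout, in which "gate" and "up" map the hidden size to a larger intermediate size and "down" maps it back. The sizes (4096 and 14336), the `MatmulShape` class, and the `mlp_matmul_shapes` helper are illustrative assumptions, not values or code from the original post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatmulShape:
    """Shape of one activation-by-weight matmul: [m x k] @ [k x n]."""
    m: int  # number of token rows in the activation matrix
    k: int  # inner (reduction) dimension
    n: int  # number of output features

    @property
    def flops(self) -> int:
        # One multiply and one add per (m, k, n) triple.
        return 2 * self.m * self.k * self.n


def mlp_matmul_shapes(tokens: int, hidden: int, intermediate: int) -> dict:
    """Hypothetical helper: matmul shapes for an assumed gated MLP block."""
    return {
        "gate": MatmulShape(tokens, hidden, intermediate),
        "up": MatmulShape(tokens, hidden, intermediate),
        "down": MatmulShape(tokens, intermediate, hidden),
    }


# Illustrative sizes only (assumed, not taken from the post).
shapes = mlp_matmul_shapes(tokens=1, hidden=4096, intermediate=14336)
for name, s in shapes.items():
    print(f"{name:>4}: [{s.m} x {s.k}] @ [{s.k} x {s.n}] "
          f"-> [{s.m} x {s.n}], reduction length k={s.k}")
```

Under these assumptions all three matmuls do the same number of multiply-adds, but "gate" and "up" reduce over the short hidden dimension and produce a wide output, while "down" reduces over the long intermediate dimension and produces a narrow one. That is one way to see why a kernel tuned for one shape may not be the best choice for the other.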