Google's Gemma 4 AI models get 3x speed boost by predicting future tokens
…As a result, the processor spends much of its time moving parameters from VRAM to the compute units for each token, and compute cycles sit unused while it waits. MTP uses that…
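The memory bottleneck described above can be illustrated with a back-of-envelope estimate. The sketch below is a rough model, not a measurement: it assumes every parameter is read from VRAM once per generated token and costs about two floating-point operations (a multiply and an add). The function name `decode_step_times` and all hardware figures are hypothetical placeholders, not specifications of Gemma 4 or of any real GPU.

```python
def decode_step_times(param_count, bytes_per_param, bandwidth_bytes_per_s, flops_per_s):
    """Estimate (memory_time, compute_time) in seconds for one decoding step.

    Simplifying assumptions: each parameter is read from VRAM once per token,
    and each parameter costs roughly 2 FLOPs (one multiply, one add).
    """
    memory_time = param_count * bytes_per_param / bandwidth_bytes_per_s
    compute_time = 2 * param_count / flops_per_s
    return memory_time, compute_time


# Hypothetical figures chosen only for illustration.
mem_t, cmp_t = decode_step_times(
    param_count=10e9,            # 10 billion parameters (assumed)
    bytes_per_param=2,           # 16-bit weights (assumed)
    bandwidth_bytes_per_s=1e12,  # 1 TB/s memory bandwidth (assumed)
    flops_per_s=100e12,          # 100 TFLOP/s of compute (assumed)
)

# Fraction of the step during which the compute units have nothing to do,
# under the assumption that weight loading and arithmetic do not overlap better
# than the slower of the two.
idle_fraction = 1 - cmp_t / mem_t

print(f"memory time per token:  {mem_t * 1e3:.2f} ms")
print(f"compute time per token: {cmp_t * 1e3:.2f} ms")
print(f"compute idle fraction:  {idle_fraction:.0%}")
```

Under these assumed numbers, loading the weights takes far longer than the arithmetic for a single token, which is the kind of unused compute the article refers to.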