The Keyword
Accelerating Gemma 4: faster inference with multi-token prediction drafters

Gemma 4 MTP Drafter

Tokens-per-second speed increases, tested on hardware using LiteRT-LM, MLX, Hugging Face Transformers, and vLLM.

Gemma 4 MTP drafter speedups

Gemma 4 26B on an NVIDIA RTX PRO 6000. Standard inference (left) vs. MTP drafter (right), in tokens per second. Same output quality, half the wait time.
