Gemma-4-12B-IT · 21.1 t/s @ 32k on Spark

Context ladder

Throughput at each benched context window.

Context	KV	Throughput
@ 32k peak	—	21.1 t/s
@ 256k golden	q8_0	14.1 t/s
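
To put the two rungs in perspective, here is a minimal sketch of the arithmetic behind them. It uses only the two throughput figures from the table. The helper names `relative_drop` and `seconds_for` are illustrative and not part of any bench tooling, and the 1000-token reply length is an arbitrary example.

```python
# Throughput figures copied from the context ladder (tokens per second).
LADDER = {
    "32k peak": 21.1,
    "256k golden": 14.1,
}


def relative_drop(fast: float, slow: float) -> float:
    """Fraction of throughput lost when going from `fast` to `slow`."""
    return (fast - slow) / fast


def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Seconds needed to generate `tokens` at a steady decode rate."""
    return tokens / tok_per_s


drop = relative_drop(LADDER["32k peak"], LADDER["256k golden"])
print(f"drop from 32k to 256k: {drop:.1%}")

# 1000 tokens is an arbitrary example reply length, not a benchmark setting.
for name, rate in LADDER.items():
    print(f"{name}: {seconds_for(1000, rate):.1f} s per 1000 tokens")
```

On these figures the 256k golden profile runs about 33.2% slower than the 32k peak, so a 1000-token reply takes roughly 70.9 s instead of 47.4 s. This assumes a steady decode rate and ignores prompt processing time.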

Golden profile

google-gemma-4-12b-it-llama

Capabilities

general · multimodal · audio · vision · agentic · coding · vllm · llamacpp · apache-2.0

Why we run it

Best mid-size Gemma 4 for a single Spark: encoder-free, unified multimodal input (text, image, audio), a 256K context window, and near-26B-MoE quality from a ~12B dense model. Licensed under Apache 2.0 and not gated. Prefer the instruction-tuned -it variant over base.

Bench notes

golden 256k/q8_0 @ 14.1 tok/s — fill~50000 — bench-agent-v2 — tool_ok=True
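
If these notes need to be compared across models, the line can be split into fields. The sketch below assumes the layout seen in this one note (fields separated by dashes, a `profile context/kv @ rate tok/s` head, then `fill~N`, a harness name, and `key=True/False` flags). The function name `parse_bench_note` and the dictionary keys, including `harness`, are hypothetical labels chosen for illustration, and the meaning of `fill` is not defined on this page.

```python
import re

# The bench note exactly as it appears on this page.
NOTE = "golden 256k/q8_0 @ 14.1 tok/s — fill~50000 — bench-agent-v2 — tool_ok=True"


def parse_bench_note(note: str) -> dict:
    """Split one bench note into named fields (key names are assumptions)."""
    parts = [p.strip() for p in note.split("—")]

    # Head: "<profile> <context>/<kv> @ <rate> tok/s"
    head = re.match(r"(\w+)\s+(\w+)/(\w+)\s*@\s*([\d.]+)\s*tok/s", parts[0])
    if head is None:
        raise ValueError(f"unrecognised bench note: {note!r}")

    record = {
        "profile": head.group(1),
        "context": head.group(2),
        "kv": head.group(3),
        "tok_per_s": float(head.group(4)),
    }

    for part in parts[1:]:
        fill = re.fullmatch(r"fill\s*~\s*(\d+)", part)
        if fill:
            record["fill"] = int(fill.group(1))
        elif "=" in part:
            key, value = part.split("=", 1)
            record[key.strip()] = value.strip() == "True"
        else:
            # Assumed to be the name of the bench harness.
            record["harness"] = part
    return record


print(parse_bench_note(NOTE))
```

Keeping the parser strict about the head of the line means a note in a different layout raises an error instead of silently producing wrong numbers.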

Links

google/gemma-4-12B-it on HuggingFace ↗ · released 2026-05-23
Golden recipe definition ↗