google/gemma-4-12b-it
Gemma-4-12B-IT
Throughput
21.1 t/s @ 32k
Engine
llama.cpp
Parameters
12B
Released
2026-05-23
Benchmarked
2026-06-27
Context ladder
Throughput at each benched context window.
| Context | KV | Throughput |
|---|---|---|
| @ 32k peak | — | 21.1t/s |
| @ 256k golden | q8_0 | 14.1t/s |
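
As a rough reading aid, the two rows above can be turned into a single ratio showing how much throughput survives the move from the 32k window to the 256k window. This is only a sketch: the figures are copied from the table, while the dictionary keys and the `retained_fraction` helper are illustrative names, not part of any benchmark tooling.

```python
# Throughput figures copied from the context ladder (tokens per second).
# The keys are illustrative labels, not identifiers used by the bench harness.
ladder = {"32k peak": 21.1, "256k golden": 14.1}


def retained_fraction(baseline: float, value: float) -> float:
    """Return the fraction of baseline throughput that remains at `value`."""
    return value / baseline


ratio = retained_fraction(ladder["32k peak"], ladder["256k golden"])
print(f"256k retains {ratio:.1%} of the 32k throughput")
```

On these numbers the 256k golden profile keeps roughly two thirds of the 32k peak throughput.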
Golden profile
google-gemma-4-12b-it-llama
Capabilities
general, multimodal, audio, vision, agentic, coding, vllm, llamacpp, apache-2.0
Why we run it
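The best mid-size Gemma 4 for a single Spark: an encoder-free, unified multimodal model (text, image, audio) with a 256K context window and quality close to the 26B MoE at roughly 12B dense parameters. Licensed under Apache 2.0 and not gated. Prefer the -it (instruction-tuned) variant over the base model.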
Bench notes
golden 256k/q8_0 @ 14.1 tok/s — fill~50000 — bench-agent-v2 — tool_ok=True
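
The note above packs several values into one line, so a small parser can make it easier to compare runs. The sketch below assumes the layout seen in this single note (profile, context/KV type, throughput, then dash-separated extras); the field names in the returned dictionary are hypothetical, and the meaning of `fill` is not stated on this page, so it is kept as a plain integer.

```python
import re

# The note from this page; \u2014 is the dash it uses as a field separator.
NOTE = (
    "golden 256k/q8_0 @ 14.1 tok/s \u2014 fill~50000 "
    "\u2014 bench-agent-v2 \u2014 tool_ok=True"
)


def parse_bench_note(note: str) -> dict:
    """Split a bench note into named fields (field names are hypothetical)."""
    # Assumes exactly four dash-separated parts, as in the note above.
    head, fill, harness, tool = [part.strip() for part in note.split("\u2014")]
    match = re.fullmatch(r"(\w+) (\d+)k/(\w+) @ ([\d.]+) tok/s", head)
    if match is None:
        raise ValueError(f"unrecognized bench note head: {head!r}")
    profile, ctx_k, kv_type, tps = match.groups()
    return {
        "profile": profile,
        "context_k": int(ctx_k),
        "kv_cache_type": kv_type,
        "tok_per_s": float(tps),
        "fill": int(fill.removeprefix("fill~")),  # requires Python 3.9+
        "harness": harness,
        "tool_ok": tool.removeprefix("tool_ok=") == "True",
    }


parsed = parse_bench_note(NOTE)
print(parsed)
```

A note with a different number of separators or a different head format raises an error rather than being guessed at, which seems the safer default for benchmark records.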