google/gemma-4-12b-it

Gemma-4-12B-IT

llama.cpp · 12B · General · Multimodal
Throughput: 21.1 t/s @ 32k
Engine: llama.cpp
Parameters: 12B
Released: 2026-05-23
Benchmarked: 2026-06-27

Context ladder

Throughput at each benched context window.

Context        KV          Throughput
32k (peak)     not listed  21.1 t/s
256k (golden)  q8_0        14.1 t/s
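
To compare the two rungs of the ladder, the throughput ratio can be worked out directly. This is a minimal sketch that uses only the two figures listed above; the helper name `relative_drop` is illustrative and not part of any benchmark tooling.

```python
def relative_drop(peak_tps: float, other_tps: float) -> float:
    """Fractional throughput loss relative to the peak rung."""
    return (peak_tps - other_tps) / peak_tps


peak_tps = 21.1    # t/s at 32k context (peak rung)
golden_tps = 14.1  # t/s at 256k context with q8_0 KV (golden rung)

drop = relative_drop(peak_tps, golden_tps)
print(f"256k retains {1 - drop:.0%} of peak throughput ({drop:.0%} drop)")
# prints: 256k retains 67% of peak throughput (33% drop)
```

In other words, going from the 32k rung to the 256k rung costs roughly a third of the measured generation speed.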

Golden profile

google-gemma-4-12b-it-llama

Capabilities

general · multimodal · audio · vision · agentic · coding · vllm · llamacpp · apache-2.0

Why we run it

Best mid-size Gemma 4 for a single Spark: encoder-free unified multimodal (text/image/audio), 256K context, and quality close to the 26B MoE at ~12B dense parameters. Apache 2.0, not gated. Prefer the -it (instruction-tuned) variant over base.

Bench notes

golden 256k/q8_0 @ 14.1 tok/s · fill ~50000 · bench-agent-v2 · tool_ok=True

SparkBench · GB10 · single node