google/gemma-4-26b-a4b-it
Gemma-4-26B-A4B-IT
Throughput
23.6 t/s @ 8k
Engine
vLLM
Parameters
4B active / 26B total
Released
2026-03-11
Benchmarked
2026-06-27
Context ladder
Throughput at each benched context window.
| Context | KV cache | Throughput |
|---|---|---|
| 8k (peak) | — | 23.6 t/s |
| 256k (golden) | fp8 | 22.2 t/s |
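To put the two rungs in proportion, the relative slowdown from the 8k window to the 256k window can be worked out directly from the table. This is a minimal sketch: the `ladder` dict and the `relative_drop` helper are illustrative names, not part of the benchmark tooling, and the comparison assumes the two rows are otherwise comparable even though only the 256k row lists an fp8 KV cache.

```python
# Throughput figures copied from the context ladder table (tokens per second).
# The dict layout and helper name are illustrative, not from the bench tooling.
ladder = {"8k": 23.6, "256k": 22.2}


def relative_drop(peak: float, other: float) -> float:
    """Fractional throughput loss of `other` relative to `peak`."""
    return (peak - other) / peak


drop = relative_drop(ladder["8k"], ladder["256k"])
retained = 1.0 - drop

print(f"256k retains {retained:.1%} of 8k throughput ({drop:.1%} slower)")
```

On these two data points the 256k golden profile gives up roughly 6% of the 8k peak throughput.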
Golden profile
google-gemma-4-26b-a4b-it-eugr
Capabilities
general, moe, multimodal, vision, agentic, coding, vllm, apache-2.0
Why we run it
Popular MoE pick (~3.8B active / 26B total), in the same efficiency class as Qwen3.6 MoE. 256K context, text and image input. Heavier than a 12B model, but per-token inference is fast because only the active parameters run for each token.
Bench notes
golden 256k/fp8 @ 22.2 tok/s — fill~50000 — bench-agent-v2 — tool_ok=False