Gemma-4-26B-A4B-IT

vLLM MoE · 4B active / 26B total · General · Multimodal · Reasoning

Throughput

23.6 t/s @ 8k

Engine

vLLM

Parameters

4B active / 26B total

Released

2026-03-11

Benchmarked

2026-06-27

Context ladder

Throughput at each benched context window.

Context	KV	Throughput
@ 8k peak	-	23.6 t/s
@ 256k golden	fp8	22.2 t/s
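
As a rough illustration of what the ladder implies, the sketch below computes the relative throughput loss when moving from the 8k peak to the 256k golden profile. The two figures are copied from the table; the helper name `relative_drop` is hypothetical. The 256k row uses an fp8 KV cache while the 8k row lists no KV setting, so this is probably not a like-for-like comparison.

```python
# Throughput figures copied from the context ladder (tokens per second).
ladder = {"8k": 23.6, "256k": 22.2}


def relative_drop(peak: float, other: float) -> float:
    """Fraction of throughput lost relative to the peak figure (hypothetical helper)."""
    return (peak - other) / peak


drop = relative_drop(ladder["8k"], ladder["256k"])
ms_per_token_peak = 1000.0 / ladder["8k"]

print(f"256k is {drop:.1%} slower than the 8k peak")
print(f"8k peak is about {ms_per_token_peak:.1f} ms per token")
```

On these numbers the long-context profile gives up roughly 6% of peak throughput.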

Golden profile

google-gemma-4-26b-a4b-it-eugr

Capabilities

general · moe · multimodal · vision · agentic · coding · vllm · apache-2.0

Why we run it

Popular MoE pick (~3.8B active / 26B total), in the same efficiency class as Qwen3.6 MoE. 256K context, text + image input. Heavier in memory than a 12B model, but per-token inference is fast because only about 4B parameters are active for each token.
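
To make the efficiency claim concrete, here is a small sketch of the active-parameter fraction, using both the headline figure (4B) and the approximate figure (~3.8B) quoted on this page. The variable names are hypothetical, and the calculation says nothing about memory footprint, which depends on the full 26B weights.

```python
# Parameter counts quoted on this page, in billions.
TOTAL_B = 26.0
ACTIVE_B = {"headline": 4.0, "approximate": 3.8}

# Share of the model's weights that participate in each token's forward pass.
active_fraction = {label: active / TOTAL_B for label, active in ACTIVE_B.items()}

for label, fraction in active_fraction.items():
    print(f"{label}: {fraction:.1%} of parameters active per token")
```

Either way, only about 15% of the weights are used per token, which is why per-token compute sits closer to a small model than to a 26B one.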

Bench notes