Qwen3-30B-A3B

vLLM MoE · 3B active / 30B total · Agents · General · Reasoning

Throughput

75.8 t/s @ 40k

Engine

vLLM

Parameters

3B active / 30B total

Released

2025-07-08

Benchmarked

2026-06-27

Throughput at each benchmarked context window (single measurement per window).

Context	KV	Throughput
@ 40k (peak, golden)	-	75.8 t/s
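
To make the throughput figure easier to reason about, the sketch below converts tokens per second into per-token latency and an estimated generation time, and computes the active-parameter share from the 3B / 30B figures. The function names are hypothetical, and the calculation assumes the reported number is a steady single-stream decode rate, which this page does not state.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average time per generated token, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimated time to generate num_tokens at a constant rate (assumption)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def active_fraction(active_billions: float, total_billions: float) -> float:
    """Share of parameters active per token in a mixture-of-experts model."""
    return active_billions / total_billions


if __name__ == "__main__":
    rate = 75.8  # t/s at the 40k context window, from the table above
    print(f"{ms_per_token(rate):.1f} ms per token")
    print(f"{decode_seconds(500, rate):.1f} s for a 500-token reply")
    print(f"{active_fraction(3, 30):.0%} of parameters active per token")
```

At 75.8 t/s this works out to roughly 13.2 ms per token, so a 500-token reply would take about 6.6 s under the constant-rate assumption.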

nvidia-qwen3-30b-a3b-eugr

general · moe · fast · vllm · nvfp4 · tool-calling

Faster MoE sibling for interactive agent loops: lower latency and memory pressure when maximum intelligence isn't required.

golden ?/? @ 71.2 tok/s — fill~18432 — bench-agent-v2 — tool_ok=False
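
The status line above packs several fields into one string. A minimal sketch of splitting it into a dictionary follows; the function name `parse_status` and the output keys are hypothetical, and the field meanings (for example, what `fill` counts) are not documented on this page, so the values are kept as written rather than interpreted.

```python
import re

EM_DASH = "\u2014"  # the separator character used in the status line


def parse_status(line: str) -> dict:
    """Split a benchmark status line into loosely typed fields.

    Assumed layout (inferred from one example, not a documented format):
    '<summary> @ <rate> tok/s', then segments separated by an em dash,
    each being 'key=value', 'key~number', or a bare label.
    """
    segments = [s.strip() for s in line.split(EM_DASH)]
    result: dict = {"summary": segments[0], "labels": []}

    # Pull the throughput out of the first segment, if present.
    match = re.search(r"@\s*([\d.]+)\s*tok/s", segments[0])
    if match:
        result["tok_per_s"] = float(match.group(1))

    for segment in segments[1:]:
        if "=" in segment:
            key, value = segment.split("=", 1)
            result[key] = value  # kept as a string, e.g. "False"
        elif "~" in segment:
            key, value = segment.split("~", 1)
            result[key] = int(value)  # approximate count
        else:
            result["labels"].append(segment)
    return result


if __name__ == "__main__":
    line = (
        "golden ?/? @ 71.2 tok/s \u2014 fill~18432 \u2014 "
        "bench-agent-v2 \u2014 tool_ok=False"
    )
    print(parse_status(line))
```

Note that `tool_ok` is left as the string `"False"` here; converting it to a boolean would require knowing the full set of values the benchmark harness can emit.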