nvidia/qwen3-30b-a3b
Qwen3-30B-A3B
Throughput
75.8 t/s @ 40k
Engine
vLLM
Parameters
3B active / 30B total
Released
2025-07-08
Benchmarked
2026-06-27
Context ladder
Throughput at each benchmarked context window (single measurement).
| Context | KV | Throughput |
|---|---|---|
| 40k (peak, golden) | — | 75.8 t/s |
Golden profile
nvidia-qwen3-30b-a3b-eugr
Capabilities
general, moe, fast, vllm, nvfp4, tool-calling
Why we run it
Faster MoE sibling for interactive agent loops: lower latency and memory pressure when maximum intelligence isn't required.
Bench notes
golden ?/? @ 71.2 tok/s — fill~18432 — bench-agent-v2 — tool_ok=False
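The bench note above is a single delimited line, so it may help to see how it could be split into fields for comparison across runs. The following is a minimal sketch, not the tooling that produced the note: the function name `parse_bench_note` and the dictionary keys are hypothetical, and the meanings of `fill` and `?/?` are not documented here, so the values are extracted as-is without interpretation.

```python
import re


def parse_bench_note(note: str) -> dict:
    """Split a bench note on its dash separator and extract recognizable fields.

    Hypothetical helper: key names are taken from the labels in the note itself.
    """
    parts = [p.strip() for p in note.split("—")]
    result = {"raw_parts": parts}
    for part in parts:
        # Throughput appears as "@ <number> tok/s" inside the first segment.
        m = re.search(r"@\s*([\d.]+)\s*tok/s", part)
        if m:
            result["tok_per_s"] = float(m.group(1))
        # "fill~<integer>" is kept as a plain integer; its unit is an unknown.
        m = re.fullmatch(r"fill~(\d+)", part)
        if m:
            result["fill"] = int(m.group(1))
        # "tool_ok=True|False" becomes a boolean.
        m = re.fullmatch(r"tool_ok=(True|False)", part)
        if m:
            result["tool_ok"] = m.group(1) == "True"
    return result


note = "golden ?/? @ 71.2 tok/s — fill~18432 — bench-agent-v2 — tool_ok=False"
parsed = parse_bench_note(note)
print(parsed)
```

Segments that match none of the patterns (here `bench-agent-v2`) are only kept in `raw_parts`, so nothing in the note is silently dropped.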