Step-3.7-Flash

llama.cpp MoE · 11B active / 198B total · Agents · Multimodal

Throughput

22.3 t/s @ 32k

Engine

llama.cpp

Parameters

11B active / 198B total

Released

2026-05-23

Benchmarked

2026-06-27

Throughput at each benched context window (single measurement).

Context	KV	Throughput
32k (peak, golden)	—	22.3 t/s

stepfun-ai-step-3-7-flash-llama

general · moe · multimodal · vision · agentic · tool-calling · llamacpp · gguf · long-context

StepFun frontier VLM MoE for DGX Spark: IQ4_XS GGUF (~105 GB) plus mmproj for llama.cpp. NVIDIA blog and StepFun benchmarks validated on a DGX Spark with 128 GB.
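As a rough sanity check on the ~105 GB figure, the sketch below multiplies the 198B total parameters by an assumed average of about 4.25 bits per weight for IQ4_XS. That average is an assumption, not something the card states; real GGUF files mix quantization types per tensor and carry metadata, and the mmproj file is separate. The function and variable names are illustrative only.

```python
# Rough size check: total parameters x bits per weight / 8 = bytes.
# Assumption: IQ4_XS averages about 4.25 bits per weight. Real GGUF files
# mix quant types per tensor and add metadata, so treat this as an estimate.

def estimate_gguf_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Return the approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 198e9   # "198B total" from the card
IQ4_XS_BPW = 4.25      # assumed nominal bits per weight for IQ4_XS
DEVICE_MEMORY_GB = 128 # DGX Spark memory quoted on the card

size_gb = estimate_gguf_size_gb(TOTAL_PARAMS, IQ4_XS_BPW)
headroom_gb = DEVICE_MEMORY_GB - size_gb

print(f"estimated weights: {size_gb:.1f} GB")
print(f"left for KV cache, mmproj and runtime: {headroom_gb:.1f} GB")
```

Under these assumptions the estimate lands close to the quoted ~105 GB, which leaves a little over 20 GB of the 128 GB for the KV cache, the mmproj weights, and everything else running on the machine.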

golden ?/? @ 17.3 tok/s — fill~14745 — bench-agent-v2 — tool_ok=True
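The status line above can be split into fields for logging or comparison across runs. This is a minimal sketch that assumes the fields are separated by an em dash (written as `\u2014` in the code) and follow the shapes seen in this one line; the card does not document the format, so the key names `tok_per_s`, `fill`, and `tool_ok` and the helper `parse_status` are hypothetical.

```python
import re

# The status line from the card, with the separator written as \u2014.
LINE = "golden ?/? @ 17.3 tok/s \u2014 fill~14745 \u2014 bench-agent-v2 \u2014 tool_ok=True"

def parse_status(line: str) -> dict:
    """Pull out the fields recognisable in the line; unknown parts stay raw."""
    result = {"raw_fields": [part.strip() for part in line.split("\u2014")]}

    rate = re.search(r"@\s*([\d.]+)\s*tok/s", line)
    if rate:
        result["tok_per_s"] = float(rate.group(1))

    fill = re.search(r"fill~(\d+)", line)
    if fill:
        result["fill"] = int(fill.group(1))  # unit is not stated on the card

    ok = re.search(r"tool_ok=(True|False)", line)
    if ok:
        result["tool_ok"] = ok.group(1) == "True"

    return result

status = parse_status(LINE)
print(status["tok_per_s"], status["fill"], status["tool_ok"])
```

The `?/?` score is left untouched in `raw_fields`, since the card gives no values for it.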