stepfun-ai/step-3.7-flash
Step-3.7-Flash
Throughput
22.3 t/s @ 32k
Engine
llama.cpp
Parameters
11B active / 198B total
Released
2026-05-23
Benchmarked
2026-06-27
Context ladder
Throughput at each benched context window (single measurement).
| Context | KV | Throughput |
|---|---|---|
| 32k (peak, golden) | n/a | 22.3 t/s |
Golden profile
stepfun-ai-step-3-7-flash-llama
Capabilities
general, moe, multimodal, vision, agentic, tool-calling, llamacpp, gguf, long-context
Why we run it
StepFun frontier VLM MoE for Spark: IQ4_XS GGUF (~105 GB) plus mmproj for llama.cpp. NVIDIA blog and StepFun benchmarks validated on DGX Spark 128 GB.
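As a rough sanity check on the figures quoted above, the sketch below works out the average bits per weight implied by a ~105 GB file holding 198B parameters, and the memory left over on a 128 GB machine. It assumes decimal gigabytes and treats the file size as weights only; the function names are illustrative and not part of any tool.

```python
# Figures taken from this page. Decimal gigabytes (1 GB = 1e9 bytes) are an assumption.
TOTAL_PARAMS = 198e9      # 198B total parameters
ACTIVE_PARAMS = 11e9      # 11B active parameters
GGUF_SIZE_GB = 105        # approximate IQ4_XS GGUF size
SPARK_MEMORY_GB = 128     # DGX Spark memory


def bits_per_weight(size_gb: float, params: float) -> float:
    """Average bits stored per parameter for a file of size_gb gigabytes."""
    return size_gb * 1e9 * 8 / params


def headroom_gb(memory_gb: float, size_gb: float) -> float:
    """Memory left after loading the weights; KV cache, mmproj and the OS share it."""
    return memory_gb - size_gb


if __name__ == "__main__":
    print(f"bits per weight: {bits_per_weight(GGUF_SIZE_GB, TOTAL_PARAMS):.2f}")
    print(f"headroom: {headroom_gb(SPARK_MEMORY_GB, GGUF_SIZE_GB):.0f} GB")
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Under these assumptions the result is about 4.24 bits per weight, which is in line with the roughly 4.25 bits per weight usually quoted for IQ4_XS, and about 23 GB of headroom before the KV cache and mmproj are accounted for.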
Bench notes
golden ?/? @ 17.3 tok/s — fill~14745 — bench-agent-v2 — tool_ok=True
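For anyone collecting these notes programmatically, a line in this shape can be split into fields with a few lines of Python. This is a minimal sketch that assumes the layout seen in the single note above; the key names (`profile`, `score`, `tok_per_s`, `fill`, `suite`) are hypothetical labels chosen here, and the meaning of each field is not documented on this page.

```python
import re

SEP = "\u2014"  # the notes separate fields with an em dash


def parse_bench_note(note: str) -> dict:
    """Split one bench note into a dict; unknown values are kept as text."""
    parts = [p.strip() for p in note.split(SEP)]
    result = {"raw_fields": parts}

    # First field looks like: "<profile> <score> @ <number> tok/s"
    head = re.match(r"(\w+)\s+(\S+)\s+@\s+([\d.]+)\s+tok/s", parts[0])
    if head:
        result["profile"] = head.group(1)
        result["score"] = head.group(2)  # left as text; "?/?" stays unresolved
        result["tok_per_s"] = float(head.group(3))

    for part in parts[1:]:
        if part.startswith("fill~"):
            result["fill"] = int(part[len("fill~"):])
        elif "=" in part:
            key, value = part.split("=", 1)
            result[key] = value == "True"
        else:
            result["suite"] = part  # assumed to name the bench suite
    return result


if __name__ == "__main__":
    sample = f"golden ?/? @ 17.3 tok/s {SEP} fill~14745 {SEP} bench-agent-v2 {SEP} tool_ok=True"
    print(parse_bench_note(sample))
```

The score is deliberately kept as the literal string `?/?` rather than guessed, since the note itself leaves it unfilled.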