stepfun-ai/step-3.7-flash

Step-3.7-Flash

llama.cpp MoE · 11B active / 198B total · Agents · Multimodal
Throughput
22.3 t/s @ 32k
Engine
llama.cpp
Parameters
11B active / 198B total
Released
2026-05-23
Benchmarked
2026-06-27

Context ladder

Throughput at each benched context window (single measurement).

Context KV Throughput
@ 32k peak golden 22.3 t/s

Golden profile

stepfun-ai-step-3-7-flash-llama

Capabilities

general · moe · multimodal · vision · agentic · tool-calling · llamacpp · gguf · long-context

Why we run it

StepFun's frontier vision-language MoE, packaged for DGX Spark as an IQ4_XS GGUF (~105 GB) plus an mmproj file for llama.cpp. Figures from the NVIDIA blog and StepFun's own benchmarks were validated on a DGX Spark with 128 GB of memory.
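As a rough sanity check on the sizes quoted above, the sketch below works out the implied bits per weight and the memory left over after loading the weights. It assumes that "GB" means 10^9 bytes and that the ~105 GB file is almost entirely weights; the mmproj file, KV cache, and operating system overhead are not counted, so the real headroom is smaller. The function names are illustrative only and not part of any benchmark tooling.

```python
# Figures quoted on this page.
TOTAL_PARAMS = 198e9      # total parameters
ACTIVE_PARAMS = 11e9      # parameters active per token
GGUF_SIZE_GB = 105        # approximate IQ4_XS GGUF size
SYSTEM_MEMORY_GB = 128    # DGX Spark memory


def bits_per_weight(size_gb: float, n_params: float) -> float:
    """Average bits stored per parameter (assumes 1 GB = 10**9 bytes)."""
    return size_gb * 1e9 * 8 / n_params


def headroom_gb(size_gb: float, memory_gb: float) -> float:
    """Memory left after the weights alone (mmproj, KV cache, OS not counted)."""
    return memory_gb - size_gb


bpw = bits_per_weight(GGUF_SIZE_GB, TOTAL_PARAMS)
free = headroom_gb(GGUF_SIZE_GB, SYSTEM_MEMORY_GB)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"{bpw:.2f} bits per weight")
print(f"{free:.0f} GB left before mmproj, KV cache, and OS")
print(f"{active_fraction:.1%} of parameters active per token")
```

The result of about 4.24 bits per weight is in line with the roughly 4.25 bits per weight that llama.cpp lists for IQ4_XS, which suggests the quoted file size and parameter count are consistent with each other.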

Bench notes

golden ?/? @ 17.3 tok/s — fill~14745 — bench-agent-v2 — tool_ok=True

SparkBench · GB10 · single node