Open-source · DGX Spark · GB10

What actually runs on a DGX Spark.

SparkBench is a community lab for the NVIDIA GB10. We benchmark every model on real hardware, publish reproducible recipes for coding, agents, and reasoning, and ship the tooling so you can run it on your own box.

18
Models benched
75.8 t/s @ 40k
Peak throughput
3
Inference engines
1
GB10 box

Leaderboard

Sorted by throughput · context window shown per row · 18 models
Rankings are by throughput (tok/s) on a single GB10 — not by intelligence, coding quality, or SWE benchmarks. We can't wait to add those. Contributions welcome ↗
# Model For Engine Params Throughput
01 Qwen3-30B-A3B nvidia/qwen3-30b-a3b AgentsGeneralReasoning vLLM
75.8t/s@ 40k
02 Qwen3.6-35B-A3B (NVFP4) nvidia/qwen3.6-35b-a3b AgentsGeneralReasoning vLLM
73.5t/s@ 64k
03 Qwen3-Coder-30B-A3B-Instruct unsloth/qwen3-coder-30b-a3b-instruct AgentsCode llama.cpp
70.9t/s@ 32k
04 Diffusiongemma 26B A4B It google/diffusiongemma-26b-a4b-it MultimodalReasoning vLLM
70.8t/s@ 16k
05 qwen3-coder-next saricles/qwen3-coder-next AgentsCode vLLM
63.3t/s@ 256k
06 Qwen3.6-35B-A3B (GGUF) unsloth/qwen3.6-35b-a3b AgentsGeneralReasoning llama.cpp
48.6t/s@ 32k
07 qwen3-coder-next qwen/qwen3-coder-next AgentsCode vLLM
45.4t/s@ 256k
08 Qwen3.6-27B (FP8) qwen/qwen3.6-27b AgentsGeneralReasoning vLLM 27B
27.5t/s@ 256k
09 Gemma-4-26B-A4B-IT google/gemma-4-26b-a4b-it GeneralMultimodalReasoning vLLM
23.6t/s@ 8k
10 Step-3.7-Flash stepfun-ai/step-3.7-flash AgentsMultimodal llama.cpp 198B
22.3t/s@ 32k
11 Gemma-4-12B-IT google/gemma-4-12b-it GeneralMultimodal llama.cpp
21.1t/s@ 32k
12 gemma-4-12b-coder-fable5-composer2.5-v1 yuxinlu1/gemma-4-12b-coder-fable5-composer2.5-v1 Code llama.cpp
18.6t/s@ 32k
13 Qwen3.6-27B (PrismaQuant) rdtand/qwen3.6-27b GeneralReasoning vLLM 27B
10.3t/s@ 16k
14 Qwen3.6-27B (unsloth) unsloth/qwen3.6-27b AgentsGeneralReasoning vLLM 27B
9.4t/s@ 64k
15 Qwen3.6-27B (MoQ GGUF) kaitchup/qwen3.6-27b GeneralReasoning llama.cpp 27B
9.3t/s@ 32k
16 Phi-4 microsoft/phi-4 GeneralReasoning vLLM
8.0t/s@ 16k
17 Hermes-4-14B nousresearch/hermes-4-14b AgentsReasoning vLLM
8.0t/s@ 40k
18 DeepSeek-R1-Distill-Qwen-32B deepseek-ai/deepseek-r1-distill-qwen-32b Reasoning vLLM
3.6t/s@ 128k

Recipes by task

Pick by what you're building

Run it on your own Spark

Same tool that generates this leaderboard
SparkBench portal: Inference, Models, and Explore tabs
Portal — switch profiles, browse models, explore HuggingFace

SparkBench is the operator tool behind this site. Portal, model inventory, three inference engines (vLLM, llama.cpp, ds4), reproducible bench v2, and an OpenAI gateway — one CLI on your GB10.

  • One bootstrap command — clone, host env, portal, APIs, CLI.
  • Three engines — eugr, llama.cpp, ds4. One GPU at a time.
  • Real recipes — auto-scaffolded from weights; golden map in git.
  • Your data, your box — nothing leaves the LAN unless you ship it here.
install shell
# One command — core stack (no GPU engine yet)
curl -fsSL https://raw.githubusercontent.com/shawnmarck/sparkbench/main/scripts/bootstrap-sparkbench.sh | sudo bash

# Then pick an engine + gateway
sudo bash install/spark-install engine eugr
sudo bash install/spark-install gateway

# Switch, bench, serve
spark inference list
spark inference up qwen36-nvfp4
spark inference bench