Open-source · DGX Spark · GB10

What actually runs on a DGX Spark.

SparkBench is a community lab for the NVIDIA GB10. We benchmark every model on real hardware, publish reproducible recipes for coding, agents, and reasoning, and ship the tooling so you can run it on your own box.


Models benched: 18 · Peak throughput: 75.8 t/s @ 40k · Inference engines: 3 (vLLM, llama.cpp, ds4) · Hardware: a single GB10 box

Editor's pick: Qwen3.6-27B (FP8), 27.5 t/s @ 256k

Leaderboard

Sorted by throughput · context window shown per row · 18 models

Rankings are by throughput (tok/s) on a single GB10 — not by intelligence, coding quality, or SWE benchmarks. We can't wait to add those. Contributions welcome ↗

#	Model	Repo	For	Engine	Params	Throughput	Source
01	Qwen3-30B-A3B	nvidia/qwen3-30b-a3b	Agents, General, Reasoning	vLLM	n/a	75.8 t/s @ 40k	HF
02	Qwen3.6-35B-A3B (NVFP4)	nvidia/qwen3.6-35b-a3b	Agents, General, Reasoning	vLLM	n/a	73.5 t/s @ 64k	HF
03	Qwen3-Coder-30B-A3B-Instruct	unsloth/qwen3-coder-30b-a3b-instruct	Agents, Code	llama.cpp	n/a	70.9 t/s @ 32k	HF
04	Diffusiongemma 26B A4B It	google/diffusiongemma-26b-a4b-it	Multimodal, Reasoning	vLLM	n/a	70.8 t/s @ 16k	HF
05	qwen3-coder-next	saricles/qwen3-coder-next	Agents, Code	vLLM	n/a	63.3 t/s @ 256k	private
06	Qwen3.6-35B-A3B (GGUF)	unsloth/qwen3.6-35b-a3b	Agents, General, Reasoning	llama.cpp	n/a	48.6 t/s @ 32k	HF
07	qwen3-coder-next	qwen/qwen3-coder-next	Agents, Code	vLLM	n/a	45.4 t/s @ 256k	HF
08	Qwen3.6-27B (FP8)	qwen/qwen3.6-27b	Agents, General, Reasoning	vLLM	27B	27.5 t/s @ 256k	HF
09	Gemma-4-26B-A4B-IT	google/gemma-4-26b-a4b-it	General, Multimodal, Reasoning	vLLM	n/a	23.6 t/s @ 8k	HF
10	Step-3.7-Flash	stepfun-ai/step-3.7-flash	Agents, Multimodal	llama.cpp	198B	22.3 t/s @ 32k	HF
11	Gemma-4-12B-IT	google/gemma-4-12b-it	General, Multimodal	llama.cpp	n/a	21.1 t/s @ 32k	HF
12	gemma-4-12b-coder-fable5-composer2.5-v1	yuxinlu1/gemma-4-12b-coder-fable5-composer2.5-v1	Code	llama.cpp	n/a	18.6 t/s @ 32k	HF
13	Qwen3.6-27B (PrismaQuant)	rdtand/qwen3.6-27b	General, Reasoning	vLLM	27B	10.3 t/s @ 16k	HF
14	Qwen3.6-27B (unsloth)	unsloth/qwen3.6-27b	Agents, General, Reasoning	vLLM	27B	9.4 t/s @ 64k	HF
15	Qwen3.6-27B (MoQ GGUF)	kaitchup/qwen3.6-27b	General, Reasoning	llama.cpp	27B	9.3 t/s @ 32k	HF
16	Phi-4	microsoft/phi-4	General, Reasoning	vLLM	n/a	8.0 t/s @ 16k	HF
17	Hermes-4-14B	nousresearch/hermes-4-14b	Agents, Reasoning	vLLM	n/a	8.0 t/s @ 40k	HF
18	DeepSeek-R1-Distill-Qwen-32B	deepseek-ai/deepseek-r1-distill-qwen-32b	Reasoning	vLLM	n/a	3.6 t/s @ 128k	HF
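
Because the ranking is by throughput alone, it can help to turn a row into rough wall-clock time. The following is a minimal sketch, not part of SparkBench: `parse_cell` and `secs_for` are hypothetical helper names, the 1,000-token budget is arbitrary, and it assumes "k" means 1,000 tokens and that the listed figure is generation speed (prompt processing time is ignored).

```shell
#!/bin/sh
# Hypothetical helpers (not part of SparkBench) for reading leaderboard cells.

# parse_cell "75.8t/s@ 40k" prints "75.8 40000" (tok/s, context tokens).
# Spaces are stripped first, so "75.8 t/s @ 40k" parses the same way.
# Assumption: "k" means 1,000 tokens.
parse_cell() {
  printf '%s\n' "$1" | tr -d ' ' | awk -F't/s@' '{
    ctx = $2
    sub(/k$/, "", ctx)              # drop the trailing "k"
    printf "%s %d\n", $1, ctx * 1000
  }'
}

# secs_for TOKENS TOKS_PER_SEC prints the seconds needed to generate TOKENS,
# rounded to one decimal place.
secs_for() {
  awk -v n="$1" -v tps="$2" 'BEGIN { printf "%.1f\n", n / tps }'
}

# Fastest and slowest rows of the leaderboard.
for cell in "75.8t/s@ 40k" "3.6t/s@ 128k"; do
  parsed=$(parse_cell "$cell")
  tps=${parsed% *}                  # text before the space
  ctx=${parsed#* }                  # text after the space
  echo "$tps tok/s at ${ctx}-token context: $(secs_for 1000 "$tps") s per 1,000 tokens"
done
```

Under those assumptions, the top row needs about 13.2 s for 1,000 generated tokens and the bottom row about 277.8 s, which is the spread the ranking actually measures.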

Recipes by task

Pick by what you're building

General

Agents

Reasoning

Code

Multimodal

Run it on your own Spark

Same tool that generates this leaderboard

Figure: SparkBench portal with Inference, Models, and Explore tabs. Switch profiles, browse models, explore HuggingFace.

SparkBench is the operator tool behind this site. It puts a portal, a model inventory, three inference engines (vLLM, llama.cpp, ds4), the reproducible bench v2, and an OpenAI gateway behind one CLI on your GB10.

One bootstrap command: clone, host env, portal, APIs, CLI.
Three engines: eugr (vLLM), llama.cpp, ds4. Only one runs on the GPU at a time.
Real recipes: auto-scaffolded from weights; golden map in git.
Your data, your box: nothing leaves the LAN unless you ship it here.


# One command — core stack (no GPU engine yet)
curl -fsSL https://raw.githubusercontent.com/shawnmarck/sparkbench/main/scripts/bootstrap-sparkbench.sh | sudo bash

# Then pick an engine + gateway
sudo bash install/spark-install engine eugr
sudo bash install/spark-install gateway

# Switch, bench, serve
spark inference list
spark inference up qwen36-nvfp4
spark inference bench
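
To bench more than one profile, the last two commands can be wrapped in a loop. This is a minimal sketch under stated assumptions: only `spark inference up <profile>` and `spark inference bench` come from the commands above, `bench_profiles` is a hypothetical wrapper, and it assumes `up` returns once the profile is ready and exits non-zero on failure.

```shell
#!/bin/sh
# Sketch: bench several profiles back to back, one profile at a time.
# bench_profiles is a hypothetical wrapper, not a SparkBench command.
bench_profiles() {
  for profile in "$@"; do
    echo "== $profile =="
    # Assumption: `up` switches the active profile and returns when ready.
    if ! spark inference up "$profile"; then
      echo "skip $profile: failed to start" >&2
      continue
    fi
    # Benches whichever profile is currently up.
    spark inference bench
  done
}

# Usage: pass profile names as printed by `spark inference list`, e.g.
#   bench_profiles qwen36-nvfp4
```

Running the profiles sequentially keeps a single engine on the GPU for each bench run.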