yuxinlu1/mellum2-12b-opus-thinking
Mellum2 12B MoE Opus Thinking Q4
Throughput
74.4 t/s @ 32k
Engine
llama.cpp
Parameters
12B
Benchmarked
2026-06-27
Context ladder
Throughput at each benched context window.
| Context | KV cache | Throughput |
|---|---|---|
| 32k (peak, golden) | q8_0 | 74.4 t/s |
| 64k | q8_0 | 27.1 t/s |
| 96k | q8_0 | 19.3 t/s |
| 128k | q8_0 | 14.6 t/s |
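To make the drop-off across the ladder easier to read, the following minimal sketch expresses each benched throughput as a fraction of the 32k peak. The figures are copied from the table above; the helper names `relative_to_peak` and `seconds_for` and the 1,000-token workload are hypothetical, chosen only for illustration. The page does not say whether these rates measure generation or prompt processing, so the time estimate simply assumes the rate applies to whatever tokens are being counted.

```python
# Throughput figures copied from the context ladder (tokens per second).
ladder = {"32k": 74.4, "64k": 27.1, "96k": 19.3, "128k": 14.6}


def relative_to_peak(ladder):
    """Return each context's throughput as a fraction of the fastest entry."""
    peak = max(ladder.values())
    return {ctx: round(tps / peak, 3) for ctx, tps in ladder.items()}


def seconds_for(tokens, tps):
    """Estimate wall-clock seconds to process `tokens` at a steady `tps`."""
    return tokens / tps


ratios = relative_to_peak(ladder)
for ctx, tps in ladder.items():
    # 1000 tokens is an arbitrary illustrative workload, not a benchmark setting.
    eta = seconds_for(1000, tps)
    print(f"{ctx:>4}: {tps:5.1f} t/s, {ratios[ctx]:.0%} of peak, ~{eta:.0f} s per 1000 tokens")
```

Under these assumptions, the 64k window runs at a little over a third of the peak rate and the 128k window at roughly a fifth.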
Golden profile
mellum2-12b-opus-q4
Capabilities
testing, coding, moe, llamacpp, golden, gguf
Why we run it
Golden fleet target — auto-scaffolded from recipe mellum2-12b-opus-q4.
Bench notes
golden 32k/q8_0 @ 74.4 tok/s — fill~14745 — bench-agent-v2 — tool_ok=True
Links
- yuxinlu1/mellum2-12b-opus-thinking-gguf: gated or private repo on Hugging Face; no public link available.
- Golden recipe definition