Models

Best: 255 tok/s

Median: 42.4 tok/s

Min: 1.6 tok/s

Qwen · image-text-to-text · 208 benchmarks total

Qwen3.6-35B-A3B

Qwen / Qwen3.6-35B-A3B

MoE · 3B active / 36B total

transformers · safetensors · qwen3_5_moe · image-text-to-text

Best: 391 tok/s

Median: 71.3 tok/s

Min: 6.4 tok/s

Minimax · text-generation · 117 benchmarks total

MiniMax-M2.7

MiniMaxAI / MiniMax-M2.7

MoE · 229B

Best: 496 tok/s

Median: 32.2 tok/s

Min: 0.5 tok/s

Gemma · image-text-to-text · 86 benchmarks total

gemma-4-26B-A4B

google / gemma-4-26B-A4B

MoE · 4B active / 27B total

Qwen · image-text-to-text · 63 benchmarks total

Qwen3.5-9B-Base

Qwen / Qwen3.5-9B-Base

10B

Qwen · image-text-to-text · 63 benchmarks total

Qwen3.5-27B

Qwen / Qwen3.5-27B

28B

Best: 287 tok/s

Median: 30.9 tok/s

Min: 2.2 tok/s

Gemma · any-to-any · 41 benchmarks total

gemma-4-E4B

google / gemma-4-E4B

Best: 70.5 tok/s

Median: 70.5 tok/s

Min: 70.5 tok/s

Gemma · image-text-to-text · 37 benchmarks total

gemma-4-31B

google / gemma-4-31B

33B

Qwen · image-text-to-text · 37 benchmarks total

Ornstein3.6-27B-MTP-NSC-ACE-SABER

GestaltLabs / Ornstein3.6-27B-MTP-NSC-ACE-SABER

27B

Qwen · image-text-to-text · 35 benchmarks total

Qwen3.5-35B-A3B-Base

Qwen / Qwen3.5-35B-A3B-Base

MoE · 3B active / 36B total

transformers · safetensors · qwen3_5_moe · image-text-to-text

Llama · text-generation · 31 benchmarks total

Meta-Llama-3-8B-Instruct

meta-llama / Meta-Llama-3-8B-Instruct

Best: 148 tok/s

Median: 48.4 tok/s

Min: 13.6 tok/s

Qwen · text-generation · 31 benchmarks total

Qwen3-Coder-30B-A3B-Instruct

Qwen / Qwen3-Coder-30B-A3B-Instruct

MoE · 3B active / 31B total

transformers · safetensors · qwen3_moe · text-generation

Best: 101 tok/s

Median: 84.2 tok/s

Min: 75.9 tok/s

Deepseek · text-generation · 30 benchmarks total

DeepSeek-V4-Flash

deepseek-ai / DeepSeek-V4-Flash

MoE · 158B

transformers · safetensors · deepseek_v4 · text-generation

Best: 262 tok/s

Median: 33.0 tok/s

Min: 18.6 tok/s

Gemma · any-to-any · 25 benchmarks total

gemma-4-12B

google / gemma-4-12B

12B

transformers · safetensors · gemma4_unified · image-text-to-text

Best: 25.8 tok/s

Median: 25.8 tok/s

Min: 25.8 tok/s

MoE · 10B active / 125B total

Qwen3.5-122B-A10B

Qwen / Qwen3.5-122B-A10B

Qwen · image-text-to-text · 25 benchmarks total

transformers · safetensors · qwen3_5_moe · image-text-to-text

Best: 27.3 tok/s

Median: 25.4 tok/s

Min: 3.2 tok/s

Gpt · text-generation · 24 benchmarks total

gpt-oss-20b

openai / gpt-oss-20b

MoE · 22B

transformers · safetensors · gpt_oss · text-generation

Best: 991 tok/s

Median: 80.3 tok/s

Min: 12.0 tok/s

Qwen2.5-14B

Qwen / Qwen2.5-14B

Qwen · text-generation · 22 benchmarks total

safetensors · qwen2 · text-generation · conversational

text-generation · 21 benchmarks total

LFM2.5-8B-A1B-Base

LiquidAI / LFM2.5-8B-A1B-Base

MoE · 1B active / 8B total

transformers · safetensors · lfm2_moe · text-generation

Qwen · image-text-to-text · 18 benchmarks total

Qwen3.5-4B-Base

Qwen / Qwen3.5-4B-Base

Qwen · text-generation · 18 benchmarks total

Qwen3-Coder-Next

Qwen / Qwen3-Coder-Next

80B

transformers · safetensors · qwen3_next · text-generation

Best: 80.8 tok/s

Median: 55.8 tok/s

Min: 48.2 tok/s

Gemma · any-to-any · 16 benchmarks total

gemma-4-E2B

google / gemma-4-E2B

Llama · text-generation · 15 benchmarks total

Llama-3.1-8B

meta-llama / Llama-3.1-8B

any-to-any · 13 benchmarks total

Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16

nvidia / Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16

MoE · 3B active / 33B total

transformers · safetensors · NemotronH_Nano_Omni_Reasoning_V3 · feature-extraction

Mistral · 13 benchmarks total

Mistral-Medium-3.5-128B

mistralai / Mistral-Medium-3.5-128B

128B

safetensors · mistral3 · vLLM · en

Best: 7.4 tok/s

Median: 6.5 tok/s

Min: 6.2 tok/s

Gpt · text-generation · 13 benchmarks total

gpt-oss-120b

openai / gpt-oss-120b

MoE · 120B

transformers · safetensors · gpt_oss · text-generation

Best: 223 tok/s

Median: 70.0 tok/s

Min: 59.2 tok/s

Qwen · text-generation · 12 benchmarks total

Qwen2.5-7B

Qwen / Qwen2.5-7B

Best: 1.4k tok/s

Median: 1.4k tok/s

Min: 1.4k tok/s

Qwen · text-generation · 11 benchmarks total

Qwen3-30B-A3B-Base

Qwen / Qwen3-30B-A3B-Base

MoE · 3B active / 31B total

transformers · safetensors · qwen3_moe · text-generation

text-generation · 11 benchmarks total

GLM-4.7-Flash

zai-org / GLM-4.7-Flash

MoE · 31B

transformers · safetensors · glm4_moe_lite · text-generation

Best: 212 tok/s

Median: 176 tok/s

Min: 92.9 tok/s

text-generation · 11 benchmarks total

Nemotron-Cascade-2-30B-A3B

nvidia / Nemotron-Cascade-2-30B-A3B

MoE · 3B active / 32B total

Best: 141 tok/s

Median: 95.4 tok/s

Min: 89.8 tok/s

Gemma · image-text-to-text · 10 benchmarks total

gemma-3-12b-pt

google / gemma-3-12b-pt

12B

transformers · safetensors · gemma3 · image-text-to-text

Qwen · text-generation · 10 benchmarks total

Qwen3.6-27B-DFlash

z-lab / Qwen3.6-27B-DFlash

transformers · safetensors · qwen3 · feature-extraction

Best: 215 tok/s

Median: 39.2 tok/s

Min: 26.9 tok/s

text-generation · 9 benchmarks total

Ornith-1.0-35B-GGUF

deepreinforce-ai / Ornith-1.0-35B-GGUF

35B

transformers · gguf · text-generation · license:mit

Best: 253 tok/s

Median: 64.4 tok/s

Min: 27.8 tok/s

Qwen · text-generation · 9 benchmarks total

Nex-N2-mini

nex-agi / Nex-N2-mini

MoE · 35B

transformers · safetensors · qwen3_5_moe · image-text-to-text

Best: 107 tok/s

Median: 104 tok/s

Min: 40.5 tok/s

text-generation · 8 benchmarks total

GLM-5.2

zai-org / GLM-5.2

MoE · 753B

transformers · safetensors · glm_moe_dsa · text-generation

Qwen · image-text-to-text · 8 benchmarks total

Qwen3.5-0.8B-Base

Qwen / Qwen3.5-0.8B-Base

Best: 2.7k tok/s

Median: 2.7k tok/s

Min: 2.7k tok/s

Deepseek-R1 · text-generation · 8 benchmarks total

DeepSeek-R1-Distill-Qwen-7B

deepseek-ai / DeepSeek-R1-Distill-Qwen-7B

Best: 144 tok/s

Median: 69.0 tok/s

Min: 38.1 tok/s

text-generation · 8 benchmarks total

Ling-2.6-flash

inclusionAI / Ling-2.6-flash

MoE · 107B

safetensors · bailing_hybrid · text-generation · conversational

Best: 94.9 tok/s

Median: 86.2 tok/s

Min: 82.3 tok/s

Qwen · text-generation · 8 benchmarks total

Qwen2.5-72B

Qwen / Qwen2.5-72B

73B

text-generation · 7 benchmarks total

LFM2.5-1.2B-Base

LiquidAI / LFM2.5-1.2B-Base

transformers · safetensors · lfm2 · text-generation

Qwen · text-generation · 6 benchmarks total

Ornith-1.0-9B

deepreinforce-ai / Ornith-1.0-9B

Best: 77.5 tok/s

Median: 25.7 tok/s

Min: 24.7 tok/s

Qwen · text-generation · 6 benchmarks total

Qwen3-8B-Base

Qwen / Qwen3-8B-Base

Deepseek-Coder · text-generation · 6 benchmarks total

DeepSeek-Coder-V2-Lite-Instruct

deepseek-ai / DeepSeek-Coder-V2-Lite-Instruct

MoE · 16B

transformers · safetensors · deepseek_v2 · text-generation

Best: 150 tok/s

Median: 87.0 tok/s

Min: 39.0 tok/s

Llama · text-generation · 5 benchmarks total

Llama-2-7b-hf

meta-llama / Llama-2-7b-hf

transformers · pytorch · safetensors · llama

Best: 202 tok/s

Median: 50.4 tok/s

Min: 19.1 tok/s

Llama · text-generation · 5 benchmarks total

Llama-3.2-1B-Instruct

meta-llama / Llama-3.2-1B-Instruct

Best: 448 tok/s

Median: 195 tok/s

Min: 184 tok/s

text-generation · 5 benchmarks total

granite-4.0-h-micro

ibm-granite / granite-4.0-h-micro

MoE · 3B

transformers · safetensors · granitemoehybrid · text-generation

Best: 141 tok/s

Median: 45.3 tok/s

Min: 45.0 tok/s

text-generation · 5 benchmarks total

LFM2-8B-A1B

LiquidAI / LFM2-8B-A1B

MoE · 1B active / 8B total

transformers · safetensors · lfm2_moe · text-generation

Best: 18.3 tok/s

Median: 18.3 tok/s

Min: 9.9 tok/s

text-generation · 5 benchmarks total

NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

nvidia / NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

MoE · 3B active / 32B total

Best: 313 tok/s

Median: 286 tok/s

Min: 117 tok/s

Qwen3-14B-Base

Qwen / Qwen3-14B-Base

Qwen · text-generation · 5 benchmarks total

Mistral · 5 benchmarks total

Mistral-Small-3.1-24B-Base-2503

mistralai / Mistral-Small-3.1-24B-Base-2503

24B

vllm · safetensors · mistral3 · mistral-common

Gemma · text-generation · 5 benchmarks total

gemma-3-1b-pt

google / gemma-3-1b-pt

transformers · safetensors · gemma3_text · text-generation

Qwen · image-text-to-text · 5 benchmarks total

Qwen3.5-2B-Base

Qwen / Qwen3.5-2B-Base

Qwen · text-generation · 5 benchmarks total

Qwen2.5-1.5B

Qwen / Qwen2.5-1.5B

MoE · 12B active / 67B total

NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4

nvidia / NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4

Opt · text-generation · 5 benchmarks total

Best: 262 tok/s

Median: 175 tok/s

Min: 49.6 tok/s

DeepSeek-R1-Distill-Qwen-14B

deepseek-ai / DeepSeek-R1-Distill-Qwen-14B

Deepseek-R1 · text-generation · 4 benchmarks total

transformers · safetensors · arxiv:2501.12948 · license:mit

Best: 24.4 tok/s

Median: 24.4 tok/s

Min: 24.4 tok/s

Gemma · image-text-to-text · 4 benchmarks total

gemma-3n-E4B

google / gemma-3n-E4B

transformers · safetensors · gemma3n · image-text-to-text

Minimax · text-generation · 4 benchmarks total

MiniMax-M2

MiniMaxAI / MiniMax-M2

229B

Best: 493 tok/s

Median: 398 tok/s

Min: 303 tok/s

text-generation · 4 benchmarks total

GLM-5.1

zai-org / GLM-5.1

MoE · 754B

transformers · safetensors · glm_moe_dsa · text-generation

Qwen · text-generation · 4 benchmarks total

Qwen3-32B

Qwen / Qwen3-32B

33B

Best: 79.3 tok/s

Median: 22.8 tok/s

Min: 3.2 tok/s

text-generation · 4 benchmarks total

granite-4.1-30b

ibm-granite / granite-4.1-30b

29B

transformers · safetensors · granite · text-generation

Best: 17.9 tok/s

Median: 16.4 tok/s

Min: 15.6 tok/s

Gemma · text-generation · 4 benchmarks total

Gemopus-4-26B-A4B-it

Jackrong / Gemopus-4-26B-A4B-it

MoE · 4B active / 27B total

safetensors · gemma4 · gemma · instruction-tuned

Best: 64.3 tok/s

Median: 55.0 tok/s

Min: 45.7 tok/s

Qwen · image-text-to-text · 3 benchmarks total

Qwen3-VL-8B-Instruct

Qwen / Qwen3-VL-8B-Instruct

transformers · safetensors · qwen3_vl · image-text-to-text

Best: 95.9 tok/s

Median: 95.9 tok/s

Min: 95.9 tok/s

Qwen · text-generation · 3 benchmarks total

Ornith-1.0-35B

deepreinforce-ai / Ornith-1.0-35B

MoE · 0M

transformers · safetensors · qwen3_5_moe · image-text-to-text

Best: 89.1 tok/s

Median: 78.4 tok/s

Min: 67.8 tok/s

Gemma · image-text-to-text · 3 benchmarks total

gemma-3-4b-pt

google / gemma-3-4b-pt

transformers · safetensors · gemma3 · image-text-to-text

image-text-to-text · 3 benchmarks total

Step-3.7-Flash

stepfun-ai / Step-3.7-Flash

MoE · 201B

transformers · safetensors · step3p7 · text-generation

phi-4

microsoft / phi-4

Phi · text-generation · 3 benchmarks total

transformers · safetensors · phi3 · text-generation

Best: 77.0 tok/s

Median: 36.0 tok/s

Min: 36.0 tok/s

Qwen · text-generation · 3 benchmarks total

Qwen3-4B-Base

Qwen / Qwen3-4B-Base

MoE · 12B active / 124B total

NVIDIA-Nemotron-3-Super-120B-A12B-BF16

nvidia / NVIDIA-Nemotron-3-Super-120B-A12B-BF16

text-generation · 3 benchmarks total

Minimax · text-generation · 3 benchmarks total

MiniMax-M2.5

MiniMaxAI / MiniMax-M2.5

MoE · 229B

Best: 504 tok/s

Median: 419 tok/s

Min: 334 tok/s

Deepseek · text-generation · 2 benchmarks total

DeepSeek-V4-Flash-DSpark

deepseek-ai / DeepSeek-V4-Flash-DSpark

MoE · 165B

transformers · safetensors · deepseek_v4 · text-generation

Best: 262 tok/s

Median: 262 tok/s

Min: 262 tok/s

Qwen · image-text-to-text · 2 benchmarks total

Qwen2.5-VL-7B-Instruct

Qwen / Qwen2.5-VL-7B-Instruct

transformers · safetensors · qwen2_5_vl · image-text-to-text

text-generation · 2 benchmarks total

Ornith-1.0-9B-GGUF

deepreinforce-ai / Ornith-1.0-9B-GGUF

transformers · gguf · text-generation · license:mit

Best: 32.0 tok/s

Median: 31.9 tok/s

Min: 31.8 tok/s

Gemma · image-text-to-text · 2 benchmarks total

diffusiongemma-26B-A4B-it

google / diffusiongemma-26B-A4B-it

MoE · 4B active / 26B total

transformers · safetensors · diffusion_gemma · image-text-to-text

Qwen · text-generation · 2 benchmarks total

Qwen2.5-0.5B

Qwen / Qwen2.5-0.5B

text-generation · 2 benchmarks total

NVIDIA-Nemotron-Nano-12B-v2-Base

nvidia / NVIDIA-Nemotron-Nano-12B-v2-Base

12B

transformers · safetensors · nvidia · pytorch

image-text-to-text · 2 benchmarks total

Kimi-K2.5

moonshotai / Kimi-K2.5

1.1T

transformers · safetensors · kimi_k25 · feature-extraction

Best: 74.0 tok/s

Median: 74.0 tok/s

Min: 74.0 tok/s

Llama · text-generation · 2 benchmarks total

Llama-3.2-3B-Instruct

meta-llama / Llama-3.2-3B-Instruct

Best: 79.9 tok/s

Median: 65.4 tok/s

Min: 50.9 tok/s

Minimax · text-generation · 2 benchmarks total

MiniMax-M2.1

MiniMaxAI / MiniMax-M2.1

229B

Best: 499 tok/s

Median: 416 tok/s

Min: 333 tok/s

Qwen · image-text-to-text · 2 benchmarks total

Qwen3.5-35B-A3B-4bit

mlx-community / Qwen3.5-35B-A3B-4bit

MoE · 3B active / 6B total

transformers · safetensors · qwen3_5_moe · image-text-to-text

Best: 140 tok/s

Median: 122 tok/s

Min: 105 tok/s

Qwen · image-text-to-text · 2 benchmarks total

Qwen3-VL-30B-A3B-Instruct

Qwen / Qwen3-VL-30B-A3B-Instruct

31B

transformers · safetensors · qwen3_vl_moe · image-text-to-text

Best: 56.6 tok/s

Median: 52.2 tok/s

Min: 47.7 tok/s

Mistral · 2 benchmarks total

Ministral-3-3B-Base-2512

mistralai / Ministral-3-3B-Base-2512

vllm · safetensors · mistral3 · mistral-common

Llama · text-generation · 2 benchmarks total

Llama-3.1-70B

meta-llama / Llama-3.1-70B

71B

Ornith-1.0-397B

deepreinforce-ai / Ornith-1.0-397B

MoE · 397B

transformers · safetensors · qwen3_5_moe · image-text-to-text

Minimax · image-text-to-text · 1 benchmark total

MiniMax-M3

MiniMaxAI / MiniMax-M3

MoE · 427B

transformers · safetensors · minimax_m3_vl · image-text-to-text

image-text-to-text · 1 benchmark total

Unlimited-OCR

baidu / Unlimited-OCR

MoE · 3B

transformers · safetensors · unlimited-ocr · feature-extraction

Best: 365 tok/s

Median: 365 tok/s

Min: 365 tok/s

Qwen3-1.7B-Base

Qwen / Qwen3-1.7B-Base

Gemma · image-text-to-text · 1 benchmark total

gemma-4-26B-A4B-it-QAT-MLX-4bit

lmstudio-community / gemma-4-26B-A4B-it-QAT-MLX-4bit

MoE · 4B active / 5B total

Best: 65.3 tok/s

Median: 65.3 tok/s

Min: 65.3 tok/s

transformers · safetensors · language · granite-4.1

granite-4.1-8b

ibm-granite / granite-4.1-8b

1 benchmark total

Opt · image-text-to-text · 1 benchmark total

Step-3.7-Flash-NVFP4

stepfun-ai / Step-3.7-Flash-NVFP4

MoE · 104B

transformers · safetensors · step3p7 · text-generation

Best: 27.4 tok/s

Median: 27.4 tok/s

Min: 27.4 tok/s

Qwable-3.6-35b

Mia-AiLab / Qwable-3.6-35b

35B

transformers · gguf · qwen · qwen3

Best: 54.1 tok/s

Median: 54.1 tok/s

Min: 54.1 tok/s

Qwen2.5-3B

Qwen / Qwen2.5-3B

safetensors · qwen2 · text-generation · conversational

gguf · uncensored · abliterated · mxfp4

GPT-OSS-20B-Uncensored-HauhauCS-Aggressive

HauhauCS / GPT-OSS-20B-Uncensored-HauhauCS-Aggressive

20B

Gpt · 1 benchmark total

Best: 66.8 tok/s

Median: 66.8 tok/s

Min: 66.8 tok/s

Deepseek-R1 · 1 benchmark total

UncensoredLM-DeepSeek-R1-Distill-Qwen-14B

uncensoredai / UncensoredLM-DeepSeek-R1-Distill-Qwen-14B

14B

safetensors · qwen2 · license:apache-2.0 · region:us

Best: 27.0 tok/s

Median: 27.0 tok/s

Min: 27.0 tok/s

Qwen3.5-9B-Red_Team

LuisPPB16 / Qwen3.5-9B-Red_Team

gguf · qwen3_5 · llama.cpp · unsloth

Best: 35.9 tok/s

Median: 35.9 tok/s

Min: 35.9 tok/s

Starcoder · text-generation · 1 benchmark total

rwkv-7-world

BlinkDL / rwkv-7-world

pytorch · text-generation · causal-lm · rwkv

LFM2.5-350M-Base

LiquidAI / LFM2.5-350M-Base

transformers · safetensors · lfm2 · text-generation

MiniCPM3-4B

openbmb / MiniCPM3-4B

transformers · pytorch · minicpm3 · text-generation

Best: 6.1 tok/s

Median: 6.1 tok/s

Min: 6.1 tok/s

Cohere · text-generation · 1 benchmark total

North-Mini-Code-1.0

CohereLabs / North-Mini-Code-1.0

30B

transformers · safetensors · cohere2_moe · text-generation

Best: 258 tok/s

Median: 258 tok/s

Min: 258 tok/s

Qwen3.5-0.8B-Q8_0.gguf

Manojb / Qwen3.5-0.8B-Q8_0.gguf

gguf · endpoints_compatible · region:us · conversational

Best: 346 tok/s

Median: 346 tok/s

Min: 346 tok/s

Llama · text-generation · 1 benchmark total

MiniCPM5-1B-GGUF

openbmb / MiniCPM5-1B-GGUF

transformers · gguf · minicpm · minicpm5

Best: 126 tok/s

Median: 126 tok/s

Min: 126 tok/s

Qwen3.6-35B-A3B-4bit-DWQ

mlx-community / Qwen3.6-35B-A3B-4bit-DWQ

35B

mlx · safetensors · qwen3_5_moe · text-generation

Best: 78.9 tok/s

Median: 78.9 tok/s

Min: 78.9 tok/s

safetensors · mimo_v2 · multimodal · vision-language

MiMo-V2.5

XiaomiMiMo / MiMo-V2.5

311B

1 benchmark total

Qwen · image-text-to-text · 1 benchmark total

Qwen3.5-122B-A10B-GPTQ-Int4

Qwen / Qwen3.5-122B-A10B-GPTQ-Int4

125B

transformers · safetensors · qwen3_5_moe · image-text-to-text

Best: 49.1 tok/s

Median: 49.1 tok/s

Min: 49.1 tok/s

Llama · text-generation · 1 benchmark total

Llama-2-7b

meta-llama / Llama-2-7b

facebook · meta · pytorch · llama

Best: 110 tok/s

Median: 110 tok/s

Min: 110 tok/s

Qwen2.5-32B

Qwen / Qwen2.5-32B

33B

safetensors · qwen2 · text-generation · conversational

Ternary-Bonsai-8B-unpacked

prism-ml / Ternary-Bonsai-8B-unpacked

safetensors · qwen3 · prismml · bonsai

GLM-5

zai-org / GLM-5

MoE · 754B

transformers · safetensors · glm_moe_dsa · text-generation

LFM2-24B-A2B

LiquidAI / LFM2-24B-A2B

24B

transformers · safetensors · liquid · lfm2

Best: 161 tok/s

Median: 161 tok/s

Min: 161 tok/s

Deepseek · text-generation · 1 benchmark total

DeepSeek-V4-Flash-2bit-DQ

mlx-community / DeepSeek-V4-Flash-2bit-DQ

284B

mlx · safetensors · text-generation · en

Best: 17.0 tok/s

Median: 17.0 tok/s

Min: 17.0 tok/s

Qwen · image-text-to-text · 1 benchmark total

Qwen3-VL-2B-Instruct

Qwen / Qwen3-VL-2B-Instruct

transformers · safetensors · qwen3_vl · image-text-to-text

Best: 27.9 tok/s

Median: 27.9 tok/s

Min: 27.9 tok/s

Qwen3-30B-A3B-Instruct-2507

Qwen / Qwen3-30B-A3B-Instruct-2507

31B

transformers · safetensors · qwen3_moe · text-generation

Gemma · text-generation · 1 benchmark total

Gemopus-4-26B-A4B-it-GGUF

Jackrong / Gemopus-4-26B-A4B-it-GGUF

26B

gguf · gemma4 · gemma · instruction-tuned

Best: 94.5 tok/s

Median: 94.5 tok/s

Min: 94.5 tok/s

transformers · safetensors · qwen2_vl · image-text-to-text

Qwen2-VL-7B

Qwen / Qwen2-VL-7B

Qwen · image-text-to-text

Qwen3.5-9B-NSC-ACE-SABER-GGUF

GestaltLabs / Qwen3.5-9B-NSC-ACE-SABER-GGUF

Qwen

gguf · qwen3_5 · nsc-ace · saber

gguf · endpoints_compatible · region:us · conversational

LFM2-24B-A2B-GGUF

lmstudio-community / LFM2-24B-A2B-GGUF

24B