Models

Qwen · image-text-to-text · 164 benchmarks total

Qwen3.6-35B-A3B

Qwen / Qwen3.6-35B-A3B

36B

transformers · safetensors · qwen3_5_moe · image-text-to-text

MiniMax-M2.7

MiniMaxAI / MiniMax-M2.7

Minimax · text-generation · 117 benchmarks total

Qwen · image-text-to-text · 60 benchmarks total

Qwen3.5-27B

Qwen / Qwen3.5-27B

28B

Qwen · image-text-to-text · 53 benchmarks total

Qwen3.5-9B-Base

Qwen / Qwen3.5-9B-Base

10B

Gemma · any-to-any · 34 benchmarks total

gemma-4-E4B

google / gemma-4-E4B

Qwen · image-text-to-text · 33 benchmarks total

Ornstein3.6-27B-MTP-NSC-ACE-SABER

GestaltLabs / Ornstein3.6-27B-MTP-NSC-ACE-SABER

27B

Qwen · image-text-to-text · 31 benchmarks total

Qwen3.5-35B-A3B-Base

Qwen / Qwen3.5-35B-A3B-Base

36B

transformers · safetensors · qwen3_5_moe · image-text-to-text

Gemma · image-text-to-text · 28 benchmarks total

gemma-4-26B-A4B

google / gemma-4-26B-A4B

27B

gemma-4-31B

google / gemma-4-31B

Gemma · image-text-to-text · 25 benchmarks total

Qwen2.5-14B

Qwen / Qwen2.5-14B

Qwen · text-generation · 22 benchmarks total

safetensors · qwen2 · text-generation · conversational

Qwen · image-text-to-text · 22 benchmarks total

Qwen3.5-122B-A10B

Qwen / Qwen3.5-122B-A10B

125B

transformers · safetensors · qwen3_5_moe · image-text-to-text

Best: 27.3 tok/s

Median: 26.1 tok/s

Minimum: 3.2 tok/s

Qwen3-Coder-30B-A3B-Instruct

Qwen / Qwen3-Coder-30B-A3B-Instruct

Qwen · text-generation · 22 benchmarks total

transformers · safetensors · qwen3_moe · text-generation

LFM2.5-8B-A1B-Base

LiquidAI / LFM2.5-8B-A1B-Base

text-generation · 20 benchmarks total

transformers · safetensors · lfm2_moe · text-generation

Qwen · image-text-to-text · 16 benchmarks total

Qwen3.5-4B-Base

Qwen / Qwen3.5-4B-Base

Qwen · text-generation · 16 benchmarks total

Qwen3-Coder-Next

Qwen / Qwen3-Coder-Next

80B

transformers · safetensors · qwen3_next · text-generation

Best: 80.8 tok/s

Median: 55.8 tok/s

Minimum: 51.2 tok/s

Llama · text-generation · 15 benchmarks total

Llama-3.1-8B

meta-llama / Llama-3.1-8B

transformers · safetensors · llama · text-generation

Gemma · any-to-any · 14 benchmarks total

gemma-4-12B

google / gemma-4-12B

12B

transformers · safetensors · gemma4_unified · image-text-to-text

Best: 25.8 tok/s

Median: 25.8 tok/s

Minimum: 25.8 tok/s

Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16

nvidia / Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16

any-to-any · 13 benchmarks total

transformers · safetensors · NemotronH_Nano_Omni_Reasoning_V3 · image-feature-extraction

Qwen2.5-7B

Qwen / Qwen2.5-7B

Qwen · text-generation · 12 benchmarks total

Best: 1.4k tok/s

Median: 1.4k tok/s

Minimum: 1.4k tok/s

Gpt · text-generation · 12 benchmarks total

gpt-oss-20b

openai / gpt-oss-20b

22B

transformers · safetensors · gpt_oss · text-generation

gemma-4-E2B

google / gemma-4-E2B

Gemma · any-to-any · 11 benchmarks total

Gpt · text-generation · 11 benchmarks total

gpt-oss-120b

openai / gpt-oss-120b

120B

transformers · safetensors · gpt_oss · text-generation

gemma-3-12b-pt

google / gemma-3-12b-pt

12B

Gemma · image-text-to-text · 10 benchmarks total

transformers · safetensors · gemma3 · image-text-to-text

Qwen · text-generation · 10 benchmarks total

Qwen3.6-27B-DFlash

z-lab / Qwen3.6-27B-DFlash

transformers · safetensors · qwen3 · image-feature-extraction

Nex-N2-mini

nex-agi / Nex-N2-mini

35B

Qwen · text-generation · 9 benchmarks total

transformers · safetensors · qwen3_5_moe · image-text-to-text

DeepSeek-V4-Flash

deepseek-ai / DeepSeek-V4-Flash

158B

Deepseek · text-generation · 9 benchmarks total

transformers · safetensors · deepseek_v4 · text-generation

Best: 45.7 tok/s

Median: 19.7 tok/s

Minimum: 18.6 tok/s

GLM-4.7-Flash

zai-org / GLM-4.7-Flash

text-generation · 9 benchmarks total

transformers · safetensors · glm4_moe_lite · text-generation

DeepSeek-R1-Distill-Qwen-7B

deepseek-ai / DeepSeek-R1-Distill-Qwen-7B

Deepseek-R1 · text-generation · 8 benchmarks total

text-generation · 8 benchmarks total

Ling-2.6-flash

inclusionAI / Ling-2.6-flash

107B

safetensors · bailing_hybrid · text-generation · conversational

Best: 94.9 tok/s

Median: 86.2 tok/s

Minimum: 82.3 tok/s

Qwen · text-generation · 8 benchmarks total

Qwen2.5-72B

Qwen / Qwen2.5-72B

73B

text-generation · 8 benchmarks total

Nemotron-Cascade-2-30B-A3B

nvidia / Nemotron-Cascade-2-30B-A3B

32B

Qwen · image-text-to-text · 7 benchmarks total

Qwen3.5-0.8B-Base

Qwen / Qwen3.5-0.8B-Base

Best: 2.7k tok/s

Median: 2.7k tok/s

Minimum: 2.7k tok/s

Mistral · 7 benchmarks total

Mistral-Medium-3.5-128B

mistralai / Mistral-Medium-3.5-128B

128B

safetensors · mistral3 · vLLM · en

Qwen3-30B-A3B-Base

Qwen / Qwen3-30B-A3B-Base

Qwen · text-generation · 7 benchmarks total

transformers · safetensors · qwen3_moe · text-generation

Deepseek-Coder · text-generation · 6 benchmarks total

DeepSeek-Coder-V2-Lite-Instruct

deepseek-ai / DeepSeek-Coder-V2-Lite-Instruct

MoE · 16B

transformers · safetensors · deepseek_v2 · text-generation

NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

nvidia / NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

32B

text-generation · 5 benchmarks total

Qwen3-14B-Base

Qwen / Qwen3-14B-Base

Qwen · text-generation · 5 benchmarks total

Mistral · 5 benchmarks total

Mistral-Small-3.1-24B-Base-2503

mistralai / Mistral-Small-3.1-24B-Base-2503

24B

vllm · safetensors · mistral3 · mistral-common

Gemma · text-generation · 5 benchmarks total

gemma-3-1b-pt

google / gemma-3-1b-pt

transformers · safetensors · gemma3_text · text-generation

Qwen · image-text-to-text · 5 benchmarks total

Qwen3.5-2B-Base

Qwen / Qwen3.5-2B-Base

Qwen · text-generation · 5 benchmarks total

Qwen2.5-1.5B

Qwen / Qwen2.5-1.5B

Qwen · text-generation · 5 benchmarks total

Qwen3-8B-Base

Qwen / Qwen3-8B-Base

Opt · text-generation · 5 benchmarks total

NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4

nvidia / NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4

67B

LFM2.5-1.2B-Base

LiquidAI / LFM2.5-1.2B-Base

transformers · safetensors · lfm2 · text-generation

granite-4.0-h-micro

ibm-granite / granite-4.0-h-micro

transformers · safetensors · granitemoehybrid · text-generation

Best: 68.6 tok/s

Median: 45.2 tok/s

Minimum: 45.0 tok/s

LFM2-8B-A1B

LiquidAI / LFM2-8B-A1B

MoE · 1B active / 8B total

transformers · safetensors · lfm2_moe · text-generation

Best: 18.3 tok/s

Median: 18.3 tok/s

Minimum: 9.9 tok/s

DeepSeek-R1-Distill-Qwen-14B

deepseek-ai / DeepSeek-R1-Distill-Qwen-14B

Deepseek-R1 · text-generation · 4 benchmarks total

transformers · safetensors · arxiv:2501.12948 · license:mit

Best: 24.4 tok/s

Median: 24.4 tok/s

Minimum: 24.4 tok/s

Gemma · image-text-to-text · 4 benchmarks total

gemma-3n-E4B

google / gemma-3n-E4B

transformers · safetensors · gemma3n · image-text-to-text

MiniMax-M2

MiniMaxAI / MiniMax-M2

Minimax · text-generation · 4 benchmarks total

GLM-5.1

zai-org / GLM-5.1

754B

transformers · safetensors · glm_moe_dsa · text-generation

granite-4.1-30b

ibm-granite / granite-4.1-30b

29B

transformers · safetensors · granite · text-generation

Best: 17.9 tok/s

Median: 16.4 tok/s

Minimum: 15.6 tok/s

Gemma · text-generation · 4 benchmarks total

Gemopus-4-26B-A4B-it

Jackrong / Gemopus-4-26B-A4B-it

27B

safetensors · gemma4 · gemma · instruction-tuned

Best: 64.3 tok/s

Median: 55.0 tok/s

Minimum: 45.7 tok/s

phi-4

microsoft / phi-4

Phi · text-generation · 3 benchmarks total

transformers · safetensors · phi3 · text-generation

Best: 77.0 tok/s

Median: 36.0 tok/s

Minimum: 36.0 tok/s

text-generation · 3 benchmarks total

NVIDIA-Nemotron-3-Super-120B-A12B-BF16

nvidia / NVIDIA-Nemotron-3-Super-120B-A12B-BF16

124B

Qwen3-32B

Qwen / Qwen3-32B

Qwen · text-generation · 3 benchmarks total

Best: 79.3 tok/s

Median: 22.9 tok/s

Minimum: 22.8 tok/s

MiniMax-M2.5

MiniMaxAI / MiniMax-M2.5

Minimax · text-generation · 3 benchmarks total

image-text-to-text · 2 benchmarks total

Step-3.7-Flash

stepfun-ai / Step-3.7-Flash

MoE · 201B

transformers · safetensors · step3p7 · text-generation

text-generation · 2 benchmarks total

NVIDIA-Nemotron-Nano-12B-v2-Base

nvidia / NVIDIA-Nemotron-Nano-12B-v2-Base

12B

transformers · safetensors · nvidia · pytorch

Qwen · text-generation · 2 benchmarks total

Qwen3-4B-Base

Qwen / Qwen3-4B-Base

image-text-to-text · 2 benchmarks total

Kimi-K2.5

moonshotai / Kimi-K2.5

1.1T

transformers · safetensors · kimi_k25 · image-feature-extraction

Best: 74.0 tok/s

Median: 74.0 tok/s

Minimum: 74.0 tok/s

Llama · text-generation · 2 benchmarks total

Llama-3.2-3B-Instruct

meta-llama / Llama-3.2-3B-Instruct

transformers · safetensors · llama · text-generation

Best: 79.9 tok/s

Median: 65.4 tok/s

Minimum: 50.9 tok/s

MiniMax-M2.1

MiniMaxAI / MiniMax-M2.1

Minimax · text-generation · 2 benchmarks total

Qwen3-VL-30B-A3B-Instruct

Qwen / Qwen3-VL-30B-A3B-Instruct

Qwen · image-text-to-text · 2 benchmarks total

transformers · safetensors · qwen3_vl_moe · image-text-to-text

Best: 56.6 tok/s

Median: 52.2 tok/s

Minimum: 47.7 tok/s

Mistral · 2 benchmarks total

Ministral-3-3B-Base-2512

mistralai / Ministral-3-3B-Base-2512

vllm · safetensors · mistral3 · mistral-common

Llama · text-generation · 2 benchmarks total

Llama-3.1-70B

meta-llama / Llama-3.1-70B

71B

transformers · safetensors · llama · text-generation

GLM-5.2

zai-org / GLM-5.2

753B

transformers · safetensors · glm_moe_dsa · text-generation

Qwen2.5-3B

Qwen / Qwen2.5-3B

safetensors · qwen2 · text-generation · conversational

gguf · uncensored · abliterated · mxfp4

GPT-OSS-20B-Uncensored-HauhauCS-Aggressive

HauhauCS / GPT-OSS-20B-Uncensored-HauhauCS-Aggressive

20B

Gpt · 1 benchmark total

Best: 66.8 tok/s

Median: 66.8 tok/s

Minimum: 66.8 tok/s

Deepseek-R1 · 1 benchmark total

UncensoredLM-DeepSeek-R1-Distill-Qwen-14B

uncensoredai / UncensoredLM-DeepSeek-R1-Distill-Qwen-14B

14B

safetensors · qwen2 · license:apache-2.0 · region:us

Best: 27.0 tok/s

Median: 27.0 tok/s

Minimum: 27.0 tok/s

gguf · qwen3_5 · llama.cpp · unsloth

Qwen3.5-9B-Red_Team

LuisPPB16 / Qwen3.5-9B-Red_Team

Qwen · 1 benchmark total

Best: 35.9 tok/s

Median: 35.9 tok/s

Minimum: 35.9 tok/s

Starcoder · text-generation · 1 benchmark total

rwkv-7-world

BlinkDL / rwkv-7-world

pytorch · text-generation · causal-lm · rwkv

LFM2.5-350M-Base

LiquidAI / LFM2.5-350M-Base

transformers · safetensors · lfm2 · text-generation

MiniCPM3-4B

openbmb / MiniCPM3-4B

transformers · pytorch · minicpm3 · text-generation

North-Mini-Code-1.0

CohereLabs / North-Mini-Code-1.0

30B

Cohere · text-generation · 1 benchmark total

transformers · safetensors · cohere2_moe · text-generation

Qwen3.5-0.8B-Q8_0.gguf

Manojb / Qwen3.5-0.8B-Q8_0.gguf

Qwen · 1 benchmark total

gguf · endpoints_compatible · region:us · conversational

MiniCPM5-1B-GGUF

openbmb / MiniCPM5-1B-GGUF

Llama · text-generation · 1 benchmark total

transformers · gguf · minicpm · minicpm5

Qwen3.6-35B-A3B-4bit-DWQ

mlx-community / Qwen3.6-35B-A3B-4bit-DWQ

35B

mlx · safetensors · qwen3_5_moe · text-generation

Best: 78.9 tok/s

Median: 78.9 tok/s

Minimum: 78.9 tok/s

safetensors · mimo_v2 · multimodal · vision-language

MiMo-V2.5

XiaomiMiMo / MiMo-V2.5

311B

1 benchmark total

Qwen3.5-122B-A10B-GPTQ-Int4

Qwen / Qwen3.5-122B-A10B-GPTQ-Int4

125B

transformers · safetensors · qwen3_5_moe · image-text-to-text

Best: 49.1 tok/s

Median: 49.1 tok/s

Minimum: 49.1 tok/s

Llama · text-generation · 1 benchmark total

Llama-2-7b

meta-llama / Llama-2-7b

facebook · meta · pytorch · llama

Qwen2.5-32B

Qwen / Qwen2.5-32B

safetensors · qwen2 · text-generation · conversational

Qwen3-VL-8B-Instruct

Qwen / Qwen3-VL-8B-Instruct

transformers · safetensors · qwen3_vl · image-text-to-text

Best: 95.9 tok/s

Median: 95.9 tok/s

Minimum: 95.9 tok/s

safetensors · qwen3 · prismml · bonsai

Ternary-Bonsai-8B-unpacked

prism-ml / Ternary-Bonsai-8B-unpacked

Qwen · 1 benchmark total

Qwen3.5-35B-A3B-4bit

mlx-community / Qwen3.5-35B-A3B-4bit

transformers · safetensors · qwen3_5_moe · image-text-to-text

gemma-3-4b-pt

google / gemma-3-4b-pt

Gemma · image-text-to-text · 1 benchmark total

transformers · safetensors · gemma3 · image-text-to-text

GLM-5

zai-org / GLM-5

754B

transformers · safetensors · glm_moe_dsa · text-generation

LFM2-24B-A2B

LiquidAI / LFM2-24B-A2B

24B

transformers · safetensors · liquid · lfm2

DeepSeek-V4-Flash-2bit-DQ

mlx-community / DeepSeek-V4-Flash-2bit-DQ

284B

Deepseek · text-generation · 1 benchmark total

mlx · safetensors · text-generation · en

Best: 17.0 tok/s

Median: 17.0 tok/s

Minimum: 17.0 tok/s

Qwen3-VL-2B-Instruct

Qwen / Qwen3-VL-2B-Instruct

transformers · safetensors · qwen3_vl · image-text-to-text

Best: 27.9 tok/s

Median: 27.9 tok/s

Minimum: 27.9 tok/s

Qwen3-30B-A3B-Instruct-2507

Qwen / Qwen3-30B-A3B-Instruct-2507

transformers · safetensors · qwen3_moe · text-generation

Gemma · text-generation · 1 benchmark total

Gemopus-4-26B-A4B-it-GGUF

Jackrong / Gemopus-4-26B-A4B-it-GGUF

26B

gguf · gemma4 · gemma · instruction-tuned

Best: 94.5 tok/s

Median: 94.5 tok/s

Minimum: 94.5 tok/s

transformers · safetensors · qwen2_vl · image-text-to-text

Qwen2-VL-7B

Qwen / Qwen2-VL-7B

Qwen · image-text-to-text

Qwen3.5-9B-NSC-ACE-SABER-GGUF

GestaltLabs / Qwen3.5-9B-NSC-ACE-SABER-GGUF

Qwen

gguf · qwen3_5 · nsc-ace · saber

gguf · endpoints_compatible · region:us · conversational

LFM2-24B-A2B-GGUF

lmstudio-community / LFM2-24B-A2B-GGUF

24B