
Models

58 model groups · 195 total

Qwen3.6-27B

Qwen / Qwen3.6-27B

28B · Qwen · image-text-to-text · 180 benchmarks total
transformers · safetensors · qwen3_5 · image-text-to-text
Best 140 tok/s · Median 40.9 tok/s · Min 2.3 tok/s
View benchmarks →

Qwen3.6-35B-A3B

Qwen / Qwen3.6-35B-A3B

36B · Qwen · image-text-to-text · 98 benchmarks total
transformers · safetensors · qwen3_5_moe · image-text-to-text
Best 190 tok/s · Median 69.5 tok/s · Min 6.4 tok/s
View benchmarks →

Qwen3.5-27B

Qwen / Qwen3.5-27B

28B · Qwen · image-text-to-text · 55 benchmarks total
transformers · safetensors · qwen3_5 · image-text-to-text
Best 250 tok/s · Median 12.6 tok/s · Min 2.2 tok/s
View benchmarks →

Qwen3.5-9B-Base

Qwen / Qwen3.5-9B-Base

10B · Qwen · image-text-to-text · 35 benchmarks total
transformers · safetensors · qwen3_5 · image-text-to-text
View benchmarks →

MiniMax-M2.7

MiniMaxAI / MiniMax-M2.7

229B · Minimax · text-generation · 31 benchmarks total
transformers · safetensors · minimax_m2 · text-generation
Best 496 tok/s · Median 20.0 tok/s · Min 0.5 tok/s
View benchmarks →

Qwen3.5-35B-A3B-Base

Qwen / Qwen3.5-35B-A3B-Base

36B · Qwen · image-text-to-text · 31 benchmarks total
transformers · safetensors · qwen3_5_moe · image-text-to-text
View benchmarks →

gemma-4-26B-A4B

google / gemma-4-26B-A4B

27B · Gemma · image-text-to-text · 23 benchmarks total
transformers · safetensors · gemma4 · image-text-to-text
View benchmarks →

Qwen3.5-122B-A10B

Qwen / Qwen3.5-122B-A10B

125B · Qwen · image-text-to-text · 20 benchmarks total
transformers · safetensors · qwen3_5_moe · image-text-to-text
Best 27.3 tok/s · Median 26.8 tok/s · Min 3.2 tok/s
View benchmarks →

gemma-4-31B

google / gemma-4-31B

33B · Gemma · image-text-to-text · 15 benchmarks total
transformers · safetensors · gemma4 · image-text-to-text
View benchmarks →

Qwen3-Coder-30B-A3B-Instruct

Qwen / Qwen3-Coder-30B-A3B-Instruct

31B · Qwen · text-generation · 15 benchmarks total
transformers · safetensors · qwen3_moe · text-generation
Best 100 tok/s · Median 80.5 tok/s · Min 79.8 tok/s
View benchmarks →

Qwen3-Coder-Next

Qwen / Qwen3-Coder-Next

80B · Qwen · text-generation · 13 benchmarks total
transformers · safetensors · qwen3_next · text-generation
Best 80.8 tok/s · Median 55.7 tok/s · Min 51.2 tok/s
View benchmarks →

Qwen3.5-4B-Base

Qwen / Qwen3.5-4B-Base

5B · Qwen · image-text-to-text · 12 benchmarks total
transformers · safetensors · qwen3_5 · image-text-to-text
View benchmarks →

Qwen3.6-27B-DFlash

z-lab / Qwen3.6-27B-DFlash

2B · Qwen · text-generation · 10 benchmarks total
transformers · safetensors · qwen3 · feature-extraction
Best 215 tok/s · Median 39.2 tok/s · Min 26.9 tok/s
View benchmarks →

Ling-2.6-flash

inclusionAI / Ling-2.6-flash

107B · text-generation · 8 benchmarks total
safetensors · bailing_hybrid · text-generation · conversational
Best 94.9 tok/s · Median 86.2 tok/s · Min 82.3 tok/s
View benchmarks →

gemma-4-E4B

google / gemma-4-E4B

8B · Gemma · any-to-any · 8 benchmarks total
transformers · safetensors · gemma4 · image-text-to-text
View benchmarks →

Llama-3.1-8B

meta-llama / Llama-3.1-8B

8B · Llama · text-generation · 8 benchmarks total
transformers · safetensors · llama · text-generation
View benchmarks →

gemma-4-E2B

google / gemma-4-E2B

5B · Gemma · any-to-any · 6 benchmarks total
transformers · safetensors · gemma4 · image-text-to-text
View benchmarks →

GLM-4.7-Flash

zai-org / GLM-4.7-Flash

31B · text-generation · 6 benchmarks total
transformers · safetensors · glm4_moe_lite · text-generation
Best 93.3 tok/s · Median 93.1 tok/s · Min 92.9 tok/s
View benchmarks →

Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16

nvidia / Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16

33B · any-to-any · 5 benchmarks total
transformers · safetensors · NemotronH_Nano_Omni_Reasoning_V3 · feature-extraction
Best 107 tok/s · Median 102 tok/s · Min 95.8 tok/s
View benchmarks →

Nemotron-Cascade-2-30B-A3B

nvidia / Nemotron-Cascade-2-30B-A3B

32B · text-generation · 5 benchmarks total
transformers · safetensors · nemotron_h · text-generation
Best 141 tok/s · Median 95.4 tok/s · Min 89.8 tok/s
View benchmarks →

NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4

nvidia / NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4

67B · Opt · text-generation · 5 benchmarks total
transformers · safetensors · nemotron_h · text-generation
Best 262 tok/s · Median 175 tok/s · Min 49.6 tok/s
View benchmarks →

GLM-5.1

zai-org / GLM-5.1

754B · text-generation · 4 benchmarks total
transformers · safetensors · glm_moe_dsa · text-generation
View benchmarks →

Mistral-Medium-3.5-128B

mistralai / Mistral-Medium-3.5-128B

128B · Mistral · 4 benchmarks total
safetensors · mistral3 · vLLM · en
Best 7.4 tok/s · Median 6.7 tok/s · Min 6.2 tok/s
View benchmarks →

gpt-oss-20b

openai / gpt-oss-20b

22B · Gpt · text-generation · 4 benchmarks total
transformers · safetensors · gpt_oss · text-generation
Best 991 tok/s · Median 160 tok/s · Min 48.3 tok/s
View benchmarks →

gpt-oss-120b

openai / gpt-oss-120b

120B · Gpt · text-generation · 4 benchmarks total
transformers · safetensors · gpt_oss · text-generation
Best 71.5 tok/s · Median 70.4 tok/s · Min 62.7 tok/s
View benchmarks →

Qwen2.5-72B

Qwen / Qwen2.5-72B

73B · Qwen · text-generation · 4 benchmarks total
transformers · safetensors · qwen2 · text-generation
View benchmarks →

Qwen3-8B-Base

Qwen / Qwen3-8B-Base

8B · Qwen · text-generation · 4 benchmarks total
transformers · safetensors · qwen3 · text-generation
View benchmarks →

Gemopus-4-26B-A4B-it

Jackrong / Gemopus-4-26B-A4B-it

27B · Gemma · text-generation · 4 benchmarks total
safetensors · gemma4 · gemma · instruction-tuned
Best 64.3 tok/s · Median 55.0 tok/s · Min 45.7 tok/s
View benchmarks →

NVIDIA-Nemotron-3-Super-120B-A12B-BF16

nvidia / NVIDIA-Nemotron-3-Super-120B-A12B-BF16

124B · text-generation · 3 benchmarks total
transformers · safetensors · nemotron_h · text-generation
View benchmarks →

Qwen3-32B

Qwen / Qwen3-32B

32B · Qwen · text-generation · 3 benchmarks total
transformers · safetensors · qwen3 · text-generation
Best 79.3 tok/s · Median 22.9 tok/s · Min 22.8 tok/s
View benchmarks →

MiniMax-M2.5

MiniMaxAI / MiniMax-M2.5

229B · Minimax · text-generation · 3 benchmarks total
transformers · safetensors · minimax_m2 · text-generation
Best 504 tok/s · Median 419 tok/s · Min 334 tok/s
View benchmarks →

Qwen3.5-0.8B-Base

Qwen / Qwen3.5-0.8B-Base

1B · Qwen · image-text-to-text · 3 benchmarks total
transformers · safetensors · qwen3_5 · image-text-to-text
Best 2.7k tok/s · Median 2.7k tok/s · Min 2.7k tok/s
View benchmarks →

Qwen2.5-7B

Qwen / Qwen2.5-7B

8B · Qwen · text-generation · 3 benchmarks total
transformers · safetensors · qwen2 · text-generation
View benchmarks →

Kimi-K2.5

moonshotai / Kimi-K2.5

1.1T · image-text-to-text · 2 benchmarks total
transformers · safetensors · kimi_k25 · feature-extraction
Best 74.0 tok/s · Median 74.0 tok/s · Min 74.0 tok/s
View benchmarks →

granite-4.1-30b

ibm-granite / granite-4.1-30b

29B · text-generation · 2 benchmarks total
transformers · safetensors · granite · text-generation
Best 17.9 tok/s · Median 17.6 tok/s · Min 17.3 tok/s
View benchmarks →

Mistral-Small-3.1-24B-Base-2503

mistralai / Mistral-Small-3.1-24B-Base-2503

24B · Mistral · 2 benchmarks total
vllm · safetensors · mistral3 · mistral-common
View benchmarks →

MiniMax-M2.1

MiniMaxAI / MiniMax-M2.1

229B · Minimax · text-generation · 2 benchmarks total
transformers · safetensors · minimax_m2 · text-generation
Best 499 tok/s · Median 416 tok/s · Min 333 tok/s
View benchmarks →

MiniMax-M2

MiniMaxAI / MiniMax-M2

229B · Minimax · text-generation · 2 benchmarks total
transformers · safetensors · minimax_m2 · text-generation
Best 493 tok/s · Median 398 tok/s · Min 303 tok/s
View benchmarks →

Qwen3-VL-30B-A3B-Instruct

Qwen / Qwen3-VL-30B-A3B-Instruct

30B · Qwen · image-text-to-text · 2 benchmarks total
transformers · safetensors · qwen3_vl_moe · image-text-to-text
Best 56.6 tok/s · Median 52.2 tok/s · Min 47.7 tok/s
View benchmarks →

Ministral-3-3B-Base-2512

mistralai / Ministral-3-3B-Base-2512

4B · Mistral · 2 benchmarks total
vllm · safetensors · mistral3 · mistral-common
View benchmarks →

Llama-3.1-70B

meta-llama / Llama-3.1-70B

71B · Llama · text-generation · 2 benchmarks total
transformers · safetensors · llama · text-generation
View benchmarks →

Qwen3.5-122B-A10B-GPTQ-Int4

Qwen / Qwen3.5-122B-A10B-GPTQ-Int4

125B · Qwen · image-text-to-text · 1 benchmark total
transformers · safetensors · qwen3_5_moe · image-text-to-text
Best 49.1 tok/s · Median 49.1 tok/s · Min 49.1 tok/s
View benchmarks →

Llama-2-7b

meta-llama / Llama-2-7b

7B · Llama · text-generation · 1 benchmark total
facebook · meta · pytorch · llama
Best 110 tok/s · Median 110 tok/s · Min 110 tok/s
View benchmarks →

Qwen2.5-32B

Qwen / Qwen2.5-32B

33B · Qwen · text-generation · 1 benchmark total
safetensors · qwen2 · text-generation · conversational
View benchmarks →

Qwen3-VL-8B-Instruct

Qwen / Qwen3-VL-8B-Instruct

9B · Qwen · image-text-to-text · 1 benchmark total
transformers · safetensors · qwen3_vl · image-text-to-text
Best 95.9 tok/s · Median 95.9 tok/s · Min 95.9 tok/s
View benchmarks →

Llama-3.2-3B-Instruct

meta-llama / Llama-3.2-3B-Instruct

3B · Llama · text-generation · 1 benchmark total
transformers · safetensors · llama · text-generation
Best 79.9 tok/s · Median 79.9 tok/s · Min 79.9 tok/s
View benchmarks →

Qwen3-30B-A3B-Base

Qwen / Qwen3-30B-A3B-Base

31B · Qwen · text-generation · 1 benchmark total
transformers · safetensors · qwen3_moe · text-generation
View benchmarks →

Ternary-Bonsai-8B-unpacked

prism-ml / Ternary-Bonsai-8B-unpacked

8B · Qwen · 1 benchmark total
safetensors · qwen3 · prismml · bonsai
View benchmarks →

NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

nvidia / NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

32B · text-generation · 1 benchmark total
transformers · safetensors · nemotron_h · text-generation
Best 286 tok/s · Median 286 tok/s · Min 286 tok/s
View benchmarks →

Qwen3.5-35B-A3B-4bit

mlx-community / Qwen3.5-35B-A3B-4bit

6B · Qwen · image-text-to-text · 1 benchmark total
transformers · safetensors · qwen3_5_moe · image-text-to-text
Best 105 tok/s · Median 105 tok/s · Min 105 tok/s
View benchmarks →

gemma-3-4b-pt

google / gemma-3-4b-pt

4B · Gemma · image-text-to-text · 1 benchmark total
transformers · safetensors · gemma3 · image-text-to-text
View benchmarks →

GLM-5

zai-org / GLM-5

754B · text-generation · 1 benchmark total
transformers · safetensors · glm_moe_dsa · text-generation
View benchmarks →

DeepSeek-V4-Flash-2bit-DQ

mlx-community / DeepSeek-V4-Flash-2bit-DQ

284B · Deepseek · text-generation · 1 benchmark total
mlx · safetensors · deepseek_v4 · text-generation
Best 17.0 tok/s · Median 17.0 tok/s · Min 17.0 tok/s
View benchmarks →

Qwen3-VL-2B-Instruct

Qwen / Qwen3-VL-2B-Instruct

2B · Qwen · image-text-to-text · 1 benchmark total
transformers · safetensors · qwen3_vl · image-text-to-text
Best 27.9 tok/s · Median 27.9 tok/s · Min 27.9 tok/s
View benchmarks →

Qwen3-30B-A3B-Instruct-2507

Qwen / Qwen3-30B-A3B-Instruct-2507

30B · Qwen · text-generation · 1 benchmark total
transformers · safetensors · qwen3_moe · text-generation
View benchmarks →

Gemopus-4-26B-A4B-it-GGUF

Jackrong / Gemopus-4-26B-A4B-it-GGUF

26B · Gemma · text-generation · 1 benchmark total
gguf · gemma4 · gemma · instruction-tuned
Best94.5 tok/s
Median94.5 tok/s
Min94.5 tok/s
View benchmarks →

LFM2-24B-A2B

LiquidAI / LFM2-24B-A2B

24B · text-generation
transformers · safetensors · lfm2_moe · text-generation
View benchmarks →

LFM2-24B-A2B-GGUF

lmstudio-community / LFM2-24B-A2B-GGUF

24B
gguf · endpoints_compatible · region:us · conversational
View benchmarks →