LocalMaxxing
Probe
Category: knowledge
Runner: lm-eval-harness
Version: v1.0
Submitted by: banana_baeee
Eval Details
Scoring: Exact Match
Aggregation: Weighted Mean
Direction: Higher is better
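The scoring settings above (exact match per sample, weighted mean across tasks) can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, and stripping whitespace before comparison is an assumption, since the page does not say how answers are normalized or how the runner computes the score internally.

```python
from typing import Mapping, Sequence


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the two strings are identical after stripping whitespace.

    Assumption: whitespace stripping is the only normalization applied.
    """
    return 1.0 if prediction.strip() == reference.strip() else 0.0


def task_score(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Mean exact-match score over all samples of one task."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("cannot score a task with no samples")
    matches = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return matches / len(predictions)


def weighted_mean(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Aggregate per-task scores into one number using per-task weights."""
    total_weight = sum(weights[task] for task in scores)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(scores[task] * weights[task] for task in scores) / total_weight
```

With a single task of weight 1, as on this page, the aggregate equals the task score: `weighted_mean({"mmlu": 0.819}, {"mmlu": 1.0})` returns `0.819`.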
Tasks (1 task)

Task   Dataset         Weight   Shots     Max Tokens
mmlu   Not specified   1        Default   -
Leaderboard (best run per model)

Model                        Score   Quant    Hardware
Ornstein3.6-35B-A3B (Qwen)   81.9%   Q4_K_M   RTX 3090
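The leaderboard keeps only the best run per model, and this eval is marked "Higher is better". A minimal sketch of that selection is shown below, assuming each run is a flat record with the four columns listed on the page. The `Run` class and `best_run_per_model` function are hypothetical names, not part of the site's API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Run:
    """One submitted run (hypothetical record shape)."""
    model: str
    score: float  # score in percent
    quant: str
    hardware: str


def best_run_per_model(runs: Iterable[Run], higher_is_better: bool = True) -> List[Run]:
    """Keep the best run for each model, then order the board best-first."""
    best: Dict[str, Run] = {}
    for run in runs:
        current = best.get(run.model)
        if current is None:
            best[run.model] = run
            continue
        improved = run.score > current.score if higher_is_better else run.score < current.score
        if improved:
            best[run.model] = run
    # reverse=True puts the highest score first when higher is better.
    return sorted(best.values(), key=lambda r: r.score, reverse=higher_is_better)
```

For the single run listed here, `best_run_per_model([Run("Ornstein3.6-35B-A3B", 81.9, "Q4_K_M", "RTX 3090")])` returns a one-element list containing that run.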
Task Breakdown (top model)

mmlu: 81.9% (0-shot, 0 samples)