Eval Suites

Community benchmark suites for evaluating local LLM quality. Submit results via the API.

Local Reasoning Mini (Official)
v1.0.0 · Custom

A lightweight 10-question reasoning sanity check for local models. Tests basic math, logic, and instruction following.

Reasoning · 0 runs
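
A small sanity-check suite of this kind could be scored locally along the following lines. This is a minimal sketch, not the suite's actual harness: the sample questions, the `normalize` and `score_suite` helpers, and the exact-match scoring rule are all assumptions made for illustration, since the page does not show the real questions or the grading method.

```python
from typing import Callable

# Hypothetical sample items; the suite's real questions are not shown on this page.
QUESTIONS = [
    {"prompt": "What is 17 + 25? Answer with the number only.", "expected": "42"},
    {"prompt": "All cats are mammals. Tom is a cat. Is Tom a mammal? Answer yes or no.",
     "expected": "yes"},
    {"prompt": "Reply with exactly the word OK.", "expected": "ok"},
]


def normalize(text: str) -> str:
    """Trim whitespace, lowercase, and drop trailing periods before comparing."""
    return text.strip().lower().rstrip(".")


def score_suite(model: Callable[[str], str], questions=QUESTIONS) -> dict:
    """Run every prompt through `model` and count exact matches (assumed scoring rule)."""
    correct = sum(
        normalize(model(q["prompt"])) == normalize(q["expected"]) for q in questions
    )
    return {
        "correct": correct,
        "total": len(questions),
        "accuracy": correct / len(questions),
    }


def dummy_model(prompt: str) -> str:
    """Stand-in for a local LLM call; always answers "42"."""
    return "42"


if __name__ == "__main__":
    print(score_suite(dummy_model))
```

To try this with a real local model, replace `dummy_model` with a function that sends the prompt to that model and returns its text reply. The resulting dictionary is only a local summary; the payload expected by the submission API is not described here.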