Eval Suites

Community benchmark suites for evaluating local LLM quality. Submit results via the API.

A lightweight 10-question reasoning sanity check for local models. Tests basic math, logic, and instruction following.
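
As a rough illustration of how a sanity check like this might be scored locally, here is a minimal sketch. It assumes exact-match grading after light normalization. The questions, the `run_suite` and `normalize` helpers, the `stub_model` stand-in, and the report fields are all hypothetical; none of them come from the actual suite or its API.

```python
import json
from typing import Callable

# Hypothetical sample items; the real suite's questions are not shown here.
QUESTIONS = [
    {"id": "math-1", "prompt": "What is 17 + 25? Answer with the number only.", "expected": "42"},
    {"id": "logic-1", "prompt": "All cats are animals. Tom is a cat. Is Tom an animal? Answer yes or no.", "expected": "yes"},
    {"id": "instr-1", "prompt": "Reply with exactly the word OK.", "expected": "ok"},
]


def normalize(text: str) -> str:
    """Trim whitespace, drop a trailing period, and lowercase."""
    return text.strip().rstrip(".").strip().lower()


def run_suite(generate: Callable[[str], str], questions=QUESTIONS) -> dict:
    """Ask each question and grade the answer by exact match (an assumed rule)."""
    results = []
    for q in questions:
        answer = generate(q["prompt"])
        results.append({"id": q["id"], "correct": normalize(answer) == q["expected"]})
    passed = sum(r["correct"] for r in results)
    return {"passed": passed, "total": len(results), "results": results}


def stub_model(prompt: str) -> str:
    """Stand-in for a local model call; replace with your own inference function."""
    if "17 + 25" in prompt:
        return "42"
    if "Tom" in prompt:
        return " Yes. "
    # Extra chatter breaks the "exactly the word OK" instruction, so this one fails.
    return "Sure, here you go: OK"


report = run_suite(stub_model)
# Hypothetical report shape; the API's real submission schema may differ.
print(json.dumps(report, indent=2))
```

Exact matching is strict on purpose: an answer that wraps the right value in extra text counts as an instruction-following failure, which is the kind of regression a sanity check should surface.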