Community benchmark suites for evaluating local LLM quality. Submit results via the API.
A lightweight 10-question reasoning sanity check for local models. It tests basic math, logic, and instruction following.
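
As a rough illustration of how a sanity check like this could be scored locally, here is a minimal sketch. The questions, the `normalize` and `run_sanity_check` helpers, and the `fake_model` stand-in are all hypothetical and are not the suite's real contents or API; only three sample questions are shown rather than the full ten.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sample questions as (prompt, expected answer) pairs.
# These are illustrative only and are not taken from the real suite.
QUESTIONS: List[Tuple[str, str]] = [
    ("What is 17 + 25? Answer with the number only.", "42"),
    ("All cats are animals. Tom is a cat. Is Tom an animal? Answer yes or no.", "yes"),
    ("Reply with exactly the word: ready", "ready"),
]


def normalize(text: str) -> str:
    """Trim whitespace, lowercase, and drop trailing '.' or '!' characters."""
    return text.strip().lower().rstrip(".!")


def run_sanity_check(ask: Callable[[str], str]) -> Dict[str, float]:
    """Ask every question and count exact matches after normalization."""
    correct = sum(
        normalize(ask(prompt)) == expected for prompt, expected in QUESTIONS
    )
    total = len(QUESTIONS)
    return {"total": total, "correct": correct, "accuracy": correct / total}


def fake_model(prompt: str) -> str:
    """Stand-in for a local model call (assumption: replace with a real client)."""
    if "17 + 25" in prompt:
        return "42"
    if "cats" in prompt:
        return "Yes."
    return "not sure"


if __name__ == "__main__":
    print(run_sanity_check(fake_model))
```

Exact-match scoring after light normalization keeps the check cheap and deterministic, at the cost of marking verbose but correct answers as wrong; that trade-off is itself a mild test of instruction following.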