Community benchmark suites for evaluating local LLM quality. Submit results via the API.
Competition math problems spanning algebra, counting and probability, geometry, intermediate algebra, number theory, prealgebra, and precalculus.
Grade School Math 8K (GSM8K): roughly 8,500 grade-school math word problems that require multi-step arithmetic reasoning. A standard benchmark for mathematical reasoning.
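As a rough illustration of how answers to word problems like these are often scored, the sketch below does a numeric exact match on the final answer. It assumes reference solutions end with a `#### <answer>` line, as in the published GSM8K data, and it takes the last number in the model output as the prediction. That heuristic and every function name here are illustrative assumptions, not this suite's actual scoring code.

```python
import re
from typing import Iterable, Optional, Tuple

# Signed integers or decimals, with optional thousands separators (e.g. "1,250.5").
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def normalize(text: str) -> str:
    """Strip whitespace, thousands separators, and dollar signs."""
    return text.strip().replace(",", "").replace("$", "")


def reference_answer(solution: str) -> str:
    """Final answer of a reference solution: the text after the last '####'.

    Assumption: references follow the GSM8K '#### <answer>' convention.
    """
    return normalize(solution.split("####")[-1])


def predicted_answer(completion: str) -> Optional[str]:
    """Heuristic (an assumption): use the last number in the model output."""
    matches = _NUMBER.findall(completion)
    return normalize(matches[-1]) if matches else None


def is_correct(completion: str, solution: str) -> bool:
    """Compare prediction and reference numerically, so '18.0' equals '18'."""
    pred = predicted_answer(completion)
    if pred is None:
        return False
    try:
        return float(pred) == float(reference_answer(solution))
    except ValueError:
        # Reference did not contain a parseable number.
        return False


def accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (completion, solution) pairs scored correct."""
    results = [is_correct(c, s) for c, s in pairs]
    return sum(results) / len(results) if results else 0.0
```

Comparing as floats rather than strings keeps formatting differences such as `18` versus `18.0` or `1,000` versus `1000` from being counted as errors. A stricter harness might instead require the model to emit its answer in a fixed format.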