Community benchmark suite for evaluating the quality of local LLMs. Submit results via the API.
A lightweight 10-question sanity check for locally served models. Designed for the trusted /api/evals/execute path.