Community benchmark suites for evaluating the quality of local LLMs. Submit results through the API.
Community-scored creative writing eval for short, tech-related, 4chan-style greenposts. Prompt/response artifacts are uploaded for each model, users rate each artifact from 1 to 10, and a model's score is the average of those ratings.
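A minimal Python sketch of the scoring rule, assuming a model's score is the plain mean of every rating its artifacts received. The description does not say whether ratings are pooled this way or averaged per artifact first. The `Rating` record and `model_scores` function are hypothetical names for illustration, not part of any published API.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rating:
    model: str        # model that produced the rated artifact
    artifact_id: str  # hypothetical ID for an uploaded prompt/response pair
    score: int        # one user's rating, 1 to 10 inclusive


def model_scores(ratings):
    """Return {model: mean rating}, pooling all ratings across a model's artifacts."""
    by_model = defaultdict(list)
    for r in ratings:
        # Reject ratings outside the 1-10 scale described above.
        if not 1 <= r.score <= 10:
            raise ValueError(f"rating out of range: {r.score}")
        by_model[r.model].append(r.score)
    # Assumption: every individual rating counts equally toward the model's score.
    return {model: mean(scores) for model, scores in by_model.items()}


sample = [
    Rating("model-a", "a1", 8),
    Rating("model-a", "a1", 6),
    Rating("model-a", "a2", 10),
    Rating("model-b", "b1", 4),
]
print(model_scores(sample))
```

Pooling means an artifact with many ratings pulls the score more than one with few. If the site instead averages each artifact's ratings first and then averages those per-artifact means, every artifact would carry equal weight regardless of how often it was rated.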