
Tech Greenpost

Official

A five-prompt creative writing eval where models draft short tech-related 4chan-style greenposts. DeepSeek judges format compliance, reasonable length, tech relevance, coherence, and humor.

Category: writing · Runner: Custom · Version: v1.0 · Submitted by: Community

Eval Details

Scoring: LLM judge
Aggregation: Mean
Direction: Higher is better
Tasks: 5
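Since every task carries weight 1, the aggregate is a plain mean of the per-task scores; the top model's 75.2% leaderboard score can be reproduced from the task breakdown figures reported on this page (illustrative sketch, not the eval's actual runner code):

```python
# Per-task scores for the top model, taken from the task breakdown.
task_scores = {
    "debugging_greenpost": 88.8,
    "gpu_driver_greenpost": 74.3,
    "homelab_greenpost": 71.8,
    "package_manager_greenpost": 69.7,
    "quantization_greenpost": 71.3,
}

# All tasks carry weight 1, so aggregation is an unweighted mean.
overall = sum(task_scores.values()) / len(task_scores)
print(round(overall, 1))  # 75.2
```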

Default Run Config

Top-p: 0.95 · Temperature: 0.8
Task: GPU Driver Greenpost (gpu_driver_greenpost) · Dataset: 1 inline item · Weight: 1 · Shots: Default · Max tokens: 450
Task: Homelab Greenpost (homelab_greenpost) · Dataset: 1 inline item · Weight: 1 · Shots: Default · Max tokens: 450
Task: Package Manager Greenpost (package_manager_greenpost) · Dataset: 1 inline item · Weight: 1 · Shots: Default · Max tokens: 450
Task: Quantization Greenpost (quantization_greenpost) · Dataset: 1 inline item · Weight: 1 · Shots: Default · Max tokens: 450
Task: Debugging Greenpost (debugging_greenpost) · Dataset: 1 inline item · Weight: 1 · Shots: Default · Max tokens: 450
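The default run config and the per-task max-token limit map naturally onto sampling parameters; a hedged sketch of how a runner might assemble them into an OpenAI-compatible chat completion payload (the model id and prompt text here are placeholders, not the eval's actual values):

```python
# Hedged sketch: default run config + per-task limit as a request payload.
prompt = "Write exactly one short technology-themed 4chan-style greenpost."

payload = {
    "model": "local-model",  # placeholder model id
    "messages": [{"role": "user", "content": prompt}],
    "temperature": 0.8,      # from the default run config
    "top_p": 0.95,           # from the default run config
    "max_tokens": 450,       # per-task max tokens from the table above
}
```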

Leaderboard (best run per model)

Rank 1: Qwen3.6-27B (Qwen) · Score: 75.2% · Quant: IQ4_NL · Hardware: RTX 3090

Task Breakdown (top model)

debugging_greenpost
88.8% · 1 sample
sample 1: 88.8% · 48536 ms · 0f0c4730dee9
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.

Scenario: A programmer debugs a CUDA or ROCm error for hours and discovers the problem was embarrassingly simple.

Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
>be me, GPU dev
>spend 6 hours chasing a segfault in my ROCm kernel that only triggers on the 4090 but works fine on A6000
>check alignment, check shared memory bounds, verify LDS usage until my eyes bleed
>recompile with debug flags and run through rocgdb just to feel something
>finally step through instruction by instruction watching registers flip values for no reason
>realize the host code is passing a pointer to a struct that was allocated on the stack instead of pinned memory
>copy-paste error from three weeks ago where I forgot cudaHostAlloc with the portable hint flag
>fix takes 4 seconds, kernel runs in 12ms now instead of crashing instantly
>want to throw my keyboard through the window but also feel like an absolute clown
>going to stare at a blank terminal for twenty minutes as penance
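The format requirements in each question (line count, ">" prefixes, character budget) are mechanically checkable before the LLM judge scores relevance, coherence, and humor; a minimal sketch of such a checker (illustrative only, not the eval's actual judging code):

```python
def check_greenpost_format(text: str) -> bool:
    """Check the stated format rules: 5-12 non-empty lines, every
    non-empty line starting with '>', and 350-1200 characters total."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not (5 <= len(lines) <= 12):
        return False
    if any(not ln.startswith(">") for ln in lines):
        return False
    return 350 <= len(text) <= 1200
```

A judge could gate on this check first and only spend model calls scoring posts that already comply.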
gpu_driver_greenpost
74.3% · 1 sample
sample 1: 74.3% · 58661 ms · f05a2a6c93e4
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.

Scenario: A late-night GPU driver update breaks a local LLM inference setup right before a benchmark run.

Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
> It was 2 AM and my RTX 4090 was finally idle after a week of quantizing Mistral-7B.
> Windows Update decided that now was the perfect time to push a critical graphics driver patch.
> I clicked snooze, but it auto-rebooted anyway because corporate engineering knows best.
> Woke up to a black screen and a DPC latency spike that would make a server cry.
> Rolled back the driver manually, cleared the CUDA cache, and recompiled llama.cpp from source.
> Ran the prompt through ollama again, only to get a segfault on layer 12.
> Found out the new Vulkan backend broke custom tensor cores acceleration for GGUF files.
> Downgraded to the previous stable release, disabled auto-updates forever, and reran the benchmark suite.
> Scores dropped two percent, but at least the rig isn't throwing hardware errors anymore.
> Sometimes you just accept that silicon progress requires sacrificing your sleep schedule.
homelab_greenpost
71.8% · 1 sample
sample 1: 71.8% · 52275 ms · 59a69826e6ce
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.

Scenario: A home lab NAS, router, and self-hosted dashboard all start failing during a simple cable cleanup.

Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
>be me, home lab enthusiast with a closet full of blinking lights
>decide to tackle the cable spaghetti behind the rack during a mundane tidy-up session
>grab the label maker and start organizing the chaos with righteous pride
>yank on a dangling cat6 that looked like it was left over from 2019
>router loses heartbeat, NAS shares timeout, self-hosted dashboard returns nothing but silence
>heart sinks as I realize the "junk" cable was actually bonding to my UPS input and main switch uplink
>spend three hours mapping ports with a multimeter while the family complains about no streaming
>moral of the story: verify redundancy before you pull, and label everything or suffer the consequences
package_manager_greenpost
69.7% · 1 sample
sample 1: 69.7% · 48906 ms · 20f7368dac6d
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.

Scenario: A harmless package manager upgrade turns a working developer environment into dependency chaos.

Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
> ran a harmless package manager update on my workstation right before signing off for the weekend
> came back monday to discover every rust crate and npm module suddenly demanded conflicting compiler targets
> the build pipeline exploded because a minor version bump in a cryptography lib dropped support for legacy tls
> spent three hours reading changelogs only to find some random dev removed a deprecated export that half our stack relied on
> tried rolling back but the repo had already pruned those cached layers and the apt cache was full of broken links
> finally gave up and switched to a containerized workflow with strict version pinning after watching my ci server melt down
> lesson learned never trust floating dependencies in production again
quantization_greenpost
71.3% · 1 sample
sample 1: 71.3% · 57465 ms · 5d621ae49485
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.

Scenario: Someone keeps lowering quantization to fit a model into VRAM and slowly notices the answers getting weird.

Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
> just needed to run this 70b parameter beast on my single 24gb card so i started dropping the bits
> fp16 obviously oom, switched to int8 and it loaded fine with clean reasoning chains
> pushed to int4 to free up memory for a longer context window, minor hallucinations but still usable
> went down to q2_k_s just to squeeze out another gigabyte of vram headroom, syntax started fraying at the edges
> asked it to debug a simple python script and it returned a haiku about recursive stack overflows
> dropped further to experimental int1 quantization, now it answers math questions with weather reports
> the attention heads are clearly just sampling noise, outputting prime sequences when i ask for cooking instructions
> honestly kind of fascinating how the loss function completely untethers from reality instead of crashing
> gonna leave it at q2_k_s and let it hallucinate its own documentation at least it stays under vram limits