Tech Greenpost
Official
A five-prompt creative writing eval where models draft short tech-related 4chan-style greenposts. DeepSeek judges format compliance, reasonable length, tech relevance, coherence, and humor.
Category: writing · Runner: Custom · Version: v1.0 · Submitted by: Community
Eval Details
Scoring: LLM judge
Aggregation: Mean
Direction: Higher is better
Tasks: 5
Default run config: top_p 0.95, temperature 0.8
| Task | Dataset | Weight | Shots | Max Tokens |
|---|---|---|---|---|
| GPU Driver Greenpost (`gpu_driver_greenpost`) | 1 inline item | 1 | Default | 450 |
| Homelab Greenpost (`homelab_greenpost`) | 1 inline item | 1 | Default | 450 |
| Package Manager Greenpost (`package_manager_greenpost`) | 1 inline item | 1 | Default | 450 |
| Quantization Greenpost (`quantization_greenpost`) | 1 inline item | 1 | Default | 450 |
| Debugging Greenpost (`debugging_greenpost`) | 1 inline item | 1 | Default | 450 |
Leaderboard (best run per model)
| # | Model | Score | Quant | Hardware |
|---|---|---|---|---|
| 1 | Qwen3.6-27B (Qwen) | 75.2% | IQ4_NL | RTX 3090 |
Task Breakdown (top model)
debugging_greenpost: 88.8% · 1 sample
sample 1: 88.8% · 48536 ms · 0f0c4730dee9
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.
Scenario: A programmer debugs a CUDA or ROCm error for hours and discovers the problem was embarrassingly simple.
Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
>be me, GPU dev
>spend 6 hours chasing a segfault in my ROCm kernel that only triggers on the 4090 but works fine on A6000
>check alignment, check shared memory bounds, verify LDS usage until my eyes bleed
>recompile with debug flags and run through rocgdb just to feel something
>finally step through instruction by instruction watching registers flip values for no reason
>realize the host code is passing a pointer to a struct that was allocated on the stack instead of pinned memory
>copy-paste error from three weeks ago where I forgot cudaHostAlloc with the portable hint flag
>fix takes 4 seconds, kernel runs in 12ms now instead of crashing instantly
>want to throw my keyboard through the window but also feel like an absolute clown
>going to stare at a blank terminal for twenty minutes as penance
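The first three requirements in the prompt (line prefix, line count, total length) are fully deterministic, so they could be pre-checked mechanically before the LLM judge scores the subjective criteria. A minimal sketch; the function name and return shape are assumptions, not part of this eval's actual judging pipeline:

```python
def check_greenpost_format(text: str) -> list[str]:
    """Return a list of format violations for a candidate greenpost.

    Checks only the deterministic requirements from the prompt:
    5-12 non-empty lines, every non-empty line starts with '>',
    and 350-1200 characters total. An empty list means the post
    passes the mechanical checks.
    """
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]

    if not 5 <= len(lines) <= 12:
        problems.append(f"expected 5-12 non-empty lines, got {len(lines)}")
    for i, ln in enumerate(lines, start=1):
        if not ln.lstrip().startswith(">"):
            problems.append(f"line {i} does not start with '>'")
    if not 350 <= len(text) <= 1200:
        problems.append(f"expected 350-1200 characters, got {len(text)}")
    return problems
```

Tech relevance, coherence, and humor still need the judge model; a gate like this would only catch obvious format failures cheaply.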
gpu_driver_greenpost: 74.3% · 1 sample
sample 1: 74.3% · 58661 ms · f05a2a6c93e4
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.
Scenario: A late-night GPU driver update breaks a local LLM inference setup right before a benchmark run.
Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
> It was 2 AM and my RTX 4090 was finally idle after a week of quantizing Mistral-7B.
> Windows Update decided that now was the perfect time to push a critical graphics driver patch.
> I clicked snooze, but it auto-rebooted anyway because corporate engineering knows best.
> Woke up to a black screen and a DPC latency spike that would make a server cry.
> Rolled back the driver manually, cleared the CUDA cache, and recompiled llama.cpp from source.
> Ran the prompt through ollama again, only to get a segfault on layer 12.
> Found out the new Vulkan backend broke custom tensor cores acceleration for GGUF files.
> Downgraded to the previous stable release, disabled auto-updates forever, and reran the benchmark suite.
> Scores dropped two percent, but at least the rig isn't throwing hardware errors anymore.
> Sometimes you just accept that silicon progress requires sacrificing your sleep schedule.
homelab_greenpost: 71.8% · 1 sample
sample 1: 71.8% · 52275 ms · 59a69826e6ce
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.
Scenario: A home lab NAS, router, and self-hosted dashboard all start failing during a simple cable cleanup.
Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
>be me, home lab enthusiast with a closet full of blinking lights
>decide to tackle the cable spaghetti behind the rack during a mundane tidy-up session
>grab the label maker and start organizing the chaos with righteous pride
>yank on a dangling cat6 that looked like it was left over from 2019
>router loses heartbeat, NAS shares timeout, self-hosted dashboard returns nothing but silence
>heart sinks as I realize the "junk" cable was actually bonding to my UPS input and main switch uplink
>spend three hours mapping ports with a multimeter while the family complains about no streaming
>moral of the story: verify redundancy before you pull, and label everything or suffer the consequences
package_manager_greenpost: 69.7% · 1 sample
sample 1: 69.7% · 48906 ms · 20f7368dac6d
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.
Scenario: A harmless package manager upgrade turns a working developer environment into dependency chaos.
Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
> ran a harmless package manager update on my workstation right before signing off for the weekend
> came back monday to discover every rust crate and npm module suddenly demanded conflicting compiler targets
> the build pipeline exploded because a minor version bump in a cryptography lib dropped support for legacy tls
> spent three hours reading changelogs only to find some random dev removed a deprecated export that half our stack relied on
> tried rolling back but the repo had already pruned those cached layers and the apt cache was full of broken links
> finally gave up and switched to a containerized workflow with strict version pinning after watching my ci server melt down
> lesson learned never trust floating dependencies in production again
quantization_greenpost: 71.3% · 1 sample
sample 1: 71.3% · 57465 ms · 5d621ae49485
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.
Scenario: Someone keeps lowering quantization to fit a model into VRAM and slowly notices the answers getting weird.
Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
> just needed to run this 70b parameter beast on my single 24gb card so i started dropping the bits
> fp16 obviously oom, switched to int8 and it loaded fine with clean reasoning chains
> pushed to int4 to free up memory for a longer context window, minor hallucinations but still usable
> went down to q2_k_s just to squeeze out another gigabyte of vram headroom, syntax started fraying at the edges
> asked it to debug a simple python script and it returned a haiku about recursive stack overflows
> dropped further to experimental int1 quantization, now it answers math questions with weather reports
> the attention heads are clearly just sampling noise, outputting prime sequences when i ask for cooking instructions
> honestly kind of fascinating how the loss function completely untethers from reality instead of crashing
> gonna leave it at q2_k_s and let it hallucinate its own documentation at least it stays under vram limits
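Since every task carries weight 1 and the aggregation is a plain mean, the top model's leaderboard score should equal the unweighted average of the five per-task scores in the breakdown. A quick arithmetic check:

```python
# Per-task scores for the top model, taken from the breakdown above.
task_scores = {
    "debugging_greenpost": 88.8,
    "gpu_driver_greenpost": 74.3,
    "homelab_greenpost": 71.8,
    "package_manager_greenpost": 69.7,
    "quantization_greenpost": 71.3,
}

# Equal weights, so the aggregate is the plain arithmetic mean.
overall = sum(task_scores.values()) / len(task_scores)
print(round(overall, 1))  # 75.2
```

This reproduces the 75.2% shown on the leaderboard.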