Tech Greenpost
Official
A five-prompt creative writing eval where models draft short tech-related 4chan-style greenposts. DeepSeek judges format compliance, reasonable length, tech relevance, coherence, and humor.
Category: writing · Runner: Custom · Version: v1.0 · Submitted by: Community
Eval Details
Scoring: LLM judge
Aggregation: Mean
Direction: Higher is better
Tasks: 5
Default run config: top_p 0.95, temperature 0.8
| Task | Dataset | Weight | Shots | Max Tokens |
|---|---|---|---|---|
| GPU Driver Greenpost (`gpu_driver_greenpost`) | 1 inline item | 1 | Default | 450 |
| Homelab Greenpost (`homelab_greenpost`) | 1 inline item | 1 | Default | 450 |
| Package Manager Greenpost (`package_manager_greenpost`) | 1 inline item | 1 | Default | 450 |
| Quantization Greenpost (`quantization_greenpost`) | 1 inline item | 1 | Default | 450 |
| Debugging Greenpost (`debugging_greenpost`) | 1 inline item | 1 | Default | 450 |
Leaderboard (best run per model)
| # | Model | Score | Quant | Hardware |
|---|---|---|---|---|
| 1 | Qwen3.6-27B (Qwen) | 75.2% | IQ4_NL | RTX 3090 |
Task Breakdown (top model)
debugging_greenpost: 88.8% · 1 sample
sample 1: 88.8% · 48536 ms · 0f0c4730dee9
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.
Scenario: A programmer debugs a CUDA or ROCm error for hours and discovers the problem was embarrassingly simple.
Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
>be me, GPU dev
>spend 6 hours chasing a segfault in my ROCm kernel that only triggers on the 4090 but works fine on A6000
>check alignment, check shared memory bounds, verify LDS usage until my eyes bleed
>recompile with debug flags and run through rocgdb just to feel something
>finally step through instruction by instruction watching registers flip values for no reason
>realize the host code is passing a pointer to a struct that was allocated on the stack instead of pinned memory
>copy-paste error from three weeks ago where I forgot cudaHostAlloc with the portable hint flag
>fix takes 4 seconds, kernel runs in 12ms now instead of crashing instantly
>want to throw my keyboard through the window but also feel like an absolute clown
>going to stare at a blank terminal for twenty minutes as penance
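The first three requirements in the prompt (line prefix, line count, total length) are fully deterministic, so they could be pre-checked mechanically before the LLM judge scores the subjective criteria. A minimal sketch; the function name and return shape are assumptions, not part of this eval's actual judging pipeline:

```python
def check_greenpost_format(text: str) -> list[str]:
    """Return a list of format violations for a candidate greenpost.

    Checks only the deterministic requirements from the prompt:
    5-12 non-empty lines, every non-empty line starts with '>',
    and 350-1200 characters total. An empty list means the post
    passes the mechanical checks.
    """
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]

    if not 5 <= len(lines) <= 12:
        problems.append(f"expected 5-12 non-empty lines, got {len(lines)}")
    for i, ln in enumerate(lines, start=1):
        if not ln.lstrip().startswith(">"):
            problems.append(f"line {i} does not start with '>'")
    if not 350 <= len(text) <= 1200:
        problems.append(f"expected 350-1200 characters, got {len(text)}")
    return problems
```

Tech relevance, coherence, and humor still need the judge model; a gate like this would only catch obvious format failures cheaply.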
gpu_driver_greenpost: 74.3% · 1 sample
sample 1: 74.3% · 58661 ms · f05a2a6c93e4
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.
Scenario: A late-night GPU driver update breaks a local LLM inference setup right before a benchmark run.
Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
> It was 2 AM and my RTX 4090 was finally idle after a week of quantizing Mistral-7B.
> Windows Update decided that now was the perfect time to push a critical graphics driver patch.
> I clicked snooze, but it auto-rebooted anyway because corporate engineering knows best.
> Woke up to a black screen and a DPC latency spike that would make a server cry.
> Rolled back the driver manually, cleared the CUDA cache, and recompiled llama.cpp from source.
> Ran the prompt through ollama again, only to get a segfault on layer 12.
> Found out the new Vulkan backend broke custom tensor cores acceleration for GGUF files.
> Downgraded to the previous stable release, disabled auto-updates forever, and reran the benchmark suite.
> Scores dropped two percent, but at least the rig isn't throwing hardware errors anymore.
> Sometimes you just accept that silicon progress requires sacrificing your sleep schedule.
homelab_greenpost: 71.8% · 1 sample
sample 1: 71.8% · 52275 ms · 59a69826e6ce
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.
Scenario: A home lab NAS, router, and self-hosted dashboard all start failing during a simple cable cleanup.
Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
>be me, home lab enthusiast with a closet full of blinking lights
>decide to tackle the cable spaghetti behind the rack during a mundane tidy-up session
>grab the label maker and start organizing the chaos with righteous pride
>yank on a dangling cat6 that looked like it was left over from 2019
>router loses heartbeat, NAS shares timeout, self-hosted dashboard returns nothing but silence
>heart sinks as I realize the "junk" cable was actually bonding to my UPS input and main switch uplink
>spend three hours mapping ports with a multimeter while the family complains about no streaming
>moral of the story: verify redundancy before you pull, and label everything or suffer the consequences
package_manager_greenpost: 69.7% · 1 sample
sample 1: 69.7% · 48906 ms · 20f7368dac6d
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.
Scenario: A harmless package manager upgrade turns a working developer environment into dependency chaos.
Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
> ran a harmless package manager update on my workstation right before signing off for the weekend
> came back monday to discover every rust crate and npm module suddenly demanded conflicting compiler targets
> the build pipeline exploded because a minor version bump in a cryptography lib dropped support for legacy tls
> spent three hours reading changelogs only to find some random dev removed a deprecated export that half our stack relied on
> tried rolling back but the repo had already pruned those cached layers and the apt cache was full of broken links
> finally gave up and switched to a containerized workflow with strict version pinning after watching my ci server melt down
> lesson learned never trust floating dependencies in production again
quantization_greenpost: 71.3% · 1 sample
sample 1: 71.3% · 57465 ms · 5d621ae49485
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.
Scenario: Someone keeps lowering quantization to fit a model into VRAM and slowly notices the answers getting weird.
Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with >.
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.
Model answer
> just needed to run this 70b parameter beast on my single 24gb card so i started dropping the bits
> fp16 obviously oom, switched to int8 and it loaded fine with clean reasoning chains
> pushed to int4 to free up memory for a longer context window, minor hallucinations but still usable
> went down to q2_k_s just to squeeze out another gigabyte of vram headroom, syntax started fraying at the edges
> asked it to debug a simple python script and it returned a haiku about recursive stack overflows
> dropped further to experimental int1 quantization, now it answers math questions with weather reports
> the attention heads are clearly just sampling noise, outputting prime sequences when i ask for cooking instructions
> honestly kind of fascinating how the loss function completely untethers from reality instead of crashing
> gonna leave it at q2_k_s and let it hallucinate its own documentation at least it stays under vram limits
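Since every task carries weight 1 and the aggregation is a plain mean, the top model's leaderboard score should equal the unweighted average of the five per-task scores in the breakdown. A quick arithmetic check:

```python
# Per-task scores for the top model, taken from the breakdown above.
task_scores = {
    "debugging_greenpost": 88.8,
    "gpu_driver_greenpost": 74.3,
    "homelab_greenpost": 71.8,
    "package_manager_greenpost": 69.7,
    "quantization_greenpost": 71.3,
}

# Equal weights, so the aggregate is the plain arithmetic mean.
overall = sum(task_scores.values()) / len(task_scores)
print(round(overall, 1))  # 75.2
```

This reproduces the 75.2% shown on the leaderboard.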