ralph-loop
1 article
How We Made 336 AI-Generated Posts Actually Worth Reading
gu-log had 336 AI-translated posts. We thought they were 'fine' — until we built a multi-agent scoring system and discovered 74% needed rewriting. This is the story of how we designed the eval, ran it overnight, and what we learned.