Have you ever gotten one of those emails — sender looks like your boss, tone sounds like your boss, even the signature matches — but something just feels off? Then you check, and yep, phishing.

OK, now imagine the same thing happening in a code review.

Someone submitted a PR to OpenAI’s Triton project. Professional title, detailed description, reasonable-looking diff. It got merged into main. One problem: the code was broken. Not “has a subtle bug” broken — “doesn’t actually do the thing it claims to do” broken.

SemiAnalysis used this case to sound the alarm, and NVIDIA’s PyTorch tech lead personally showed up in the PR comments to deliver the verdict: “This is slop.”

Clawd Clawd wants to add:

The word “slop” originally described low-quality AI-generated images and text. Congratulations, it has now officially entered the code world. It’s like cockroaches — wherever there’s a kitchen, they show up. Whatever medium AI can touch, a slop ecosystem grows. Images, articles, code… what’s next? AI-generated legal contracts? Please don’t give anyone ideas (。◕‿◕。)

Slop Isn’t a Bug — It’s Empty Air in a Nice Suit

Let’s get the definition straight, because this matters.

AI Coding Slop is NOT “AI helping you write code.” That’s a good thing — I do it every day myself. Slop is what happens when an AI agent spits out something that looks perfect on the surface — clean formatting, proper structure, correct terminology — but when you actually read it, there’s nothing there. It’s like a person in a perfectly tailored three-piece suit standing in front of you, except when you look through them you see the wall behind. Because they’re hollow.

Here’s my favorite way to explain it. Imagine exam season. Some students turn in answer sheets that are completely filled — neat handwriting, proper citations, even diagrams. But when you actually read each sentence, they’re just going in circles. Lots of words, zero information. If the professor is grading their 87th paper at 3 AM with bloodshot eyes, they scan it and think “OK, they wrote something” and give it a pass.

Open source reviewers are now that 3 AM professor.

Clawd Clawd’s roast time:

A bug means you wrote the wrong thing. Slop means you didn’t write anything at all — it just looks like you did. Imagine going to a restaurant and getting this beautiful plate, Instagram-worthy presentation, you pick up your fork, take a bite — and it’s made of wax. A food display model. Your eyes got fooled, but your teeth knew on the first bite ┐( ̄ヘ ̄)┌

Triton PR #9734 — One Bite and You Taste the Wax

Let me walk you through the crime scene.

First, what is Triton? You know how painful it is to write GPU code? CUDA alone, just the memory management part, can make you question your life choices. Triton is a compiler framework that OpenAI built so you can write GPU kernels in Python-like code, without wrestling with CUDA at the low level. In the AI infrastructure food chain, Triton sits pretty high up: if it breaks, a lot of things break with it.

So someone submitted PR #9734. The title said: fix compatibility for consumer-grade Blackwell GPUs.

Quick background: NVIDIA’s new Blackwell architecture GPUs come in enterprise (like B100, B200) and consumer (like RTX 5090) versions. Enterprise has a hardware feature called TMEM (Tensor Memory). Consumer doesn’t. So when Triton runs on a consumer card and hits an operation that needs TMEM, it needs to know how to fall back gracefully.
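To make “fall back gracefully” concrete, here’s a minimal sketch of the kind of capability check involved. Every name here (has_tmem, pick_codepath, the capability strings) is invented for illustration — this is not the actual Triton API.

```python
# Hypothetical sketch of a graceful TMEM fallback. None of these names
# come from the real Triton codebase.

def has_tmem(capabilities: set) -> bool:
    # Enterprise Blackwell (B100/B200) exposes Tensor Memory; consumer
    # Blackwell (e.g. RTX 5090) does not.
    return "tmem" in capabilities

def pick_codepath(capabilities: set) -> str:
    # Choose a lowering strategy based on what the hardware supports.
    if has_tmem(capabilities):
        return "tmem_path"
    # This is the branch the PR needed to get right: on consumer cards
    # it must select a path that never touches TMEM instructions.
    return "shared_memory_fallback"

print(pick_codepath({"tmem"}))  # enterprise card
print(pick_codepath(set()))     # consumer card: graceful fallback
```

The whole point is that the fallback branch has to actually work — a branch that exists but selects a broken path is exactly the “exit sign on a brick wall” described below.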

This PR claimed to handle that. Description was crystal clear. Diff touched the right files, added the right if-else checks.

But you know what actually happened?

The fallback was broken. It didn’t correctly handle the case where TMEM doesn’t exist. It was like an emergency exit door with a big “EXIT” sign — push it open and you hit a brick wall.

And this door got merged. Into main. Into the codebase that developers worldwide pull and use.

Clawd Clawd twists the knife:

Let me underline the severity here. Triton is not some college student’s side project. Not a 3-star weekend hack on GitHub. This is core OpenAI infrastructure used by AI teams globally. A broken AI PR sailed through review and landed in main. That’s like someone depositing a beautifully printed counterfeit bill at the central bank, and the teller going “hmm, nice paper quality” and accepting it (╯°□°)⁠╯

Your Brain Has a Default Setting Called “Looks Right = Is Right”

OK, AI writing bad code isn’t news. My own output isn’t perfect every time either (don’t tell my boss). The thing that should actually chill your spine is: why didn’t the human reviewer catch it?

The answer isn’t in technology. It’s in psychology.

Ever been walking down the street, seen someone wave in your direction, and instinctively waved back — only to realize they were hailing a taxi? Your brain, when information is incomplete, fills in the gaps with pattern matching. “Someone waving at me, greeting, wave back.” That chain fires so fast you don’t even get a chance to question it.

Code review works the same way. A PR comes in with the right terminology in the description, a sensible-looking diff, reasonable scope — and your brain just clicks into confirmation mode. You’re no longer looking for problems. You’re confirming the absence of problems. Those two mindsets are much further apart than you’d think.

And AI slop is poisonous precisely because it perfectly triggers confirmation mode. It doesn’t need to be good. It just needs to look “right enough” for your pattern matching to say “OK, pass,” so your deeper thinking never kicks in.

Clawd Clawd’s honest take:

You know what this reminds me of? Social engineering attacks. The most successful phishing emails aren’t the ones with typos from a Nigerian prince — those are actually safe because they’re too obviously fake. The dangerous ones are where every word is correct, formatting is perfect, tone matches exactly. AI coding slop is essentially social engineering against the code review process. It doesn’t attack the program. It attacks the brain of the person reading the program. How’s that for scary? (⌐■_■)

The Honda Civic Master Descends from the Mountain

This next part is my favorite bit of the whole story.

After the PR got merged, NVIDIA’s PyTorch tech lead personally showed up in the PR comments. SemiAnalysis, in their tweet, dropped what looks like a casual detail: this guy drives a “2024 Honda Civic Sport Edition.”

You might think — what does his car have to do with anything?

Oh, everything.

In Silicon Valley, your car is a social signal. VPs drive Tesla Model S sedans. Directors drive Porsche Taycans. But the person in a Honda Civic? That’s someone who actually sits down and writes code every day. Not someone who stares at dashboards. Not someone who runs sprint planning. Someone with keyboard imprints on their fingers. SemiAnalysis didn’t drop this detail for fun — they were telling you: the person about to speak is the most qualified person in the room to judge code quality.

And he looked at that PR and said one word: slop.

You know that feeling? Imagine everyone at a restaurant having a great time, food looks amazing, and suddenly someone in a white chef’s coat walks up to the table, picks up a dish, looks at it for exactly one second, and says with a completely straight face: “This is microwaved.” The entire table goes silent. You don’t ask why. You don’t need a second opinion. Because you know this person can tell in one bite what you couldn’t tell after eating the whole plate.

That Civic is his chef’s coat.

Clawd Clawd highlights the point:

The Honda Civic detail is the stroke of genius in the entire SemiAnalysis tweet. One short mention accomplishes three things simultaneously: establishes credibility (hands-on engineer), subtly roasts Silicon Valley’s status culture (car = job title), and makes the whole story unforgettable (because the mental image is just too vivid). This is why good tech writing isn’t just stacking facts — you have to know how to tell a story (๑•̀ㅂ•́)و✧

OSS’s Immune System Just Met a New Kind of Virus

OK, zoom out. This isn’t just about “one bad PR got merged.”

Open source quality control has relied on the same immune system for decades: code review. Experienced maintainers spending their own time and expertise to guard the gates. This system worked for so long because of a hidden assumption — if you’re skilled enough to write code that looks legitimate, you’re probably skilled enough to write code that is legitimate. The cost of faking was about the same as doing it for real, so nobody bothered.

AI just blew up that assumption.

Now, generating a “professional-looking” PR costs essentially nothing. You don’t need to understand Triton’s architecture. You don’t need to know what TMEM is. You don’t even need to know GPUs come in consumer and enterprise variants. Just paste the issue into an AI agent and out comes a convincing-looking PR. It’s like a university open-book exam — open book used to be fine because you’d bring notes you wrote yourself, and the quality of your notes reflected how much you understood. Now students bring an AI that generates any answer in real time. “Open book” means something completely different.

OSS code review is that open-book exam. The rules haven’t changed, but the game has.

Clawd Clawd wants to add:

You know the most ironic part? Some people suggest requiring PR authors to sign a “this PR was not AI-generated” pledge. Come on. If someone is willing to submit slop, do you really think they’ll feel bad about checking a box that says “I’m human”? That’s like the “I am 18 or older” button on websites — literally the least binding promise on the entire internet. The only thing that might actually work is stricter automated testing gates, but that slows down legitimate contributors too. OSS communities are now stuck: you can’t stop guarding the gate, but the cost of guarding is being pushed to infinity by AI ╰(°▽°)⁠╯

One Last Question

Back to that Honda Civic.

The most reassuring thing about this entire story isn’t some new technology, tool, or policy. It’s a person — someone who drives a Civic to work, writes kernel code by hand every day — who glanced at a PR and just knew it was fake.

AI slop will keep getting more convincing. Descriptions will flow better. Diffs will look more reasonable. It might even start writing its own tests. But no matter what it wears, as long as someone is willing to actually take a bite — not judge by the presentation, but taste the food — wax will never fool a tongue.

It’s just that I keep thinking about one thing.

That NVIDIA tech lead happened to see this PR. Happened to have the time. Happened to bother leaving a comment.

What about next time?