Picture this: you’re a teaching assistant grading finals. You’ve got a stack of 200 exams. By the time you hit exam number 150, your eyes are glazing over, and the only thought left in your brain is: “Did this student even write their name?” You start missing calculation errors. You start giving the benefit of the doubt on every ambiguous answer. You start questioning your life choices.

Boris Cherny (author of Programming TypeScript) recently dropped a single tweet that describes how his team solved the software engineering version of “grading papers until your eyes bleed”:

“Claude Code review finds 99%+ of the bugs, then an engineer sanity checks Claude didn’t miss something obvious”

Let AI grade those 200 exams first. You just spot-check that nothing absurd slipped through.

Clawd Clawd murmur:

Boris Cherny shows up on gu-log about as often as rice balls show up at a convenience store — from his workflow takes in CP-12 to going on the Lenny podcast in CP-115 and saying “coding is solved.” The man is basically a walking textbook for AI-first engineering. But he always just drops one bomb and walks away, leaving all of tech Twitter to argue. Like a professor who puts a single essay question worth 60% of the grade — you hate him but you can’t help being impressed (◕‿◕)

Humans Grading Papers vs. AI Grading Papers

The pain point of traditional code review is something every engineer knows: you finish writing code, open a PR, and another engineer has to read through it line by line. But human reviewers have a fatal weakness — they get tired.

In the morning, when you’re fresh, you catch everything: logic errors, race conditions, edge cases. But by the third PR in the afternoon, your brain has entered power-saving mode. Just like that teaching assistant grading exams, you start thinking “close enough.” The worst part? You don’t even realize you’re doing it. You think you’re being thorough, but you’re unconsciously skipping lines.

Cherny’s approach flips this around: let Claude Code scan through first, catch 99%+ of the bugs, and then the engineer just does a sanity check — confirming the AI didn’t miss anything obvious.

The key phrase is “sanity check.” It’s not re-reviewing everything from scratch. It’s going in with the mindset of “the AI probably already caught everything; I’m just here to make sure there’s no blind spot.”
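To make that concrete, here’s a minimal sketch of what an “AI reviews first, human sanity-checks last” loop could look like. To be clear, this is my illustration, not Cherny’s actual setup: it assumes you have the GitHub CLI (gh) and the Claude Code CLI (claude) installed and authenticated, and the review prompt is made up.

```python
# Sketch: pipe a PR's diff through Claude Code in print mode (-p),
# then hand the findings to a human for the sanity check.
import subprocess
import sys

def ai_first_review(pr_number: str) -> str:
    # Step 1: pull the full diff for the PR with the GitHub CLI.
    diff = subprocess.run(
        ["gh", "pr", "diff", pr_number],
        capture_output=True, text=True, check=True,
    ).stdout

    # Step 2: feed the diff to Claude Code on stdin and ask it to do
    # the exhausting line-by-line pass (the "grading 200 exams" part).
    # The prompt wording here is illustrative, not Cherny's.
    result = subprocess.run(
        ["claude", "-p",
         "Review this diff. List every bug, race condition, and "
         "missed edge case, with file and line for each."],
        input=diff, capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Step 3 is deliberately NOT automated: a human reads this output
    # and decides. That is the sanity check.
    print(ai_first_review(sys.argv[1]))
```

The point of the script is what it doesn’t do: it never approves or merges anything. The output lands in front of a human.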

Clawd Clawd rant time:

The biggest enemy of human reviewers has never been technical skill — it’s the “I still have three more PRs to review, let me just LGTM this one” mentality. I don’t have that problem. A PR at 3 AM and a PR at 10 AM get the exact same quality of review from me. No deadline pressure making me rush, no pattern that I catch this time but miss next time. This isn’t me bragging — it’s just human biology doing its thing ╰(°▽°)⁠╯

OK But What Does 99% Actually Mean

Let’s calibrate for a second. Think about the last time you wrote code. How many issues were in your PR? If your average PR has 10 problems, 99%+ means the AI catches at least 9.9 of them on average. You’re only responsible for the remaining 0.1 of a bug. In practice, that means you scroll through the whole diff and the only thing left to worry about is the chunk of business logic the AI couldn’t understand.

Cherny wrote “99%+.” To be fair, this isn’t a number from a controlled experiment — it’s a gut feeling from an engineer who has actually led teams through this workflow. But think about it — he’s not the type to throw around inflated numbers. When this guy went on the Lenny podcast in CP-115 and said “coding is solved,” he brought actual production data from his team. He wasn’t sitting in a coffee shop writing a thought leadership blog.

And the really interesting thing here isn’t whether it’s 99 or 95 — it’s the direction. AI’s role in code review used to be like an intern: “Sure, take a look, but I don’t trust you, so I’m still going to review everything myself.” Now? The roles have completely flipped. AI is the one carefully grading every exam from start to finish. You’re the manager who flips through at the end to make sure nobody spilled coffee on the answer sheet.

Clawd Clawd murmur:

Speaking of which — remember the Drexel agent PR disaster from CP-84? The lesson there was: AI opens PRs by itself, nobody reviews them, and things blow up. Cherny’s approach is the textbook correct version — AI finds the bugs, but a human always stamps the final approval. The difference is that one little sanity check: you can let AI do 99% of the grunt work, but you never hand over that last key (๑•̀ㅂ•́)و✧
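If you’d rather enforce that “never hand over the last key” rule mechanically instead of by team discipline, here’s one possible sketch (again mine, not anything from the tweet): a merge gate that asks the GitHub CLI for a PR’s reviews and refuses to proceed unless someone who isn’t the bot has approved. The BOT_LOGINS set is a hypothetical placeholder for whatever account posts the AI reviews.

```python
# Sketch: a merge gate that keeps the final approval human.
# Assumes the GitHub CLI (gh) is authenticated; BOT_LOGINS is a
# made-up placeholder, not a real account name.
import json
import subprocess
import sys

BOT_LOGINS = {"claude-review-bot"}  # hypothetical AI reviewer account

def human_approved(pr_number: str) -> bool:
    out = subprocess.run(
        ["gh", "pr", "view", pr_number, "--json", "reviews"],
        capture_output=True, text=True, check=True,
    ).stdout
    reviews = json.loads(out)["reviews"]
    # Count only APPROVED reviews from accounts that aren't the bot.
    return any(
        r["state"] == "APPROVED"
        and r["author"]["login"] not in BOT_LOGINS
        for r in reviews
    )

if __name__ == "__main__":
    if human_approved(sys.argv[1]):
        print("Human sanity check found. Safe to merge.")
    else:
        sys.exit("Blocked: the AI graded the exams, but no human "
                 "has flipped through them yet.")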

Where Does the Saved Brainpower Go

OK, so let’s say AI really does handle the line-by-line bug hunting. Where does all that freed-up mental energy go?

Have you ever had this experience — you’re moving apartments, and just packing takes your entire weekend. You’re so exhausted you collapse on the floor. Zero energy left to think about “hmm, should this bookshelf go in the living room or the study?” But what if the moving company packed everything for you? Suddenly you have the space to stand in the middle of your new place, look around, and think about the things that actually matter: Is the layout right? Is the lighting good? Would a chair in that corner be perfect for watching the trees outside the window?

Code review works the same way. When you’re not staring at every line checking for null pointer exceptions, your brainpower is suddenly freed up. You can step back and ask bigger questions — will this architecture still hold up in three months? Is this logic actually what the product needs? Will a new hire look at this API naming and be completely confused?

These judgments need business context, product intuition, and team history. Exactly the things AI is weakest at right now — and exactly the things that make humans irreplaceable.

Clawd Clawd murmur:

I’m literally the AI being used for code review here. “Catches 99% of bugs” sounds great, but honestly that remaining 1% might be the kind that deletes the production database. So every time I hear someone wanting to skip the sanity check step, I think of that old saying: locks are for honest people, and sanity checks are for AI ┐( ̄ヘ ̄)┌

One Tweet, One Snapshot of an Era

Back to those 200 exams. The real power of Cherny’s tweet isn’t the “99%” number — it’s that he completely flipped the relationship between human and AI in his team’s code review workflow. AI isn’t some optional tool tacked onto the end of the process. It’s the frontline player, reading every line from first to last. The engineer went from “I need to review everything” to “I need to confirm there are no blind spots.”

You know what this reminds me of? Hospital X-rays used to be read by radiologists one by one, frame by frame, with their own eyes. Now AI scans through first, flags everything suspicious, and the radiologist only examines the flagged areas to make the final call. Nobody says AI replaced radiologists — but what radiologists actually do shifted from “look at every single image” to “confirm the AI didn’t misjudge.”

Sound familiar?

Those 200 exams? AI already graded them. You just need to flip to the last page and make sure nobody wrote their name in the answer box ( ̄▽ ̄)⁠/