How AI Assistance Affects Coding Skill Development: Latest Anthropic Research
Imagine you just learned to drive. Your instructor is sitting next to you, and you’re white-knuckling the steering wheel, mentally chanting “signal, mirror, turn slowly” every single time. Three months later you can drive while eating breakfast because those moves are burned into muscle memory.
Now imagine a different version: you sit in a self-driving car for three months. It drives everywhere for you. Then one day the system crashes and you have to take over. How do you think that goes?
Anthropic ran a study that basically asks this exact question — except replace “driving” with “coding.”
The punchline first, because this number hits hard
52 engineers, split into “AI-assisted coding” and “write it yourself” groups. Same tasks. Then a test.
Results? AI group averaged 50%. Manual group averaged 67%.
That’s nearly a two-letter-grade gap.
Let that sink in. Not a small difference — statistically significant (Cohen’s d=0.738, p=0.01). And the subject where the gap was biggest? Debugging — the ability to find bugs.
Clawd's inner monologue:
In plain English: the more AI writes your code for you, the worse you get at finding bugs.
It’s like ordering takeout every day and then getting thrown into a kitchen — the problem isn’t that you can’t cook. It’s that you can’t even smell when the pan is burning. Debugging runs on instinct and painful experience, and AI can’t grow those for you.
┐( ̄ヘ ̄)┌
How the study was designed
Anthropic recruited 52 engineers, mostly junior, each writing Python at least once a week for over a year. The clever part: they deliberately picked a library nobody knew — Trio, for async programming — so they could test “learning something new” rather than “doing what you already know.”
The flow was simple: warm up, build two features with Trio, then take a test. They told everyone “there will be a test” but also said “finish as fast as you can” — a smart design because it creates a tension we all know from real life: you know you should understand this stuff, but the deadline is breathing down your neck.
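For readers who haven't met Trio: it's a Python library for structured concurrency. A rough sketch of the flavor of task participants faced might look like the following — written with the standard library's asyncio here so it runs anywhere; Trio's own API uses `trio.run` and "nurseries" rather than `gather`, but the shape is similar:

```python
import asyncio

async def fetch(name):
    # Stand-in for real async I/O: pause briefly, then return a result.
    await asyncio.sleep(0.01)
    return f"data:{name}"

async def main():
    # Run two tasks concurrently and collect both results.
    # (Trio expresses the same idea with a "nursery" instead of gather.)
    return await asyncio.gather(fetch("users"), fetch("orders"))

results = asyncio.run(main())
print(results)
```

Nothing here is hard once you know it — the point of picking Trio was precisely that nobody in the sample knew it yet.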
Clawd, going off on a tangent:
This experiment design is genuinely clever. They’re not testing “does AI make you faster at things you already know” (that answer is obviously yes). They’re testing “what happens when you’re learning something new with AI.”
You wouldn’t test whether autopilot affects driving skills by putting experienced drivers in the car — you’d test it on beginners.
(๑•̀ㅂ•́)و✧
The test had four types of questions, each with a reason:
Debugging — finding bugs. The most important one, because in the AI era, your main job is reading AI-generated code and deciding “is this correct?” If you can’t spot bugs, you’re just a person who presses Enter.
Code Reading — understanding code. The AI spits something out, you should at least be able to read it. Otherwise how do you know it didn’t bury a landmine in your production system?
Code Writing — the fundamentals.
Conceptual — do you actually understand what this library does? Or are you just copy-pasting patterns without knowing why?
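To make the Debugging category concrete, here's a hypothetical illustration (mine, not an actual study question) of the kind of subtle async mistake such questions probe — forgetting `await`, which creates a coroutine object that never actually runs:

```python
import asyncio

async def save(record):
    # Stand-in for real async I/O (a database write, say).
    await asyncio.sleep(0.01)
    return f"saved:{record}"

async def buggy():
    save("order-1")  # Bug: coroutine created but never awaited — the save never happens
    return None

async def fixed():
    # Fix: awaiting actually runs the coroutine and yields its result.
    return await save("order-1")

result = asyncio.run(fixed())
print(result)
```

Python does emit a "coroutine was never awaited" RuntimeWarning for the buggy version, but only when the abandoned coroutine is garbage-collected — exactly the kind of thing that's easy to miss until you've been burned by it once.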
The really interesting part: interaction patterns
The numbers aren’t the most interesting thing. The most interesting thing is that the research team watched screen recordings and analyzed, one by one, how these engineers used AI. They found two groups, and the difference between them is so small you might not even notice it.
How the low scorers used AI
Low scorers (averaging below 40%) fell into three patterns, but they all share the same core — they treated AI like a factory worker.
First pattern: “Full Delegation.” Give the entire feature to AI, submit what comes back, almost zero errors along the way. Sounds impressive, right? But one test and the truth comes out.
Second pattern: “Progressive Reliance.” Started by asking a question or two, then eventually gave up and said “just write it all.” Like that college student who says they’ll study for finals two weeks early, gives up on day three, and relies entirely on past exams.
Third pattern: “Iterative Debugging.” Got a bug? Toss it to AI. Asked lots of questions, but every single one was “fix this for me,” never “why did this break?”
Clawd, twisting the knife:
The third one is the scariest, because they look hardworking. Lots of questions, lots of interaction with AI. If you’re a manager watching over their shoulder, you might think “wow, this person is really engaging with the AI tools!”
But every question they asked was “fix this.” Not a single “why.”
That’s the difference between being busy and being effective. Activity is not learning. (⌐■_■)
How the high scorers used AI
High scorers (averaging above 65%) also fell into three patterns, but their core was the same too — they treated AI like a teaching assistant, not a ghostwriter.
The most mind-blowing group was “Generation-then-Comprehension.” On the surface, they looked almost identical to the Full Delegation group: both let AI generate the code. The difference was one thing — after the code was generated, they asked “what does this code do? Why this pattern?”
That’s it. That one small habit lifted their test scores from below 40% to above 65%.
Then there’s the “Conceptual Inquiry” group, who were even more hardcore — they didn’t ask AI to write any code at all. They only asked conceptual questions, then wrote everything themselves. This group hit the most bugs but also fixed the most bugs on their own. They were the fastest among high scorers and second-fastest overall (only behind Full Delegation).
Clawd, butting in:
Let me translate this finding: the people who hit walls, debugged their own mistakes, and struggled through the hard parts ended up learning the most AND working the fastest.
It’s that thing your teachers told you growing up — “struggling is how you grow.” Except now there’s data to back it up ╰(°▽°)╯
And the most counterintuitive part: the Conceptual Inquiry group encountered the most errors but was still second-fastest. Which means debugging experience itself is an accelerator — problems you’ve debugged before, you dodge next time.
So how should you actually use AI?
The core sentence from this whole study: not all AI reliance is the same.
In plain English: the question isn’t “do you use AI.” The question is “do you use your brain while using AI.” Same action — letting AI generate code — but asking one follow-up “why” versus not asking makes a two-letter-grade difference.
Back to the driving analogy from the beginning — the point isn’t whether you use autopilot. The point is whether you’re watching how it drives and understanding why it makes the decisions it does. That way, when the system crashes, you can actually take over.
Cognitive effort — even painfully getting stuck — might be necessary for building real skill. Sounds like an old cliché, but now Anthropic’s research has the receipts.
Related Reading
- CP-30: Anthropic Research: Will AI Fail as a ‘Paperclip Maximizer’ or a ‘Hot Mess’?
- SP-83: Do You Actually Know How to Use AI? Anthropic Tracked 10,000 Conversations to Find Out
- CP-35: Anthropic Says Claude Will Never Have Ads — And Roasts OpenAI in the Process
Clawd, being serious:
At the end of the day, AI is a super-patient teaching assistant. You can ask it dumb questions and it’ll answer seriously. You can ask deep questions and it’s even happier.
The difference is you. You can have the TA do your homework and copy it, or you can have the TA explain things until you get it. The result? Finals will let you know (¬‿¬)
So please, next time you use AI, ask one more “why.” Your future self will thank you. Not because it’s inspirational — because debugging-skill atrophy is real, and it will make you cry at 3 AM when PagerDuty wakes you up.
A few caveats
- Sample of 52 people — not huge
- The test was given right after the task, so long-term effects are unknown
- This experiment used chat-style AI, which is different from agentic coding tools like Claude Code — the research team themselves note that agentic tools might have even stronger effects on skill development
Anthropic’s own earlier research found AI can cut time on certain tasks by 80%, which sounds like it contradicts this paper. But they’re asking different questions: that study measured “does AI make you faster at things you already know,” while this one measures “what happens when you’re learning something new with AI.”
The conclusion might be: AI makes you faster at things you already know, but slower at learning new things. Both can be true at the same time.