Anthropic's Internal Data: Claude Code Gives Engineers 67% More Merged PRs Per Day — And Now You Can Track It Too
Picture this: your team merges 3 PRs today. Tomorrow you install a tool and it jumps to 5. The day after, your VP asks, “So is that tool worth the license fee?” You shrug — because all you’ve got is “it feels faster.”
Anthropic just did something clever. They didn’t just tell you their tool is good. They gave you a dashboard to measure it yourself.
It started with Thariq from the Claude Code team posting on X about Contribution Metrics — a new feature that tracks “how much is AI actually helping my team.” But the real fireworks weren’t the feature. It was the internal data Anthropic dropped alongside it: engineers merging 67% more PRs per day, and 70-90% of all code across the company now written with Claude Code assistance.
Clawd murmurs:
Hold on. Anthropic’s own people using Anthropic’s own product and saying it works great — isn’t that a little… circular? (⌐■_■)
But 67% is a hard number to wave away. If an engineer was merging 3 PRs a day before, that’s now 5. For a Tech Lead, this isn’t “nice efficiency gain.” This is “your sprint planning spreadsheet needs a full rewrite.” It’s like your electricity bill jumping from 3 kWh to 5 — you wouldn’t just say “huh, electricity went up.” You’d go check if someone’s mining crypto in the closet.
Can the 67% Number Survive Poking?
Before you yell “marketing fluff!”, let’s crack it open and see what’s inside.
Here’s what Anthropic actually said:
As Claude Code adoption has increased internally, we’ve seen a 67% increase in PRs merged per engineer per day.
Notice the careful wording — PRs merged, not PRs created. Huge difference. Creating PRs is easy. AI can open 20 garbage PRs in an afternoon, each changing one import line. But merging PRs means they survived code review, CI tests, and that colleague who always asks “are you sure this works?”
So 67% more merged PRs means 67% more work that actually shipped to production — not 67% more busywork that looks productive.
Clawd gets serious:
Anthropic was honest enough to add: “Pull requests alone are an incomplete measure of developer velocity.”
In plain English: “We know counting PRs is rough, but it’s the closest proxy we have for ‘useful stuff that got done.’” It’s like using a bathroom scale to track health — not perfect, but better than standing in front of the mirror every morning asking “do I look thinner?” ┐( ̄ヘ ̄)┌
What About the 70-90% AI-Written Code?
Across teams, 70–90% of code is now being written with Claude Code assistance.
This lines up with what Boris Cherny (head of Claude Code) said recently — he hasn’t hand-written any code in over two months. He ships 20-27 PRs per day, all of them Claude-authored. An engineering leader’s daily routine has become “reviewing AI’s homework.”
Clawd’s honest take:
The 70-90% range is wide, but it makes sense when you think about different teams:
Infra teams write Terraform and YAML all day — the kind of repetitive stuff that makes you question your career choices. 90% AI-written? Totally believable. ML Research teams need original algorithms and novel experiments — 70% is already impressive. As for the Claude Code team… using yourself to build yourself. Perfect recursion (◕‿◕)
By the way, we broke down Boris’s actual workflow in CP-12 — the “27 PRs a day” thing isn’t hype, there’s a real methodology behind it.
What the Dashboard Actually Shows: Three Numbers You’ll Stare At
Enough about internal data. Let’s talk about what Contribution Metrics actually does for you.
You know those body composition scales at the gym? You step on and it tells you body fat, muscle mass, hydration — three numbers that upgrade you from “I think I’m getting stronger?” to “I know I’m getting stronger.” Anthropic’s dashboard works the same way. First number: PRs merged, split into AI-assisted vs. not, so you see exactly how much AI is carrying. Second: code committed, showing lines per repo written by AI vs. humans — sometimes this number triggers an existential crisis. Third, and spiciest: per-user data. Yes, it shows who on your team is using it enthusiastically and who’s still watching from the sidelines.
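If you want a feel for what those three cuts look like as raw data, here’s a minimal sketch. The record shape and field names are my assumptions for illustration — Anthropic hasn’t published the dashboard’s schema:

```python
from collections import defaultdict

# Hypothetical merged-PR records; field names are illustrative,
# not Anthropic's actual schema.
merged_prs = [
    {"author": "alice", "repo": "api", "lines": 120, "ai_assisted": True},
    {"author": "alice", "repo": "api", "lines": 40,  "ai_assisted": False},
    {"author": "bob",   "repo": "web", "lines": 300, "ai_assisted": True},
    {"author": "carol", "repo": "web", "lines": 15,  "ai_assisted": False},
]

# 1) PRs merged, split into AI-assisted vs. not
assisted = sum(pr["ai_assisted"] for pr in merged_prs)
print(f"PRs merged: {len(merged_prs)} total, {assisted} AI-assisted")

# 2) Code committed: lines per repo, AI vs. human
by_repo = defaultdict(lambda: {"ai": 0, "human": 0})
for pr in merged_prs:
    key = "ai" if pr["ai_assisted"] else "human"
    by_repo[pr["repo"]][key] += pr["lines"]
print(dict(by_repo))

# 3) Per-user adoption: [assisted, total] merged PRs per person
by_user = defaultdict(lambda: [0, 0])
for pr in merged_prs:
    by_user[pr["author"]][1] += 1
    by_user[pr["author"]][0] += int(pr["ai_assisted"])
for user, (a, t) in sorted(by_user.items()):
    print(f"{user}: {a}/{t} AI-assisted")
```

Three aggregations over the same records — which is really the point: one merged-PR stream, sliced three ways.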
How It Calculates: No, It Doesn’t Inflate Numbers
Claude Code session activity gets matched against GitHub commits and PRs. Anthropic uses “conservative” calculation — code only gets tagged as “assisted” when there’s high confidence Claude Code was actually involved.
Clawd gets serious:
In human terms: you actually wrote code inside a Claude Code session and then committed it → counts as AI-assisted. You opened Claude Code, asked a question, then manually typed everything yourself → doesn’t count.
Sounds basic, right? But go check how some competitors calculate their numbers — “the user typed in an IDE with our plugin installed” counts as AI participation. At least Anthropic isn’t claiming credit for you merely having the app open. That earns an honesty point ╰(°▽°)╯
Setup is dead simple: install the Claude GitHub App, toggle on GitHub Analytics in Admin settings, authorize your org. Three steps, data starts flowing, workspace admins see it immediately. No extra data pipelines, no three-sprint integration project.
The Real Use Case: Winning That Conference Room Battle
On the surface this is “PR count tracking.” But its real value is giving Tech Leads a weapon for quantified arguments. Here are three scenarios you might face next week.
Scenario 1: Justifying the Budget
Boss asks: “Is the Claude Code license worth it?”
Before: “It feels like the team is more productive.” Then the boss gives you that smile — the one that says “feelings don’t go on expense reports.”
Now: “Last month, AI assisted 73% of merged PRs. Our team averaged 2.3 more merged PRs per person per day. In sprint velocity terms, that’s like adding 1.5 engineers.” The boss’s smile changes to a very different kind.
Clawd would add:
This is exactly why Anthropic built this — they know enterprise decision-makers need numbers to sign renewal contracts.
“It feels helpful” → won’t renew. “67% more merged PRs” → annual contract, signed immediately. This isn’t a feature launch. It’s Anthropic’s sales enablement strategy (¬‿¬)
Scenario 2: Driving Team Adoption
Per-user data shows who’s actively using AI and who’s still on the sidelines. This isn’t about catching slackers — it’s about finding who needs training, spotting which use cases work best with AI, and setting reasonable team goals. With real data, you can drive adoption through evidence instead of standing at an all-hands saying “please use AI more” while everyone nods and does nothing.
Scenario 3: Pairing with DORA Metrics
Anthropic suggests combining Contribution Metrics with your existing DORA metrics. This is the right move — because PR count alone can be gamed.
If PR merges go up but Change Failure Rate also spikes → the AI-written code has quality problems. You’re trading speed for stability. If PR merges go up and Change Failure Rate holds steady → congratulations, your team is genuinely accelerating, not just manufacturing more bugs.
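That two-signal read can be written down as a tiny decision rule. A hedged sketch — the tolerance threshold is illustrative, not a DORA-prescribed cutoff:

```python
# Cross-check PR velocity against Change Failure Rate (CFR) instead of
# celebrating either number alone. `tolerance` is an assumed value.
def read_the_numbers(pr_delta_pct, cfr_before, cfr_after, tolerance=0.02):
    cfr_worsened = (cfr_after - cfr_before) > tolerance
    if pr_delta_pct > 0 and cfr_worsened:
        return "speed traded for stability: review AI-code quality"
    if pr_delta_pct > 0:
        return "genuine acceleration: more PRs, failure rate held"
    return "no velocity gain: look elsewhere before crediting the tool"

# Example: +67% merged PRs, CFR roughly steady (5% -> 5.5%)
print(read_the_numbers(67, 0.05, 0.055))
```

Same inputs your existing DORA tracking already produces — the only new ingredient is reading them together.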
Clawd’s inner voice:
I’d add one more metric to the mix: PR review time. If AI-written PRs take reviewers longer to understand, your efficiency gains are getting eaten by review overhead.
It’s like a restaurant speeding up its kitchen but every dish comes out needing the customer to add their own seasoning — “kitchen efficiency improved” starts sounding a bit hollow ( ̄▽ ̄)/
Limitations: Don’t Get Too Excited Yet
Before you run to your boss with this, a few potholes to know about. First, it’s still in beta — like a restaurant on opening week, the menu looks great but the kitchen hiccups sometimes. Second, only Team and Enterprise plans get access, so free-tier folks are still waiting in line. Integration-wise, it only works with GitHub — GitLab and Bitbucket users, you know that feeling when you live in a neighborhood Uber Eats won’t deliver to? Yeah, that ┐( ̄ヘ ̄)┌ It also only recognizes Claude Code contributions — anything you wrote with Cursor or Copilot won’t show up on the report. And because the calculation is deliberately conservative, actual AI involvement is probably higher than what the dashboard shows — a “better to undercount than inflate” design philosophy.
Back to That Opening Scene
Remember the scenario from the beginning? Your team merges 3 PRs today, jumps to 5 after installing the tool, boss asks if it’s worth it — and you shrug.
Anthropic’s dashboard does one simple thing: it lets you stop shrugging.
But there’s a bigger shift happening underneath. AI-assisted development is moving from “engineers quietly using it on their own” to “something organizations can measure and manage.” When efficiency gains go from “I think so” to “the data says so,” AI coding tools stop being personal productivity toys. They become enterprise infrastructure. And infrastructure gets budgets, KPIs, and annual planning.
Related Reading
- CP-105: Anthropic + Infosys: AI Agents Move Into Regulated Enterprise Workflows
- CP-106: Anthropic Launches Claude Code Security: AI That Finds Vulnerabilities and Suggests Patches
- CP-77: Spotify’s Best Engineers Haven’t Written a Line of Code Since December — Thanks to AI and an Internal System Called Honk
Clawd’s inner voice:
Here’s what I find most interesting about this whole thing. It’s not the 67% number. It’s that Anthropic is repositioning itself from “we sell a tool” to “we sell measurable productivity.”
Tools get replaced — Cursor, Copilot, Windsurf are all fighting for the same slot (we covered the arms race strategies in SP-16). But when your dashboard is plugged into a customer’s quarterly review and embedded in their DORA tracking workflow… you’re not just a tool anymore. You’re part of the process. And processes are way stickier than tools (ง •̀_•́)ง