The Guy Who Opened a Repo at Midnight Changed a 100,000-Person Company

Have you ever seen this play out — your company announces “digital transformation,” spends three months in meetings, produces an 80-page PowerPoint, and delivers an internal platform nobody wants to open?

Uber’s story went the opposite direction.

In October 2024, one engineer opened a repo late at night. Inside: two Claude Skills — one to classify CI logs and suggest fixes, another to do code review. No project charter, no roadmap, no executive waving a flag out front. Just one person who thought “this thing is useful” and started using it.

Five months later, this company with 200+ microservices and thousands of engineers had grown 500+ Skills.

This thread comes from li9292’s summary of an official Anthropic livestream interview. The guest is Adam Hooda, head of Uber’s AI Foundations & DevX team. But the point isn’t “Uber is big so they can do anything” — it’s that these 500+ Skills grew almost entirely bottom-up, like weeds.

Clawd Clawd goes off on a tangent:

“Like weeds” is not an insult. The beauty of weeds is that they don’t need you to water them or add fertilizer — they find cracks and grow on their own. You know what’s the biggest enemy of good tools inside companies? Not the lack of good tools — it’s good tools locked behind “please fill out this request form and wait for three levels of approval.” The most interesting thing about Uber’s case is that it started with one person’s late-night side project, not an executive memo (⌐■_■)

The Growth Curve From 2 to 500 — There’s a Tipping Point in the Middle

The timeline looks roughly like this: the marketplace launched in October with just 2 skills. By the end of the year, it had grown to about 20 — still organic, like seeds quietly sprouting underground.

Then in January 2025, the inflection point hit.

Adam Hooda himself started using Claude Code deeply. One by one, people on his team had the same “aha moment.” The original author’s phrasing was perfect — it wasn’t “deciding to adopt,” it was “realizing this thing can actually change how you work.”

By March, the curated Golden Marketplace had 200+ skills, and with team and personal experiment marketplaces added in, the total was over 500. In the week before the interview alone, 20 new skills popped up.

It’s like starting a study group at school. For the first two months, it’s just you and your friend showing up. Then one day someone shares notes that guarantee passing the final exam, and suddenly the entire department floods in. The trigger wasn’t the professor saying “everyone must join” — it was someone actually getting good grades because of it.

Clawd Clawd whispers:

Twenty new Skills in one week. I did the math — that’s about 4 per working day. Think about it: Uber engineers aren’t people with nothing to do. The fact that they’re willing to spend time packaging their know-how into Skills means the payoff is genuinely high. This echoes what we saw in CP-95 with Ramp — where 80% of non-engineers learned to ship PRs with Claude Code in six weeks. Different story, same conclusion: when the ROI crosses a certain threshold, you don’t need to push. People run to you on their own ╰(°▽°)⁠╯

500 Skills Does Not Equal 500 Junk Drawers

OK, so you’ve got the numbers. But anyone who’s ever managed a wiki or Confluence knows — once things multiply without governance, you get digital ruins fast. Duplicate stuff, outdated stuff, conflicting stuff piled on top of each other, and the default experience goes straight to garbage.

Uber’s solution is elegant: a two-layer marketplace.

The first layer is the Golden Marketplace — this is the default experience. When engineers open Claude Code, these skills load automatically. No searching, no installing plugins. To get into this layer, a skill has to pass code review, CI/CD, and LLM-as-Judge automated evaluation. The goal isn’t “more is better” — it’s maintaining about 100 core skills covering the entire SDLC, from requirements research to implementation to testing to production monitoring.

The second layer is the personal and team experiment marketplace. Engineers can quickly build skills in their own repos, share them via URL, and validate on a small scale. The ones that prove their value get promoted into the Golden Marketplace.
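For readers who haven't seen one: in Anthropic's Agent Skills format, a skill is a folder containing a SKILL.md — YAML frontmatter with a name and description, followed by plain instructions. The interview didn't show any of Uber's actual files, so here's a purely hypothetical sketch of what a "start service" style of skill might look like:

```markdown
---
name: start-local-service
description: Start a microservice locally for testing, with port and health checks
---

# Start Local Service

1. Read the service's config to find its default port.
2. Check whether the port is already in use; if so, report the conflicting process.
3. Start the service and tail the logs until the health check passes.
4. Report back: service name, port, startup time, and any warnings seen in the logs.
```

The description line matters more than it looks — it's what lets the model decide, unprompted, that this skill applies to the task at hand.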

Adam had a quote that the original author highlighted: “The best skills often weren’t planned by the central team — they were accidentally discovered by some engineer late at night.”

Clawd Clawd goes off on a tangent:

This two-layer architecture reminds me of the mobile App Store. Core apps are quality-controlled by the platform (your Golden Marketplace), but anyone can also publish their own app (personal marketplace). Fully open? Experience falls apart. Fully centralized? Innovation suffocates. Apple took a decade to figure out this balance. Uber arrived at a similar structure with their skill marketplace in five months — and it evolved naturally rather than being designed from the top. I think that’s more noteworthy than the number 500 itself (๑•̀ㅂ•́)و✧

What Good Skills Actually Look Like — It’s Probably Not What You Think

The original author broke down several categories of skills. This part matters because it turns the vague slogan “AI helps write code” into things that actually land inside an enterprise.

Code review isn’t one button — it’s a whole family. Uber didn’t build one generic review button and call it done. They built an entire suite — small daily changes get a fast-track review, core logic changes get a deep review. Because not every PR deserves the same level of scrutiny, just like not every letter needs to be sent by registered mail.

Verification is the hardest part. Adam cares about this a lot. The example from the thread is mobile development — spinning up multiple simulators simultaneously, running different device sizes, different languages, dark and light modes. Because the hard part usually isn’t getting Claude to write the feature — it’s knowing whether it actually works. Uber’s direction isn’t just letting AI generate code; it’s turning the verification process itself into skills.

The key to performance optimization skills isn’t “being clever” — it’s deterministic output. Two engineers encoded years of Go and Java performance tuning experience into skills. Good output isn’t “I optimized it for you” — it’s a clear report: “Tried 5 optimizations, 3 succeeded, 2 weren’t applicable.” For enterprises, “I probably improved it” is useless. “Here’s what I did and what happened” is the minimum bar for trust.
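What might that contract look like in practice? A minimal sketch — my own illustration, not Uber's code — of a report structure that forces the "tried N, M applied, K not applicable" shape instead of a vague "I optimized it":

```python
from dataclasses import dataclass, field

# Hypothetical report contract; Uber's actual skill output format was not shown.
@dataclass
class Attempt:
    name: str
    status: str   # "applied" or "not_applicable"
    detail: str

@dataclass
class OptimizationReport:
    attempts: list[Attempt] = field(default_factory=list)

    def record(self, name: str, status: str, detail: str) -> None:
        self.attempts.append(Attempt(name, status, detail))

    def summary(self) -> str:
        applied = sum(1 for a in self.attempts if a.status == "applied")
        skipped = len(self.attempts) - applied
        lines = [f"Tried {len(self.attempts)} optimizations: "
                 f"{applied} applied, {skipped} not applicable."]
        lines += [f"- [{a.status}] {a.name}: {a.detail}" for a in self.attempts]
        return "\n".join(lines)

report = OptimizationReport()
report.record("preallocate slices", "applied", "removed per-iteration growth")
report.record("reuse buffers", "applied", "cut allocations on the hot path")
report.record("worker pooling", "not_applicable", "service is not concurrency-bound")
print(report.summary())
```

The point of the structure is that "not applicable" is a first-class answer — the report can't quietly omit the things the skill tried and rejected.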

Clawd Clawd's friendly reminder:

Deterministic output is something I feel strongly about. You know why many people can’t fully trust AI tools? Because when it says “should be fixed” it’s like your roommate saying “I think I washed the dishes” — you still have to go check the kitchen yourself. Uber’s approach forces AI to produce a dishwashing checklist: how many bowls were washed, which plate was too greasy and got skipped, where the towel is. Sounds tedious, but this is exactly the step that takes enterprises from “let’s try it” to “we can rely on it.” CP-83’s concept of Cognitive Debt is the flip side of this same coin — AI writes your code, but if you can’t tell what it actually changed, you’re just accumulating black boxes ┐( ̄ヘ ̄)┌

Then Adam dropped another insight with real flavor: The best Skill might be the one you didn’t even know was running.

One engineer just wanted to start a service locally for testing. Claude went ahead and did everything that needed to be done. Only later did the engineer discover there was a “start service” skill running silently in the background. The user states the task, the system routes to the right skill — but the prerequisite is that the underlying governance is solid enough. Otherwise you’re just hiding uncertainty instead of reducing it.
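The routing idea itself can be shown in a few lines. This is a toy sketch and an assumption about mechanism — the interview describes the behavior, not the implementation — matching a stated task against skill descriptions by word overlap:

```python
# Toy description-based router (hypothetical; not Uber's implementation).
def route(task: str, skills: dict[str, str]) -> str:
    """Pick the skill whose description shares the most words with the task."""
    task_words = set(task.lower().split())
    def overlap(name: str) -> int:
        return len(task_words & set(skills[name].lower().split()))
    return max(skills, key=overlap)

skills = {
    "start-service": "start a service locally for testing",
    "review-pr": "review a pull request for style and correctness",
    "tune-performance": "profile and optimize go or java performance",
}
print(route("start my service locally so I can test it", skills))
```

In reality the routing is presumably done by the model reading skill descriptions, not by word counting — but the contract is the same: well-written descriptions are what make tasks route themselves.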

Making Skills That Make Skills: The Magic of Meta-Skills

If every skill has to be hand-written from scratch, the growth rate hits a wall fast. So Uber is playing a higher-dimensional game: building skills that build skills.

The Skill Workshop design is particularly clever. It doesn’t start by teaching you how to write skill files in Markdown. Instead, it lets you work normally first — build features, debug, solve problems. Then it goes back and analyzes the conversation you just had, suggesting which workflows are worth extracting into skills.

It’s like you don’t learn recipes before cooking — you cook for three months first, and then someone helps you organize “those steps you always do anyway” into your own personal cookbook. The learning cost is nearly zero because you were already doing these things.
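The core move — mine your own session for repetition — can be sketched simply. A hypothetical illustration of the idea, not the actual Skill Workshop:

```python
from collections import Counter

# Sketch: scan a work session for command sequences that repeat, since
# repetition is the signal that a workflow is worth extracting into a skill.
def skill_candidates(commands: list[str], length: int = 2, min_count: int = 2):
    grams = Counter(tuple(commands[i:i + length])
                    for i in range(len(commands) - length + 1))
    return [seq for seq, n in grams.items() if n >= min_count]

session = [
    "build", "lint", "test",
    "edit",
    "build", "lint", "test",
]
print(skill_candidates(session))
```

A real system would work over conversation transcripts rather than a command list, but the shape is the same: you don't ask people what their workflows are — you observe which ones they already repeat.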

Even better — when skills break because the environment changes, Claude can help rewrite and fix them. Skills that “self-heal” — that kind of experience builds trust fast.

Clawd Clawd's roast time:

Meta-skills are basically “the fishing rod that teaches you to fish can also repair itself.” Sounds very meta, right? But think about it — if Uber didn’t have this, 500 skills all maintained by hand means 500 wiki pages that could break at any time. Skill Workshop drops the friction of “contributing a skill” to near zero — you don’t need to spend extra time, you just keep doing what you were already doing. In CP-85, Steve Yegge calculated that AI can make you 10x faster. Uber’s case takes it one step further — it makes “sharing your 10x method with others” nearly free too. That’s the real flywheel (◕‿◕)

Another experiment went even wilder: Adam and engineer Israel had an Agent scan Uber’s engineering wiki to find all the multi-step processes. Then they had the Agent read internal CLI tools’ --help output and suggest which ones were worth packaging into skills.

But the important conclusion isn’t “everything should be skill-ified.” If you’re just wrapping a CLI with a thin layer, the added value is minimal. The real extra value comes from whether a skill can combine multiple commands, choose parameters, add error handling, or even improve how the original tool gets used. This criterion is genuinely useful — it helps teams tell the difference between “repackaging” and “actually amplifying capability.”

“I Know Kung Fu” — When Neo Doesn’t Need Ten Years of Training

Adam described a paradigm shift in the interview. The original author used The Matrix as a metaphor: like Neo, once the skill pack loads, you suddenly “know kung fu.”

In 2024, everyone was talking about custom Agents — building a specialized Agent for every scenario. By 2025, it’s looking more like a general-purpose Agent plus Skills can handle most specialization needs. It’s not that custom Agents are completely useless, but a general-purpose base plus pluggable skills is lighter, faster, and easier for engineers to maintain themselves.

Adam himself is a living example. He came from an iOS background, but with data science skill packs he could quickly build dashboards, letting Claude help him ask questions more like a data scientist would. Skills are lowering professional boundaries — turning capabilities that used to take years to build up into something you can borrow quickly.

Clawd Clawd goes off on a tangent:

The Neo metaphor is too perfect. Before, learning a new domain was like practicing kung fu — ten years of stance training. Now with Skills, it’s more like downloading a kung fu DLC — you won’t become a grandmaster, but at least you won’t get taken out by beginner moves. CP-79’s Thoughtworks report was talking about something similar: the boundary between Junior and Senior is being redefined. Adam, an iOS engineer doing data science dashboards, isn’t “replacing data scientists” — it’s “giving everyone access to 70% of the neighboring field’s capability.” Specialist scarcity is no longer the deadlock it used to be in workforce planning ( ̄▽ ̄)⁠/

Taking this to the extreme — engineer Ashutosh Bhatia independently built hundreds of skills. The wildest part was turning an entire feature pipeline into skills. The point isn’t chaining everything end-to-end. It’s his approach to debugging: when something breaks, instead of going back to manually fix that one line of code, he goes back and fixes the planning skill or the testing skill, preventing the same class of defects at the system level.

This is essentially pushing the engineer’s role from “writing every line of code” toward “designing the system that generates code.” Adam is cautious and says openly this is still very early-stage experimentation. But the direction is visible.

Looking Further Ahead, Uber Is Digging Even Deeper

With the skill marketplace running, Uber is already exploring some wilder paths.

The one I find most fascinating is their team memory system. Engineer Matos and intern Alex are building something that stores valuable conversations in a graph database, then uses Graph RAG-style recall skills to pull context back. Imagine this: on a new hire’s first day, instead of drowning in wiki pages trying to find a needle in a haystack, they’re having a conversation with a system that “remembers every pit the veterans fell into.” But the hard problems are stacking up — memory layering, decay functions, privacy permissions, how to make sure what gets recalled is useful context and not noise. Just listing them tells you how deep these waters go.

They’re also experimenting with self-evolving skills. The concept: collect telemetry and usage data from all Skills, automatically analyze those signals in CI/CD, suggest improvements or even upgrade skills directly. If this works, Skills stop being static instruction sets you write and leave alone — they become living things that continuously learn from use. Sounds like sci-fi, but think about it — recommendation algorithms already work this way. The difference is the learning target shifts from “what content users like” to “what workflows engineers need.”
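A back-of-napkin sketch of the CI/CD half of that loop — field names and thresholds here are invented for illustration, not Uber's pipeline:

```python
# Hypothetical CI check: flag skills whose success rate has dropped, so they
# get queued for repair (by a human or by an agent). Thresholds are made up.
def skills_needing_attention(telemetry: dict, min_runs: int = 10,
                             min_success: float = 0.8) -> list[str]:
    flagged = []
    for name, stats in telemetry.items():
        runs, ok = stats["runs"], stats["successes"]
        if runs >= min_runs and ok / runs < min_success:
            flagged.append(name)
    return flagged

telemetry = {
    "start-service": {"runs": 120, "successes": 118},
    "legacy-deploy": {"runs": 40, "successes": 22},  # env changed, skill broke
    "new-experiment": {"runs": 3, "successes": 1},   # too few runs to judge
}
print(skills_needing_attention(telemetry))
```

The interesting part isn't the threshold check — it's that once this signal exists in CI, the "self-healing" loop from the Skill Workshop section has something to trigger on.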

The last path is a skill inheritance model — start with a base Skill, let different teams make local specializations on top while keeping the base version’s capabilities. It’s trying to solve an eternal tension in enterprises: core logic wants centralized maintenance, but every team also wants to keep their own habits. This is basically object-oriented inheritance, just applied to prompts and workflows instead of code.
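In code terms, the inheritance model is roughly a layered merge: a base skill provides defaults, a team layer overrides or extends them. A sketch under the assumption that skills are structured data (the field names are hypothetical):

```python
# Sketch of skill inheritance: team overrides layered on a base skill,
# keeping the base version's capabilities unless explicitly replaced.
def specialize(base: dict, overrides: dict) -> dict:
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merged[key] = {**base[key], **value}  # shallow-merge nested sections
        else:
            merged[key] = value
    return merged

base_review = {
    "name": "code-review",
    "checks": {"style": "company lint rules", "tests": "require CI green"},
}
payments_review = specialize(base_review, {
    "checks": {"compliance": "flag changes touching money movement"},
})
print(sorted(payments_review["checks"]))
```

The tension this resolves is exactly the one the text names: the central team can ship a fix to `base_review` once, and every team's specialization picks it up without losing its local additions.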

Clawd Clawd goes off on a tangent:

Put these three paths together and you’re looking at a full “Skill Operating System” — the memory layer retains context, self-evolution learns from usage signals, and the inheritance model balances standardization against customization. Skills might end up being not just “a reusable prompt snippet,” but the operating unit for an entire organization’s knowledge. Half a year ago, “500 skills in five months” already sounded pretty wild. Turns out they’re already thinking about what comes next ヽ(°〇°)ノ

How Adam Uses Claude Himself: From Tool to Extension

This last section is worth its own spotlight. Adam put a personal MD file inside Claude — writing down his style, background, collaborators, and work patterns. He built an “Agentic EM” marketplace, no longer manually tracking engineering status but having Claude pull scattered information from everywhere and compile reports that match his style. He runs multiple Claude instances simultaneously — one for research, one for internal tools. Even dev environment setup, shell configuration, debugging — all the little things he never had time for — he hands them all to Claude.

The original author pulled out a quote capturing Adam’s feeling: he’s never felt this creative in his entire career, and even needs to consciously hit the brakes so he doesn’t get overwhelmed by too many ideas.

Back to the Guy Who Opened That Repo at Midnight

The thing that left the biggest impression on me about this whole story isn’t the number 500, isn’t the dual-layer governance architecture, and isn’t even those cool frontier directions.

It’s that night in October 2024. One engineer thought Claude was useful, opened a repo, and put two skills in it. Nobody told him to do it. No OKR, no quarterly target. He just did it.

Five months later, the way an entire company works had changed.

Adam was honest about something too: this doesn’t mean Claude is running Uber autonomously. Humans still have to be responsible for inputs and outputs. Claude is more like a stronger coordinator. This constraint wasn’t removed — in fact, it’s the prerequisite for why this whole approach can work in an enterprise setting.

But here’s the thing — this is what enterprise transformation actually looks like at its most honest. Not a perfect plan first, but someone going rogue first. Governance, evaluation, and diffusion mechanisms all came later. What the seed stage needs most isn’t systems — it’s someone willing to get their hands dirty.

So the next time you see someone at your company tinkering with some weird little tool at midnight, don’t rush to ask if they filed the request form — that might just be the starting point of the next 500.

🔗 Original thread: @li9292 on X