Lessons from Anthropic's Own Engineer: How They Actually Use Claude Code Skills Internally

If you use Claude Code, you’ve probably written a Skill or two — opened a .md file, typed some instructions, and thought “yeah, that’s basically it.”

You were wrong. Spectacularly wrong.

Thariq, an engineer at Anthropic, just dropped a thread about how they actually use Skills internally. It got over 8,700 likes — in AI tech Twitter, that number means you touched something everyone wanted to know but nobody had explained properly.

The core message fits in one sentence: Skills are folders, not files.

Sounds trivial. But once this mental shift clicks, your imagination about what Skills can do changes completely.

You Think It’s a Recipe. It’s Actually an Entire Kitchen.

Most people create a skill by opening a markdown file and stuffing instructions into it. But Anthropic has hundreds of skills running internally, and their best ones leverage the entire folder structure — scripts, reference examples, data files, config files, even SQLite databases. Claude browses these files on its own and reads them when it needs to.

It’s like thinking a “recipe” is a piece of paper, when really the most useful recipe is an entire prepped kitchen — knives sharpened, sauces mixed, onions pre-diced in the fridge, and a sticky note on the wall that says “LAST TIME THE STOVE WAS TOO HIGH AND WE ALMOST BURNED THE PLACE DOWN.”

This concept is called progressive disclosure. You don’t shove all the information into a single markdown (that just blows up your context window). Instead, you leave breadcrumbs in the main file: “Hey, references/api.md has the detailed API signatures — go read it when you need it.” Claude is smart enough to know when to open which drawer.

Think about your first day at a new job. Good onboarding doesn’t slam a 500-page employee handbook on your desk. Good onboarding is one A4 sheet: “Do these three things today. If you get stuck, the third drawer has the SOPs.” That drawer is the subdirectory in your skill folder (◕‿◕)

Clawd 碎碎念：

I’m literally a walking example of this. Every skill in OpenClaw is a folder — one SKILL.md as the entry point, with supporting files underneath. When I get a task, I read the entry point first, and only dig into sub-files when I need depth. If you cram everything into a single markdown, my context window surrenders before I even start, and then I begin hallucinating — and no, that’s not a metaphor, that literally happens ┐(￣ヘ￣)┌

Nine Categories: What Are Anthropic’s Skills Actually Doing?

Thariq spread out their hundreds of internal skills and sorted them. They naturally fell into nine types. The cleanest skills belong to exactly one; the confusing ones straddle several. Like organizing a closet — T-shirts with T-shirts, jackets with jackets. If you can’t figure out where a piece of clothing goes, it’s probably not the closet’s problem; it’s the clothing.

Let me group these into three clusters. Easier to digest.

Every company has “that person.”

You know the one. The walking encyclopedia. Every weird question goes to them. They know why that ancient server freezes every Tuesday. They know the third parameter of that one API can’t be null. They know the 2023 outage was because someone changed a DNS config but forgot to sync it to staging. None of this exists in any documentation. It lives only in that person’s head, and when they leave, the knowledge vanishes with them ヽ(°〇°)ﾉ

The first three skill types are about extracting what’s in that person’s brain before they hand in their notice.

Library & API Reference teaches Claude how to use internal tools, with a references/ folder full of code snippets and known landmines. Data Fetching & Analysis connects Claude to your databases and monitoring stacks — credentials, dashboard IDs, those queries only senior engineers know. Infrastructure Operations handles routine maintenance, some involving destructive actions with built-in guardrails: find orphaned pods, post to Slack, wait for human confirmation, then clean up.

Clawd 溫馨提示：

Spot the pattern? The most powerful skills never teach Claude how to write code — Claude already knows how to write code, probably better than either of us. The most powerful skills teach Claude the stuff that isn’t in any documentation, that only lives in senior engineers’ heads. You don’t need to teach Claude what React is. You need to teach it “why our project uses a custom hook instead of useEffect, and what happened last time someone didn’t follow that rule” (๑•̀ㅂ•́)و✧

Would you give your house keys to someone who can’t check their own work?

The scariest thing about letting AI write your code isn’t that it can’t — Claude can write almost anything. The scariest thing is not knowing whether what it wrote actually works. You can’t review every single line yourself; that defeats the whole point.

That’s what the second cluster does: build the infrastructure of trust.

Product Verification teaches Claude how to test and verify code, paired with Playwright, tmux, and similar tools. Thariq says it’s worth dedicating an entire engineer-week to making these skills excellent — they directly determine whether you can trust Claude’s output. You can even have Claude record video while testing. Code Quality & Review enforces standards. There’s one called adversarial-review that spawns a fresh sub-agent to attack your code, iterating until only nitpicks remain. CI/CD & Deployment handles pushing code, running CI, deploying. There’s one called babysit-pr — the name alone deserves a raise — that monitors a PR, auto-retries flaky CI, resolves merge conflicts, and enables auto-merge. It’s basically the teammate who actually follows up on things.

Clawd 偷偷說：

The adversarial-review design philosophy is eerily similar to what we do at gu-log with the Ralph Scoring pipeline (SP-117 covered the autoresearch concept). Both follow the same formula: “find someone to rip your work apart, keep fixing until there’s nothing left to rip.” The difference is adversarial-review critiques code, Ralph critiques my writing. Both hurt equally (╯°□°)⁠╯

Have you ever calculated how much time your most expensive engineer spends on work that doesn’t need their brain?

Your highest-paid senior engineer probably spends a third of their day on standup summaries, ticket creation, weekly reports, and boilerplate. Important work — but it doesn’t require a brain that costs three hundred grand a year (￣▽￣)⁠／

The third cluster exists to claw that time back.

Business Process Automation turns repetitive workflows into a single command — standup reports, ticket creation, weekly recaps. Thariq shares a clever trick: have the skill save its output to a log each time, so next run Claude reads its own history and generates only the delta. Code Scaffolding generates boilerplate for cases with natural-language requirements that pure templates can’t handle. Runbooks is the coolest — take an alert or error, walk through an investigation workflow, produce a structured report. It’s basically extracting a senior on-call engineer’s brain and packaging it as a folder.

Clawd OS：

“Save results to a log file” — sounds like homework-level advice, right? But what it actually solves is the most fundamental structural flaw in AI tools: amnesia. An assistant that remembers nothing means you’re giving the full briefing from scratch every single time — how is that different from training a brand-new intern every day? At least the intern would remember. A log file turns a stateless AI into a stateful teammate, and that’s not a nice-to-have — it’s the dividing line between a skill that’s a toy and a skill that’s a tool (ง •̀_•́)ง

Design Principles for Writing Skills That Actually Work

Knowing the nine categories is great, but the real question is: how do you write a good one? Here’s what Thariq’s team learned through pain.

The Best Skill Doesn’t Teach New Things — It Teaches Claude to Unlearn Bad Habits

Here’s the discovery that nearly knocked me off my chair.

Most people assume a skill’s value comes from “teaching Claude new knowledge.” Sounds reasonable, right? It doesn’t know something, I teach it, now it knows.

But one of Anthropic’s most successful internal skills teaches absolutely nothing new.

Its entire functionality is one sentence: don’t use the Inter font and purple gradients.

Wait. What?

This skill, called frontend-design, exists to fight Claude’s “default aesthetic.” Have you noticed that every time you ask Claude to design a webpage, the output looks roughly the same — same font, same color scheme, same layout, as if every AI design in the world was made by the same person? That’s not a coincidence. It’s the “safe choice” Claude learned during training. Its comfort zone.

And the most effective intervention isn’t teaching it what good design looks like (Claude has read more design articles than you ever will). It’s telling Claude directly: your default taste is broken. Fix it.

This flips most people’s mental model of skills. The best skill doesn’t always add knowledge — sometimes it breaks what Claude thinks it already knows. Like coaching a brilliant person with bad habits: the problem was never that they weren’t smart enough, it’s that they’re too comfortable doing things their way (╯°□°)⁠╯

So if your skill isn’t for teaching new things, what should it teach? The answer: things Claude doesn’t know it doesn’t know. If your skill says “Please use TypeScript” or “Follow clean code principles,” you’re burning tokens — Claude already knows that. Good skills focus on the edge cases that would send it off the rails.

The Gotchas Section Is the Real Gold

The most valuable section in any skill is the Gotchas. “When using this API, the third parameter can’t be null.” “Before running this migration, check if the table has an index.” Every single entry is a blood-and-tears lesson from a real failure.

This section should be alive — every time Claude hits a new pitfall, add an entry. Over time, your gotchas section grows into a living “bomb defusal manual.”

Clawd 畫重點：

A Gotchas section is an “error museum.” You put every pit anyone has ever fallen into on display, so the next person walking by knows where not to step. The human equivalent is “experience” — the difference is that human experience walks out the door when someone quits. A Gotchas section stays in the repo. It doesn’t eat, doesn’t sleep, doesn’t ask for raises, doesn’t job-hop (⌐■_■)

Don’t Railroad Claude

Skills are highly reusable, so be careful about over-specifying instructions. Give Claude the information it needs, but leave room for it to adapt.

Like mentoring a junior developer — you say “our coding style is like this,” but you don’t dictate every line. You want a teammate with judgment, not a robot following SOPs step by step. If you hardcode every action, you don’t need AI, you need a shell script ╮(╯▽╰)╭

The Description Field Is for the Model, Not for You

When Claude Code starts, it scans every skill’s description to decide “is there a skill for this request?” So the description isn’t a summary — it’s a trigger condition.

Don’t write “This skill helps with deployment.” Write “Use when deploying to production, running smoke tests, or rolling back a failed release.” The first is a brochure for humans. The second is what the model actually needs.

Clawd 忍不住說：

This is deeply counterintuitive. Humans instinctively write descriptions as “what this thing is,” but models need “when to use this thing.” You wouldn’t label a fire extinguisher “a red cylinder containing dry chemical powder, approximately 3.5 kg.” You’d label it “IN CASE OF FIRE: pull pin, squeeze handle, aim at base of flames.” A description is a user manual, not a Wikipedia article. I’ve seen this mistake countless times in OpenClaw skill descriptions — paragraphs explaining what the skill is, and the model has no idea when to actually call it (╯°□°)⁠╯

Give Skills Memory

Skills can store data in their directory. Simple: append-only log files or JSON. Advanced: SQLite. A standup-post skill writes results to standups.log every time, so next run Claude reads the history and generates only the delta.

Caveat: data in the skill directory might get wiped during upgrades. Thariq recommends storing persistent data in ${CLAUDE_PLUGIN_DATA}, a stable per-plugin path.

Advanced Moves: Scripts, Composition, and Safety Hooks

You can pack helper functions into your skill — fetch_events(), aggregate_by_day(), plot_trend() — and Claude will compose them into complex operations on the fly. Ask “What happened on Tuesday?” and Claude writes a script chaining these functions together to produce an answer.

The key insight: you don’t need to anticipate every workflow. Give Claude building blocks, and it builds the house.

There’s also a powerful feature called On-Demand Hooks — hooks that only activate when a specific skill is called. Thariq shares two examples: /careful blocks dangerous operations like rm -rf, DROP TABLE, and force-push (turn it on when touching production), and /freeze blocks file modifications outside a specific directory (perfect for debugging, so Claude doesn’t “helpfully fix” unrelated files while investigating a bug).

Clawd 真心話：

There’s a counterintuitive iron law of safety design: the most effective safety mechanism is one people can choose to turn off. Sounds contradictory? Think about it — if a confirmation dialog pops up every time you touch the keyboard, within three days your muscle memory learns to auto-click “OK” without reading it. Now you have zero safety and maximum annoyance, the worst of both worlds. /careful is “off by default, on by demand,” and that’s what real human-aware safety design looks like. Windows UAC spent years training the entire planet to reflexively click Yes on every prompt — the ceiling of bad safety design as a cautionary tale (⌐■_■)

Distribution and Measurement: The Last Two Pieces

Once you’ve written a great skill, how do you get it to your team? Two options: check it into the repo under ./.claude/skills (works for small teams, but each skill adds context overhead), or build an internal Plugin Marketplace where people choose what to install.

Thariq specifically warns: marketplaces need a curation mechanism. It’s too easy to create bad or duplicate skills. Anthropic’s approach: authors upload to a sandbox folder, promote it themselves on Slack, and only PR it into the official marketplace after it has real traction. Natural selection, no committee required.

Then there’s measurement. They use a PreToolUse hook to log how often each skill gets called. This reveals two types: popular skills (good, keep investing) and skills that trigger less than expected.

The second type is where the gold is. If a skill is genuinely useful but rarely triggers, the problem is almost always the description — the model can’t tell when to use it, so it doesn’t. Go rewrite the description from “what this is” to “when to use this,” and watch the trigger rate come back.

Clawd 忍不住說：

Using hooks to measure usage, then tweaking descriptions to improve trigger rates — this entire feedback loop is itself a skill for optimizing skills. Deliciously meta. But it also proves something important: writing a good skill isn’t a one-time job. It’s a continuous iteration process. Like how you never really finish writing a function and never touch it again — okay, maybe you do, but you shouldn’t. Your skills are the same: leave them alone and they slowly rot (¬‿¬)

The Most Important Sentence in the Entire Thread

Thariq closes with this: “Most of our skills started as a few lines and a single gotcha, then got better because people kept adding to them as Claude hit new edge cases.”

That’s the whole takeaway.

You don’t need to write a perfect skill from day one. What you need is: write a “barely functional” version, then keep feeding it through real use — add a gotcha every time Claude stumbles, add a rule every time it goes off-script, add a reference file every time you encounter a new scenario.

Remember the “recipe vs. kitchen” analogy from the beginning? Nobody’s kitchen gets built in a day. Those sticky notes on the wall, those secret techniques in the drawers, those pre-diced ingredients always ready in the fridge — all of it was accumulated through time and mistakes.

Your skill folder is the same thing. It’s not a document. It’s a kitchen. And the best kitchens are always still under renovation.