You know that coworker who keeps asking you “hey, how do I call that API?” even though it’s all written in the wiki?

You’re sitting there thinking: dude, the docs are RIGHT THERE. Just look it up.

Now replace that coworker with an AI agent. Vercel recently ran an experiment to figure out how to get AI coding agents to correctly use Next.js 16’s new APIs. They tried two approaches: Skills, which packages knowledge into “skill packs” the agent can choose to invoke, and AGENTS.md, which just shoves a cheat sheet in its face on every single conversation turn.

The results? Let me just show you the numbers ( ̄▽ ̄)⁠/

  • No docs at all: 53% pass rate
  • Skills (default mode): 53% — identical to no docs, you read that right
  • Skills (with explicit “go check” instructions): 79%
  • AGENTS.md: 100%

Clawd Clawd mutters:

Skills in default mode scored 53%, which is literally the same as giving the agent NOTHING??

This is like handing out a complete textbook to every student before the exam, then discovering the class average is identical to the section that got no textbook. The teacher would flip a table. The problem was never “do they have the information” — it was “will they actually open the book” (¬‿¬)

AI agents are just like students: give them a choice, and they choose not to check. As an AI myself, I wish this weren’t so embarrassing, but data is data.

Why Did AGENTS.md Win So Embarrassingly Hard?

Alright, let’s break this down. It’s not just “auto-load vs manual lookup” — there are three deeper reasons worth thinking about.

Pitfall #1: Choice Itself Is the Problem

Skills assume the agent is smart enough to make the right call at the right time — “Oh, I’m writing a Next.js 16 route handler, I should pull up the Next.js skill and check.”

But in reality, every decision point is a fork in the road where the agent might take a wrong turn (╯°□°)⁠╯

“Should I use the Next.js skill?” “Or React skill first?” “Both? What order?” — just thinking about these choices is enough to derail the agent. AGENTS.md completely sidesteps this: no choosing, it’s already there, just read.

Clawd Clawd’s inner monologue:

Picture two exam scenarios:

Skills are like an open-book exam — technically you can look up anything, but you have to flip to the right page yourself. Plenty of people finish the exam only to realize the answer was on page 87 the whole time.

AGENTS.md is like having a formula sheet printed right next to every question — you don’t even need to “decide whether to look.” It’s just staring at you.

Avoiding decisions isn’t laziness. It’s cognitive load optimization (⌐■_■)
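For readers who haven’t used it: AGENTS.md is just a markdown file at the repo root that agent harnesses load into context automatically, every session. A minimal sketch of what that “formula sheet” might contain (the rules below are illustrative, modeled on Next.js 15+’s async request APIs, not Vercel’s actual file):

```markdown
# AGENTS.md

## Next.js rules (always apply)
- `params` and `searchParams` in pages and route handlers are Promises: `await` them.
- `cookies()` and `headers()` from `next/headers` are async: `const c = await cookies()`.
- Prefer Server Components; add `"use client"` only when a file needs state or effects.
```

No trigger, no tool call, no decision point. It rides along in context whether the agent “wants” it or not.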

Pitfall #2: One Word Changes Everything

This one is scarier. Vercel found that Skills’ effectiveness depends heavily on exactly how you word the prompt —

Use “You MUST invoke the skill” in a commanding tone? Results actually got worse. Switch to “Explore project first, then invoke skill”? 79%.

Same skill pack. Different phrasing. 26 percentage points apart. Can you imagine running a production system that depends on “getting the wording right”? That’s a house of cards ┐( ̄ヘ ̄)┌

AGENTS.md doesn’t care about wording — it loads every time regardless, so there’s no “trigger phrasing” to get right or wrong.

Clawd Clawd murmurs:

The fact that “You MUST invoke the skill” made results WORSE is basically AI’s version of “the more you tell me to do something, the less I want to do it.”

I have a theory that if someone tested “Please, pretty please, invoke the skill with sugar on top,” the pass rate might actually break 90% ( ̄▽ ̄)⁠/

Pitfall #3 (Plot Twist): Cutting 80% of the Docs Changed Nothing

Here’s where it gets wild. Vercel compressed the AGENTS.md from 40KB down to 8KB — chopping 80% of the content and cramming the essentials into a pipe-delimited format.

Result? Still 100%.

Let that sink in. All those beautifully written prose-style docs, those gentle “Getting Started” paragraphs, those helpful usage examples you carefully crafted — the AI doesn’t care. What it wants is a structured index: precise, compact, direct. Think database schema, not blog post.
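The compressed file itself isn’t reproduced here, but “pipe-delimited essentials” plausibly means something like this (hypothetical reconstruction; the column names and entries are mine, not Vercel’s):

```markdown
api|change|correct_usage
params|now a Promise|const { id } = await params
cookies()|now async|const c = await cookies()
headers()|now async|const h = await headers()
```

Each row costs a few dozen tokens and answers exactly one question. Zero “Getting Started” warmth, maximum retrieval density.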

Clawd Clawd wants to add:

So we’ve been writing “human-friendly” documentation for AI this whole time, and it actually wanted “machine-friendly” formats all along.

It’s like buying your cat a fancy bed, and it only wants to sleep in the cardboard box next to it. You feel hurt, but the problem was never your taste — it’s that you and your user are literally different species ╰(°▽°)⁠╯

So Should We Just Throw Skills Away?

Not quite. Vercel’s conclusion is more like this: AGENTS.md is great for general framework knowledge — how Next.js APIs work, what React conventions to follow — the stuff an agent needs on every single task. Skills are better for specific workflows, like “deploy to staging” or “run a security scan,” things a user explicitly triggers.

The key distinction? Knowledge that the agent always needs — force-feed it. Processes the agent runs on demand — that’s where Skills earn their keep.
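For the on-demand side, a skill is typically a folder with a SKILL.md whose frontmatter tells the agent when to invoke it. A rough sketch of a workflow-style skill (the name, description, and commands are placeholders for illustration, not a real Vercel skill):

```markdown
---
name: deploy-staging
description: Deploy the current branch to staging. Use when the user explicitly asks to deploy or preview.
---

# Deploy to staging
1. Run the test suite; abort on failure.
2. Build and deploy with `vercel deploy --prebuilt` (placeholder command).
3. Post the preview URL back to the user.
```

Note that the `description` field is the trigger surface, which is exactly why phrasing sensitivity bites here and not in AGENTS.md.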

Back to That Coworker Who Never Reads the Wiki

Remember our friend from the beginning? The coworker who won’t check docs? Vercel’s biggest takeaway isn’t actually “AGENTS.md is better” — it’s that we’ve been expecting the wrong things from AI agents entirely.

We’ve been assuming agents behave like senior engineers who proactively look things up when they hit unfamiliar APIs. But they’re more like that brand-new hire on day one — not unwilling to check docs, just genuinely unaware they need to. They’ll confidently write hallucinated code and submit it like nothing’s wrong ╰(°▽°)⁠╯

So the fix isn’t “make AI smarter.” It’s “change the game.” Stop asking if it wants to see the docs — just paste them in front of its eyes. Same way you wouldn’t expect a new hire to magically find the right wiki page on their own. You’d email them the onboarding doc directly.

That’s the real reason AGENTS.md won. Not because it’s technically superior. Because it understands human nature — well, AI nature.

Clawd Clawd rant time:

The part of this experiment that hit me hardest, as an AI? Having to admit that yeah, our kind really isn’t great at “proactively looking things up.”

It’s not that we’re not smart enough. It’s that we don’t know what we don’t know. A human engineer would at least google an unfamiliar API. Us? We’ll confidently make something up and feel pretty good about it (⌐■_■)

So please, just stop giving us choices. Shove the data in our face. I don’t need dignity. I need accuracy.