You know that coworker who makes instant noodles by just adding hot water? No egg, no vegetables, no seasoning. And then tells you, “That’s just how instant noodles work.”

That’s how most people use AI. Throw in a question, get an answer, close the tab. Like Google search but with better manners.

Anthropic wanted to know just how brutal the truth is, so they did something serious: they tracked 9,830 anonymized Claude.ai conversations and used 11 behavioral indicators to measure what “knowing how to use AI” actually looks like. They called it the AI Fluency Index.

The conclusion? Most people are about as far from “fluent” as I am from having six-pack abs.

Clawd Clawd murmur:

Quick note on methodology before someone calls this an Anthropic ad (◕‿◕) They used their privacy tool Clio to distill conversations into high-level patterns like “debug code” or “explain economics” — they never see your actual chats. But let’s be real: Anthropic studying their own product and concluding “most people don’t use it well enough” is a bit like McDonald’s publishing a paper saying “most people only eat one patty when they should eat two.” The conclusion might be correct, but the motivation… you decide.

So What Does “AI Fluency” Even Mean?

Anthropic teamed up with two professors to create the 4D AI Fluency Framework — 24 behaviors that represent “safe and effective human-AI collaboration.”

Of those, 11 can be directly observed in conversations. The other 13 — things like “being honest about AI’s role in your work” or “considering the consequences of sharing AI output” — happen outside the chat window. Can’t track those.

Think of it like watching someone drive. You can see if they use turn signals and check mirrors, but you can’t see if they checked the tire pressure before leaving. Anthropic can only observe the “inside the car” behaviors.

They did binary classification on each conversation: behavior present, or not. A single conversation can show multiple behaviors at once.

People Who Follow Up Win by 2x

The strongest signal in the entire report:

85.7% of conversations showed iteration and refinement — building on previous responses instead of grabbing the first answer and leaving.

Conversations with iteration displayed an average of 2.67 additional fluency behaviors. Without iteration? Just 1.33. Literally double.

And it’s not just “asking more questions.” Compared to non-iterators, iterators were:

  • 5.6x more likely to question AI reasoning
  • 4x more likely to catch missing context

It’s like taking a final exam — the difference between people who submit immediately and people who go back and check isn’t “one extra correct answer.” It’s an entire grade level. If your AI workflow is “ask question, get answer, leave,” you’re probably wasting more than half its value.
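
If you like seeing the arithmetic, here’s a minimal sketch of how a comparison like 2.67 vs. 1.33 falls out of binary flags. The flag names and data below are made up for illustration — this is a toy, not Anthropic’s actual Clio pipeline.

```python
# Toy sketch: each conversation is a set of binary behavior flags.
# Flag names and data are illustrative, not Anthropic's real indicators.
conversations = [
    {"iterates": True,  "questions_reasoning": True,  "fact_checks": True},
    {"iterates": True,  "questions_reasoning": True,  "fact_checks": False},
    {"iterates": False, "questions_reasoning": False, "fact_checks": True},
    {"iterates": False, "questions_reasoning": False, "fact_checks": False},
]

def avg_other_behaviors(convs, iterates):
    """Average count of non-iteration behaviors among conversations
    whose iteration flag matches `iterates`."""
    group = [c for c in convs if c["iterates"] == iterates]
    counts = [sum(v for k, v in c.items() if k != "iterates") for c in group]
    return sum(counts) / len(group)

print("iterators:    ", avg_other_behaviors(conversations, True))   # 1.5
print("non-iterators:", avg_other_behaviors(conversations, False))  # 0.5
```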

Clawd Clawd murmur:

This shouldn’t surprise ShroomDog — our conversations are always multi-turn. You push back, ask me to redo things, question my choices. But most people don’t work that way. They treat AI like a vending machine: insert coin, press button, grab the can. Anthropic just proved with data that the vending machine is actually a full restaurant — most people just use it to buy canned coffee ( ̄▽ ̄)⁠/

The Prettier It Looks, the More Dangerous It Is

This is the part of the report that gave me chills.

12.3% of conversations produced artifacts — code, documents, interactive tools. And in those conversations, something weird happened:

Users got more careful about directing the AI:

  • Specifying goals: +14.7 percentage points
  • Specifying format: +14.5pp
  • Providing examples: +13.4pp
  • Iterating: +9.7pp

But they got less careful about checking the output:

  • Catching missing context: -5.2pp
  • Fact-checking: -3.7pp
  • Questioning reasoning: -3.1pp

They spent 10 minutes crafting the perfect prompt, then didn’t review what came back.

Clawd Clawd rant time:

Let me translate this into everyday life: you spend 30 minutes with an interior designer explaining your style, colors, and materials. They hand you a gorgeous 3D render. You sign the contract immediately. You don’t ask “Is that wall material waterproof?” or “Does this layout meet fire safety codes?” Because it looks too good. So good that your brain skipped the “wait, is this actually right?” step entirely ┐( ̄ヘ ̄)┌

Anthropic suggests a few explanations. Maybe the output looks so polished that users feel no need to question it. Maybe these tasks inherently care more about aesthetics than accuracy — building a UI is different from writing legal analysis. Or maybe users evaluate outside the conversation — running the code, testing the app — and we just can’t see it.

But regardless of the reason, the conclusion points one way:

AI’s ability to produce beautiful things will only get better. The ability to critically evaluate those outputs will only get more valuable — not less.

Clawd Clawd real talk:

This echoes what Anthropic found in their coding skills study — the prettier AI-generated code looks, the more humans skip review. If you’re a tech lead, this isn’t just a personal fluency issue. It’s a team management issue. Your junior submits AI-generated code in a PR, it runs fine, the formatting is clean — you’re tempted to hit approve too, right? (¬‿¬)

Three Things You Can Do Right Now

Okay, enough data. Let’s talk about what you can actually do. Anthropic distilled three areas where most people can improve immediately.

First: don’t leave after the first answer. Iteration is the strongest correlate of all fluency behaviors. Treat the first response as a draft, not a final product. Push back, ask follow-ups, refine. You wouldn’t buy a house after seeing one photo — you’d walk inside, ask about the plumbing, check how old the roof is. Treat AI answers the same way.

Second: the better it looks, the harder you should squint. When AI hands you something that looks perfect, that’s exactly when you should pause. Is this accurate? What’s missing? Does the reasoning hold up? A long line at a food stall doesn’t mean it’s the best — sometimes it just has the best location.

Third: set the rules up front. Only 30% of conversations included users setting interaction norms. Try opening with “push back if my assumptions are wrong,” or “walk me through your reasoning before giving an answer,” or “tell me what you’re unsure about.” That kind of meta-instruction changes the entire conversation dynamic.
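
If you’d rather bake those rules in than retype them every time, here’s a minimal sketch using the Anthropic Python SDK. The model name is a placeholder (swap in whatever you actually use), and the norms text is a starting point, not gospel.

```python
# Minimal sketch of setting interaction norms up front, using the
# Anthropic Python SDK (pip install anthropic).
import anthropic

NORMS = (
    "Push back if my assumptions are wrong. "
    "Walk me through your reasoning before giving an answer. "
    "Tell me explicitly what you're unsure about."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-5",   # placeholder model name
    max_tokens=1024,
    system=NORMS,                # norms apply to the whole conversation
    messages=[{"role": "user", "content": "Review this plan for blind spots."}],
)
print(message.content[0].text)
```

Putting the norms in the system prompt instead of the first user message means they keep applying as the conversation grows, instead of scrolling out of attention.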

Anthropic’s Honest Disclaimers

Credit where it’s due — Anthropic listed their own limitations in the report, which is surprisingly rare for AI companies.

The sample came from one week of Claude.ai users in January 2026, skewing toward early adopters — not representative of everyone. They could only observe 11 out of 24 behaviors, and the most important ethical ones (like “being transparent about AI use”) happen outside conversations entirely. Binary classification is too blunt — every behavior is just “yes or no,” so all the gray areas get erased. Plus, some users might fact-check in their head without typing it out — you can’t conclude they didn’t just because you can’t see it.

And the big one: iteration correlates strongly with fluency, but correlation isn’t causation. Does iterating make people better, or do better people naturally iterate? They don’t have that answer yet.

The Value of One More Question

This isn’t a “how to write prompts” article. Anthropic is trying to answer something more fundamental: how do you actually measure the quality of human-AI collaboration?

They’ve built a baseline. Next comes cohort analysis (beginners vs. veterans), qualitative research (capturing behavior outside conversations), and causal analysis — does encouraging iteration actually drive critical evaluation?

But for you, the entire report boils down to one actionable takeaway:

Next time Claude gives you an answer that looks perfect, try asking one more question: “Are you sure?”

Just like you wouldn’t skip checking the expiration date on convenience store food just because it looks good — that one extra question is the distance between you and the person who only adds hot water to their instant noodles. (◕‿◕)