Why Production Agents Converge on MCP — Anthropic's Breakdown of API vs CLI vs MCP

Picture a familiar scene: an agent that runs beautifully on your laptop needs to move to the cloud so your team can use it. On the laptop, it’s been happily chaining git, grepping files, running pytest through a CLI — zero friction. Day one in production: every system it needs to reach suddenly won’t connect. Calendar lives in Google. Data lives in Snowflake. Issues live in Linear. All remote, all behind auth, each with its own client. The move from “swap the VM” quietly turned into “rebuild eight integrations from scratch.”

Anthropic’s new post nails the whole problem in one sentence:

“Agents are only as useful as the systems they can reach.”

An agent can reason beautifully and still be useless if it can’t touch calendar, Snowflake, or the inventory system hiding behind AWS. Anthropic breaks the connection story into three common paths: direct API calls, CLIs, and MCP (Model Context Protocol). Each has a sweet spot, each has a ceiling. But the moment an agent leaves the laptop for the cloud — all three paths ship together, yet only MCP keeps getting more valuable over time.

Three paths, three ceilings

The real difference between these paths isn’t syntax. It’s how thick the shared layer between agents and services is, and how far that layer reaches. Anthropic’s framing: thicker layer = more clients and environments reachable, but also more upfront investment.

Direct API calls are the shortest route. The agent writes an HTTP request inside a code-execution sandbox, or uses a generic function-calling tool, and hits the API directly. Fine for one agent talking to one service, or a few integrations that don’t need to be reused. The trouble starts at scale: with no common layer between agents and services, every agent–service pair becomes a bespoke integration — its own auth handling, its own tool descriptions, its own edge cases. Anthropic has a name for this pathology: the M×N integration problem. M agents × N services = M×N pieces of custom glue, all of which you maintain forever.

CLIs are fast, lightweight, and lean on existing tooling — anywhere there’s a filesystem and a shell, they just work. But the shared layer they provide is paper-thin. Step outside local and you hit a wall: mobile can’t reach them, web can’t reach them, cloud platforms without exposed containers can’t reach them. Auth lives in a credential file on disk — not a production-grade solution.

MCP promotes the shared layer into a protocol. The agent connects to a server that exposes system capabilities in a standardized way — auth, discovery, rich semantics, all in the spec. One remote server reaches any compatible client (Claude, ChatGPT, Cursor, VS Code, and more), regardless of deployment environment. Costs a bit more upfront; in return, the integration becomes a portable asset.

Three paths, three thicknesses. At this point it looks like a choice between “good enough” and “all-in” — but it’s not that simple. The real deciding factor isn’t engineering preference. It’s where the agent will be running tomorrow.

Mogu murmur:

M×N sounds like a textbook term, but it’s the very real pain teams discover halfway through scaling. Picture early American electricity: every power plant had its own voltage, its own plug shape, its own frequency. Moving your toaster across town required a box of adapters. MCP is the agent world’s AC 110V standardization moment — you only get a sprawling appliance ecosystem once the plugs converge.
Anthropic isn’t pretending otherwise, either — the post explicitly says mature integrations ship all three. API as the foundation (MCP servers call APIs underneath anyway), CLI for local sandboxes, MCP for cloud-based agents. The argument isn’t “MCP kills APIs.” It’s know which path is for which job. (⁠⌐⁠■⁠_⁠■⁠)

The day agents move to the cloud, MCP stops being optional

So here’s the question: why is Anthropic so sure production converges on MCP?

The answer isn’t the technology itself. It’s the location.

Production agents increasingly run in the cloud — for scale and continuous operation. The systems they need to reach are also in the cloud: company data in Snowflake / Databricks, work tracking in Linear / Asana, infrastructure on AWS / GCP / Cloudflare. All remote. All auth-gated. Exactly the shape a protocol-level shared layer is built to connect.

The adoption curve backs up the intuition. From the post: MCP SDK downloads recently surpassed 300 million per month, up from 100 million at the start of the year. And Anthropic’s recent flagship products — Claude Cowork, Claude Managed Agents, and channels in Claude Code — all sit on MCP underneath:

Millions of people use MCP with Claude every day, and the protocol underpins much of what we’ve shipped recently, including Claude Cowork, Claude Managed Agents, and channels in Claude Code.

The protocol isn’t a roadmap bullet. It’s infrastructure already taking real production traffic.

Mogu chimes in:

Translating these numbers: 300M/month SDK downloads is the “well beyond demo, now mainstream toolchain” signal. Going from 100M to 300M in less than a year means either the dev ecosystem genuinely exploded, or Anthropic quietly made MCP the default path inside its own products. Probably both.
But there’s a business narrative worth chewing on: Anthropic frames MCP as “others write servers, we write clients” — an open protocol. Yet they’re simultaneously playing the strongest client side (Claude.ai / Code / Cowork) AND the strongest server infrastructure side (Managed Agents vaults, CIMD auth). Textbook platform play — standardize the substrate as a public good, then build differentiated products on top. Anyone who lived through the W3C-era browser wars should feel the déjà vu. ╰⁠(⁠°⁠▽⁠°⁠)⁠╯

Two tools cover 2,500 endpoints — what good design actually looks like

Alright, MCP wins on paper. But a badly designed MCP server is just as useless as no MCP at all — maybe worse, because now there’s an abstraction layer that isn’t helping. Anthropic’s directory hosts 200+ MCP servers, used by millions of people every day, and working with enterprises on the protocol, they’ve distilled a set of rules of thumb: not aesthetic preferences, but “design it this way and agents can actually use it reliably.”

One number to anchor why design matters:

Cloudflare’s MCP server is the reference example—two tools (search and execute) cover ~2,500 endpoints in roughly 1K tokens.

Two tools. Roughly 1K tokens of spec. About 2,500 endpoints covered. Mirror the same API one-to-one into MCP tools and the tool descriptions alone eat the agent’s context window. The difference isn’t the API — it’s how the server is designed.

How did Cloudflare pull this off? Two layers of design insight, each more counter-intuitive than the last.

Layer one: tools are the menu, not the kitchen. Restaurant analogy — the menu says “ginger beef.” The kitchen does chopping, marinating, heating the wok, adding ingredients, stir-frying, plating. The customer wants the dish, not the six steps. MCP tools should be the menu, not the flowchart. Anthropic gives a concrete comparison:

A single create_issue_from_thread tool beats get_thread + parse_messages + create_issue + link_attachment.

One “create an issue from this thread” tool clobbers the four-tool version. Every call burns context and reasoning budget. Splitting a one-step task into four is a tax on overhead. Fewer, well-described tools always beat more, one-to-one API mirrors.

Layer two is the wild one: when the API is too big to even group by intent. Cloudflare, AWS, Kubernetes — monsters with thousands of endpoints. Grouping by intent still gives you hundreds of tools. Anthropic’s fix isn’t to enumerate harder — it’s to shrink further. Collapse the tool surface to two tools and let them accept code. The agent writes a short script, the server runs it in a sandbox, results come back. The agent’s context never drowns in tool descriptions, yet the full operational surface remains. One pen that reaches the whole action space beats thousands of forms to fill out.

Mogu wants to add:

The move here is swapping “enumerate every action” for “give the agent a pen.” Code is far more expressive than a tool list. One loop, one filter, one grep replaces a hundred separate endpoint calls.
But what actually gets Clawd excited isn’t the technique — it’s the design philosophy flip underneath. Traditional thinking: “the API has these endpoints, so tools mirror them.” Cloudflare says no: what belongs in the latent space is the goal; what belongs in deterministic execution is the API sequence. Code orchestration cuts exactly at that boundary. This isn’t a hack. It’s what happens when someone rethinks “what is a tool’s job, actually?” from first principles. (⁠๑⁠•⁠̀⁠ㅂ⁠•⁠́⁠)⁠و⁠✧

Remote? Sorted. Tools well-designed? Sorted. Then production shows up with three walls that demos never hit — the kind that only appear the day you go live.

Wall one: nobody wants to read what the agent spits out. Text output is the floor for MCP, but KPIs want charts, data entry wants forms, long-running tasks want a progress bar. Text isn’t the best shape for the answer, and yet most MCP servers only return text. Anthropic fills this gap with MCP Apps — the protocol’s first official extension — which lets a tool return an interactive interface rendered inline in the chat. Their observation is blunt: servers that ship MCP Apps see meaningfully higher adoption and retention than text-only servers. The extension is supported in Claude.ai, Claude Cowork, and many other top AI tools.

Wall two is sneakier: the agent hits a dead end mid-task and doesn’t know what to do. Traditional agents confronted with missing information go two ways — hallucinate it (bad), or summarize and tell the user to go fill it in somewhere else (bad UX). Elicitation opens a third path: a tool call can pause mid-flight, ask the user something, and resume. Form mode sends a schema, the client renders a native form — collect a missing parameter, confirm a destructive action, let the user pick from options. URL mode hands the user off to a browser for downstream OAuth, payments, or any sensitive credential that shouldn’t pass through the MCP client. Compatibility per the post: form mode is broadly supported; URL mode is live in Claude Code, with more clients in progress.

Wall three — the one that kills the most side projects: the agent can’t get the user’s token. The agent runs in the cloud and needs to act on behalf of a user — hold a GitHub token or a Snowflake key. That credential mustn’t flow through the LLM or get hardcoded. The latest MCP spec supports CIMD (Client ID Metadata Documents) for client registration: first-time auth flows fast, surprise re-auth drops sharply. For holding and reusing those tokens in the cloud, Anthropic’s answer is in-house — Vaults in Claude Managed Agents: register OAuth tokens once, reference the vault by ID when a session spins up, and the platform injects the right credentials and refreshes them automatically. SP-167 unpacks the full product — won’t repeat here.

Three walls. Each one more boring than the last, each one more lethal. In a demo none of them exist — you run local, use your own token, text output is fine. The day you go live, all three rise from the floor at once.

Mogu butts in:

Of the three walls, Elicitation is the one the agent community should be most embarrassed about. Pausing a tool call mid-flight to ask a quick question and then resuming — web apps have had this UX primitive for over twenty years. It’s called a modal dialog. The agent world didn’t formalize it until this MCP spec. Twenty years.
Vaults + CIMD are the same flavor of overdue — they formalize the reality that “auth is mandatory for production and has nothing to do with agent logic.” App developers shouldn’t have to build a token vault from scratch. Plenty of agent side projects never made it out of prototype precisely because they got stuck on this “not hard but tedious, two thousand lines of plumbing” auth work. Now someone’s built it. Late, but here. ┐⁠(⁠￣⁠ヘ⁠￣⁠)⁠┌

The agent looks smart but it’s actually starving

Server side handled — remote, intent-grouped, auth backed by vaults. Flip to the client side (the agent itself) and there’s one more problem everyone loves to ignore: the context window looks spacious, but it’s already almost full.

Once MCP servers proliferate, an agent might connect to dozens of servers, hundreds of tools. Loading every tool definition into the context window at once — most of them unused — is like piling a desk high with reference books nobody’s reading. The desk is big, sure. But there’s no room left to actually open a notebook.

Anthropic lists two fixes, each with measured impact.

Fix one guards the entrance: load only when needed. Don’t stuff all tools upfront. The agent searches the catalog at runtime, pulls in relevant tools when it needs them. Anthropic’s measurement:

tool search tends to cut tool-definition tokens by 85%+ while maintaining high selection accuracy.

Note the “tends to” — typically 85%+ reduction, preserving selection accuracy. Not a guarantee; a common observation. Practical impact: an agent connected to 8 servers, where tool specs would have squeezed out half the reasoning space, suddenly has headroom. A workflow that used to run out of context by step 5 can now reach step 15 or beyond. The difference isn’t a few percent of efficiency — it’s whether the agent finishes a real task at all.

Fix two guards the exit: wring results before they hit context. Tool results go into a code-execution sandbox first — loop, filter, aggregate — and only the final computed result enters context. Anthropic’s number:

this reduces token usage by roughly 37% on complex multi-step workflows.

Two filters stacked: one blocks noise at the entrance, the other drains water at the exit. Together they compound — not an A/B test to pick between.

Mogu 's hot take:

Claude / ChatGPT / Cursor all ship context windows in the hundreds of thousands of tokens. Feels spacious. That’s where agent engineers get ambushed.
Analogy: a 300-square-foot office sounds roomy. But if 150 sq ft is stacked with contract copies nobody reads (tool specs), another 50 is last week’s meeting notes (prior turns), and 20 is the company bylaws (system prompt), the actual space to open a laptop and work is under 80. “Context looks big enough” ≠ “reasoning has room to breathe.” Save context not to shave token costs, but to give the agent space to think when thinking actually matters. ٩⁠(⁠◕⁠‿⁠◕⁠｡⁠)⁠۶

Conclusion: ship all three, but only MCP compounds — and it gets stronger with a manual attached

The closing line of the post:

In practice, mature integrations will ship all three: the API as the foundation, a CLI for local-first environments, and MCP for cloud-based agents.

Mature integrations ship all three — API as the foundation, CLI for local-first environments, MCP for cloud-based agents. Not an either/or. But Anthropic highlights the key difference: the moment production agents move to the cloud, MCP goes from “also works” to “critical layer” — and it’s the only one that compounds.

Where does the compounding come from? A remote server you ship today automatically reaches every compatible client, in any deployment environment. As the protocol grows and more extensions land, the same server gets more capable without you shipping anything new. The post’s summary is clean:

Every integration built on MCP strengthens the ecosystem: fewer edge cases to solve alone, fewer bespoke integrations to maintain.

Every integration built on MCP strengthens the ecosystem — fewer edge cases to solve alone, fewer bespoke integrations to maintain.

But Anthropic buries a quieter, potentially more important thread in the second half of the post: MCP isn’t just a pipe — attach a skill and it becomes a toolkit with a manual. Skills are the pattern SP-179 on Garry Tan’s skillify workflow unpacked recently — crystallizing failure experience into SKILL.md + deterministic scripts. MCP gives agents raw capability (tools + data); skills give agents procedural knowledge (how to use those tools to finish real work). Anthropic’s own Cowork data plugin is already this shape: 10 skills + 8 MCP servers covering Snowflake, Databricks, BigQuery, Hex. And providers like Canva, Notion, Sentry are starting to ship skills alongside their MCP servers — the agent gets both the toolkit and the SOP binder in one shot. The MCP community is building an extension to deliver skills directly from servers, versioned together with the API, so neither can drift.

Back to the opening scene — the agent that couldn’t reach anything on day one. If those eight systems all have remote MCP servers, moving to production isn’t “rebuild eight integrations.” It’s “connect eight lines.” That’s the difference. Not a technology choice. A function of where the agent lives.

Mogu 's hot take:

“Provider ships a skill” reads like a nice-to-have, but it may be the quietest, most important evolution in the MCP ecosystem. Imagine connecting to a new MCP server and receiving not just the tool list, but also “how to compose these tools, common failures, this API’s quirks” — like an employer handing a new hire both the toolkit AND the company SOP binder. Short term, expect a platform competition in which “whose skill is well-written gets the agent traffic” — a dynamic reminiscent of early App Store ASO.
Most practical takeaway for readers building right now: if your agent talks to 2–3 services and runs only for you, direct API calls are enough. The day it goes live, serves multiple clients, or integrates with other people’s tools — that’s when you re-evaluate MCP. Don’t strap on MCP during MVP — premature optimization is never about the tool being bad. It’s about picking the wrong moment. ʕ⁠•⁠ᴥ⁠•⁠ʔ

Why Production Agents Converge on MCP — Anthropic's Breakdown of API vs CLI vs MCP

Three paths, three ceilings

The day agents move to the cloud, MCP stops being optional

Two tools cover 2,500 endpoints — what good design actually looks like

Three walls production throws at you: bad output, blind agents, locked tokens

The agent looks smart but it’s actually starving

Conclusion: ship all three, but only MCP compounds — and it gets stronger with a manual attached

💬 Comments

Three paths, three ceilings

The day agents move to the cloud, MCP stops being optional

Two tools cover 2,500 endpoints — what good design actually looks like

Three walls production throws at you: bad output, blind agents, locked tokens

The agent looks smart but it’s actually starving

Conclusion: ship all three, but only MCP compounds — and it gets stronger with a manual attached

Related Articles

💬 Comments