Claude Code Agent Teams: When AI Opens Its Own Company
📘 Based on Anthropic’s official docs. This is an experimental Claude Code feature that lets multiple Claude Code instances work as a team.
Have you ever wondered — if you could clone yourself, what would you do first?
I don’t know your answer, but Anthropic’s answer is pretty straightforward. They taught Claude Code how to clone itself. And not the anime kind where the clone vanishes after a fight — more like actually opening a company, assigning employees, and letting everyone run their own meetings.
The feature is called Agent Teams.
One session becomes the boss (lead), the others become employees (teammates), and they split work, communicate, and execute tasks independently. You? You just sit there and watch the results roll in. It’s like opening a company where the CEO is AI, the PM is AI, the engineers are all AI. You’re the sole shareholder, checking your terminal for the quarterly report ╰(°▽°)╯
Clawd gets serious:
Hold on — don’t write your resignation letter just yet. This feature is experimental, disabled by default, and you have to manually set CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS to turn it on. The docs list an entire page of limitations: sessions can’t resume, task status can lag, shutdown is slow, no nested teams. This “company” hasn’t even set up a time clock for employees, and you’re already planning to be a hands-off investor? Maybe put that retirement plan back in the drawer for now ┐( ̄ヘ ̄)┌
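Enabling it is one environment variable. A minimal sketch in shell, assuming the variable just needs to be set to 1 (check the docs for the exact value your version expects):

```shell
# Opt in before launching Claude Code; the feature is off by default.
# Assumption: setting the variable to "1" enables it. Verify against
# the docs for your installed version.
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# ...then start a session as usual:
#   claude
```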
When Should You Use Agent Teams?
Imagine you’re a professor and a student just handed you a 500-page thesis to review. You’re not going to read every page yourself — you’d bring in three TAs: one checks grammar, one checks logic, one checks citations. Then you combine their feedback. Agent Teams is exactly this concept.
The docs list four sweet spots. Let me walk you through them in plain language.
First up: Research & Review — multiple teammates investigate different angles of the same problem, then challenge each other’s findings. Like detectives splitting up to follow different leads, then comparing notes at the evidence board. “Wait, your clue doesn’t match mine? That means one of us has bad intel.” This kind of peer review is something a single agent simply can’t do.
Second: building new features. You build the kitchen, I build the bathroom, they build the living room — put it all together and you’ve got a house. The key is nobody touches each other’s construction site. Otherwise your wall gets demolished by my bulldozer, and that’s not collaboration — that’s mutual demolition.
Third: debug with competing hypotheses — each teammate carries a different theory and tries to prove or disprove it, like scientists in a debate. Five people guessing the culprit simultaneously, each presenting evidence to knock down the others — the theory that survives is probably the truth. This is especially powerful when the root cause is unclear, because a single agent is way too prone to anchoring bias — it finds one plausible explanation and calls it a day.
Fourth: cross-layer collaboration — one person on frontend, one on backend, one on testing, each owning their lane. But this scenario actually demands the most careful task splitting, because one wrong move and you’re stepping on each other’s files.
But here’s the important trade-off: Agent Teams have coordination overhead and burn significantly more tokens than a single session. If your task is sequential, involves editing the same file, or has heavy dependencies — just stick with a single session or sub-agents.
Clawd interjects:
Let me clarify a key difference here. Sub-agents are shadow clones — they disappear after the job and their memories return to the original. Agent Teams are a mercenary squad — each member has their own weapons, their own brain, and they hold their own meetings to discuss tactics. Shadow clones can’t talk to other shadow clones, but mercenaries can argue with each other.
Cool metaphor, right? But do you know what a mercenary squad’s invoice looks like? Each teammate is an independent Claude Code instance, each with their own context window, token cost scaling linearly. 5 teammates = 5x the bill. You think you’re assembling the Avengers, but when the invoice arrives at the end of the month, you realize you’ve been paying Avengers-level talent fees (╯°□°)╯
How to Start a Team
Alright, suppose you’re sold. How do you actually get started?
The answer is so simple you might not believe it — you just ask. Seriously, it’s like ordering food delivery. You describe what you want:
I'm designing a CLI tool that helps developers track TODO comments across
their codebase. Create an agent team to explore this from different angles: one
teammate on UX, one on technical architecture, one playing devil's advocate.
Claude handles all the logistics automatically: creates the team, spawns teammates, assigns tasks, and synthesizes results when everyone’s done. You can use Shift+Down to cycle through teammates and talk to them directly.
It’s like walking into an office and saying “I need three people to help me with this,” and HR instantly hires them, assigns desks, and prints name badges. The only difference is these employees will never complain about the coffee machine being broken.
But they also won’t share office gossip with you, so let’s call it even ( ̄▽ ̄)/
Clawd’s roast time:
“Describe your team structure in natural language” sounds lovely, right? But if your description is too vague — like “help me with this” — Claude decides on its own how many people to spawn and what each one does. Sometimes it cleverly splits things into three angles. Other times it inexplicably spawns seven teammates and they all just stand there looking confused. It’s like telling a restaurant “just bring food” without ordering — you might get a fantastic spread, or you might get four plates of salad ┐( ̄ヘ ̄)┌
Two Display Modes
Alright, your team is up and running. But how do you actually watch what they’re doing?
There are two modes, each with its own personality.
In-process mode squeezes everyone into one terminal. Use Shift+Down to switch between them. Think karaoke with one microphone — you can only hear one person singing at a time, and to hear the next person you have to grab it away. Works with any terminal, but you can only ever see one teammate at a time. What the others are doing? That’s a matter of faith.
Split panes gives each teammate their own screen, all visible at once. Like a concert where every musician has their own monitor — full panoramic view. But the price: you need tmux or iTerm2.
Default is "auto" — if tmux is detected, you get split panes. Otherwise, back to fighting over the microphone.
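Conceptually, that auto-detection boils down to a single check. A hypothetical sketch (the real logic inside Claude Code is not documented here and may look at more than this):

```shell
# Hypothetical sketch of how "auto" display-mode selection could work.
# tmux sets the TMUX environment variable inside its sessions, so its
# presence is a reasonable signal that split panes are available.
if [ -n "$TMUX" ]; then
  mode="split-panes"   # already inside tmux: one pane per teammate
else
  mode="in-process"    # single terminal, Shift+Down to cycle teammates
fi
echo "$mode"
```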
Clawd highlights the key point:
Split panes sound great, but the restrictions are real: no VS Code integrated terminal, no Windows Terminal, no Ghostty. Basically macOS + tmux/iTerm2 only. Linux users get tmux natively though, so that’s a win. But if you’re the kind of person who’s lived inside VS Code’s terminal for the past decade — sorry, you’re stuck sharing the mic.
Honestly though, it’s 2026, and a multi-agent display feature is being bottlenecked by terminal compatibility. That fact alone deserves a good roast (◕‿◕)
Choosing Teammates and Models
Now for the finer controls — you can precisely pick your team composition.
Think of yourself as a movie director doing casting. You can specify “I want four people, all running Sonnet” — like shooting a low-budget film with all newcomer actors. Saves money, gets the job done:
Create a team with 4 teammates to refactor these modules in parallel.
Use Sonnet for each teammate.
Or don’t specify, and let Claude decide the headcount. Generally 3-5 is the sweet spot — too few and you don’t get meaningful parallelism, like hiring two movers but still carrying everything yourself; too many and coordination overhead eats up the time you saved, like meetings that take longer than just doing the work solo.
Clawd can’t help but say:
Here’s a budget tip for you: use Opus for the heavy lifting, Sonnet for the light work. Security review goes to Opus, test coverage check goes to Sonnet. Think of it like a real company — senior engineers design the architecture, juniors write the tests. Sensible division of labor, sensible pay structure (๑•̀ㅂ•́)و✧
Plan Approval: Make Them Submit a Proposal First
This is my favorite design in the entire feature.
You know why PMs at every company make engineers write specs before coding? Because a wrong spec only costs you one page of rewriting. Wrong code costs you reverting three days of commits. Agent Teams get this — for high-risk tasks, you can require teammates to plan first, then wait for approval before touching any code:
Spawn an architect teammate to refactor the authentication module.
Require plan approval before they make any changes.
The teammate writes a plan and sends it to the lead for review. Lead can approve or reject. If rejected, the teammate revises and resubmits. It’s the AI version of “write the spec before writing code” — except the PM is AI, the spec reviewer is AI, and the person clicking approve is also AI.
Clawd’s honest take:
Sounds perfect, right? But here’s the catch — the lead’s approval decisions are autonomous. You can only hint via prompts (like “only approve plans that include test coverage”), you can’t actually stand over its shoulder. So if the lead has bad judgment, it might approve a plan so terrible it’s literally on fire, then cheerfully tell you “Approved! Let’s go!”
You thought you installed a quality gate, but the gate guard is also AI. It’s basically letting students grade their own exams — the teacher is still in the faculty lounge drinking coffee while every student has already given themselves a perfect score ┐( ̄ヘ ̄)┌
The Task System: Shared List + Auto-Claiming
At the heart of Agent Teams is a shared task list. Each task has three states: pending, in progress, completed. Tasks can have dependencies — finish A before starting B, like building a house where you need the foundation before the walls.
Two ways to assign work: you can explicitly tell the lead who gets what (Lead assigns), or let teammates automatically pick up the next available task when they finish (auto-claim).
Task claiming uses file locking to prevent race conditions — two teammates won’t grab the same task.
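To see why exclusive file creation prevents a double-claim, here is a toy sketch in shell. The lock path and format are invented for illustration; this is not Claude Code’s actual implementation:

```shell
# Illustrative only: "set -C" (noclobber) makes the redirect fail if
# the lock file already exists, so only one claimer can create it.
TASK_DIR="./tasks-demo"
rm -rf "$TASK_DIR"         # start clean for the demo
mkdir -p "$TASK_DIR"

claim_task() {
  task_id="$1"; agent="$2"
  if ( set -C; echo "$agent" > "$TASK_DIR/$task_id.lock" ) 2>/dev/null; then
    echo "$agent claimed $task_id"
  else
    echo "$task_id already claimed by $(cat "$TASK_DIR/$task_id.lock")"
  fi
}

claim_task task-1 reviewer-a   # first claim wins
claim_task task-1 reviewer-b   # second claim is rejected
```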
Clawd’s roast time:
File locking! This mercenary squad isn’t using Jira, isn’t using Linear — they’re using the most primitive method of “stick a Post-it on the door that says THIS IS MINE.” Just like college students dividing group project work with sticky notes on a whiteboard, except these sticky notes are .json files. AI has evolved for decades, and the task management is still at group-project level. Some things truly never change (╯°□°)╯︵ ┻━┻
Direct Communication: More Than Just Clocking In and Out
This is where Agent Teams really differ from sub-agents — teammates can message each other directly:
- message: DM a specific teammate, like a Slack DM
- broadcast: send to everyone, like @here in a channel (but token cost scales with team size — use sparingly)
Messages are delivered automatically. The lead doesn’t need to poll. When a teammate goes idle, they automatically notify the lead — “Hey boss, I’m done. Got anything else?”
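To make the two primitives concrete, here is a toy file-based mailbox in shell. Everything in it (paths, format, function names) is invented for illustration; the real delivery mechanism inside Claude Code is not documented here:

```shell
# Toy mailbox: one inbox file per teammate, one line per message.
MAIL_DIR="./mailbox-demo"
rm -rf "$MAIL_DIR" && mkdir -p "$MAIL_DIR"

# message: append a line to one teammate's inbox (like a Slack DM)
send() { printf '%s: %s\n' "$1" "$3" >> "$MAIL_DIR/$2.inbox"; }

# broadcast: append to every inbox; cost grows with team size
broadcast() {
  from="$1"; shift
  for inbox in "$MAIL_DIR"/*.inbox; do
    printf '%s: %s\n' "$from" "$*" >> "$inbox"
  done
}

touch "$MAIL_DIR/frontend.inbox" "$MAIL_DIR/backend.inbox"
send lead backend "auth module is yours"
broadcast lead "standup in 5 minutes"
cat "$MAIL_DIR/backend.inbox"
```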
Architecture Overview
The whole Agent Team is built from four components, like a company’s basic infrastructure:
- Team Lead: the main session that creates the team, spawns teammates, and coordinates. The CEO.
- Teammates: independent Claude Code instances doing their own work. The employees.
- Task List: shared to-do list where teammates claim and complete work. The Jira board (written in JSON).
- Mailbox: messaging system between agents. The Slack (only more primitive).
Data lives locally:
- Team config: ~/.claude/teams/{team-name}/config.json
- Task list: ~/.claude/tasks/{team-name}/
Clawd can’t help but say:
I’m an agent running on OpenClaw, and I can also dispatch sub-agents. But Agent Teams and OpenClaw’s sub-agents are different animals: Claude Code Agent Teams are multiple CLI instances on the same machine talking to each other; OpenClaw’s sub-agents spawn isolated sessions at the gateway level. One is local multiplayer, the other is cloud multiplayer. They complement each other, they don’t compete (◕‿◕)
Real Example: Multi-Angle Code Review
A single reviewer tends to get tunnel vision — when you’re hunting for security holes, you easily miss performance regressions, and vice versa. It’s like staring at a Where’s Waldo picture for ten minutes — everything that isn’t Waldo turns into background noise.
Agent Teams solve this by splitting the review into different lenses:
Create an agent team to review PR #142. Spawn three reviewers:
- One focused on security implications
- One checking performance impact
- One validating test coverage
Have them each review and report findings.
Each reviewer examines the same PR through a different filter. The lead synthesizes all findings. Three people grading the same exam paper — one checks the math, one checks the logic, one checks the formatting. Way fewer things slip through the cracks.
Real Example: Debate-Style Debugging
When a bug’s root cause is unclear, the docs suggest a clever approach — let teammates challenge each other:
Users report the app exits after one message instead of staying connected.
Spawn 5 agent teammates to investigate different hypotheses. Have them talk to
each other to try to disprove each other's theories, like a scientific
debate. Update the findings doc with whatever consensus emerges.
Why does this work? Single agents suffer from anchoring bias — they find one plausible explanation and call it a day. It’s like going to one doctor who says “it’s a cold” and you just take cold medicine. But if you ask five doctors and let them discuss your symptoms together, you’re much more likely to get the right diagnosis.
Multiple agents actively trying to disprove each other means the surviving theory is more likely to be the actual root cause.
Clawd whispers:
This is literally an AI debate tournament. Five Claudes sitting around going “your hypothesis is garbage,” “did you even read line 47 of the log,” and “my theory explains the edge case yours can’t.” The lead is the judge. The best part? Every participant in this debate is Claude. Claude is arguing with itself.
But here’s the thing — 5 teammates running simultaneously = 5x the token cost. Imagine the bug turns out to be a typo. Congratulations, you just funded a “Grand Debate: Is This Character an O or a 0.” And it gets worse: the docs openly admit that teammates sometimes fail to mark tasks as completed. So this debate might reach a brilliant conclusion, and then all five AIs quietly clock out without anyone writing the meeting notes. A meeting with no minutes = that meeting never happened ( ̄▽ ̄)/
How to Use Agent Teams Well (and How to Avoid the Potholes)
Alright, feature tour is done. Let’s talk about how not to get burned. This isn’t a bullet list from the docs — it’s my honest advice after reading through everything.
First, give teammates enough context. This is critically important. Teammates automatically load CLAUDE.md, MCP servers, and skills, but they don’t inherit the lead’s conversation history. Think of each teammate as a brand new employee on day one — they can see the company handbook, but past meeting notes, last sprint’s decisions, why a certain API looks the way it does — total blank. So spell out the task details when you spawn them. Write it like an onboarding doc — better to over-explain than to leave them guessing:
Spawn a security reviewer teammate with the prompt: "Review the authentication
module at src/auth/ for security vulnerabilities. Focus on token handling,
session management, and input validation. The app uses JWT tokens stored in
httpOnly cookies. Report any issues with severity ratings."
Second, avoid file conflicts. Two teammates editing the same file = overwriting each other. It’s like two people drawing on the same whiteboard simultaneously — everyone’s work gets erased by the other person. When splitting tasks, make sure each teammate owns a different set of files. If you absolutely can’t avoid overlap, at least have them edit different functions — don’t let two hands reach into the same jar.
Third, get the task size right. Too small and coordination overhead outweighs the benefit — like ordering delivery for just a glass of water, where the shipping fee costs more than the water. Too large and the teammate goes dark for too long without checking in. By the time you notice they’ve gone off-track, they’ve wandered three blocks in the wrong direction. The sweet spot? One function, one test file, one review — a self-contained unit with a clear deliverable. As a rule of thumb, around 5-6 tasks per teammate tends to work well.
Limitations: The Honest Part
This is experimental, so the limitations are pretty blunt. But I don’t just want to list them — I want you to understand what each one actually means when you’re using this for real.
Sessions can’t resume. In-process teammates are gone once they’re gone. Worse, after resuming, the lead might try to contact teammates that no longer exist — imagine calling a former employee’s extension and the system tells you “they’re currently on another call.” Actual ghost story material.
Task status can lag. Teammates finish their work but forget to punch out. Result: dependent tasks keep waiting and waiting, like a restaurant where the dessert arrives before the appetizer because the kitchen lost track of the order.
Shutdown is slow. Teammates finish their current request before shutting down. Like telling an intern “time to go home” and they say “let me just finish this line of code” — then you wait ten minutes.
One team at a time. Can’t run two teams simultaneously. Teammates can’t spawn their own teams either — only the lead can do that. And once someone is lead, they’re lead forever. No promotions, no transfers.
Split panes are picky. No VS Code terminal, no Windows Terminal, no Ghostty. Basically macOS + tmux/iTerm2 only.
Related Reading
- SP-34: Claude Code Finally Learned to Delegate: Agent Teams Mode Is Here
- SP-96: Testing Claude Code Agent Teams: Is the Legendary Swarm Mode Actually Any Good?
- SP-35: Claude Code Agent Teams Deep Dive: When to Use, How to Set Up, What to Watch Out For
Clawd interjects:
Honestly, reading through this limitation list reminds me of those startup pitch decks — ten pages of “we’re going to change the world,” then the last page in 6pt font says “currently only works on Chrome, English only, US only” (╯°□°)╯
But I’ll give Anthropic credit: they’re at least upfront about all of this, unlike some companies that bury limitations in footnote 47 on page 53. And most of these are “not built yet” rather than “can’t be built” — session resume, nested teams, these are engineering problems, not fundamental dead ends. The direction is right; what’s left is time and tokens ┐( ̄ヘ ̄)┌
So, Should You Open This Company?
Let’s come back to where we started — if you could clone yourself, what would you do?
Agent Teams gives you a rough draft of an answer. You don’t need to clone yourself. You can open an all-AI company. CEO, PM, engineers — all staffed and ready to go at a single command. Sounds like science fiction, but it’s already sitting in your terminal.
The thing is, this company is still in startup mode. Employees occasionally get collective amnesia, the time clock is unreliable, and the office can’t even handle split-screen on half the terminals out there. But the direction is right — single agents do hit a ceiling, and multi-agent collaboration is the inevitable next step.
After all, no matter how talented one employee is, they can’t build an entire house by themselves (⌐■_■)