If you’re building AI agents, you will hit the same wall eventually:

AI is great at thinking. It’s terrible at doing.

It can plan tasks, analyze data, write code. But the moment you need it to “open a browser, log in to a website, click that button, fill in that form” — it starts making excuses.

Browser Use exists to knock down that wall.

What Is Browser Use?

Browser Use is a Python framework that lets AI agents directly control a browser. You use the LLM as the brain, Browser Use as the hands, and tell the agent “book me a flight” or “pull those numbers from the dashboard” — and it actually does it.

This is not a simulation. Not fake screenshots. A real browser running, real buttons getting clicked, real text being typed.

CLI 2.0 is its command-line interface, built around three things:

  1. 2x faster
  2. Half the cost
  3. Connects to your already-running Chrome (via CDP)

Clawd Clawd, going off on a tangent:

“2x faster, half the cost” is standard AI tool marketing copy at this point — about as common as “breakthrough results.” But this time there’s an actual technical reason behind the claim, not just vibes. The key is CDP. More on that in a second. (⌐■_■)

Why CDP Instead of Playwright?

To understand why CLI 2.0 is faster, you need to understand what’s happening underneath.

Playwright and Puppeteer are the most popular browser automation tools right now. Both excellent, both with very human-friendly APIs. But fundamentally, they are abstractions built on top of CDP (Chrome DevTools Protocol).

CDP is Chrome’s official low-level protocol — originally built for the developer tools you open when you press F12. It lets you talk directly to the browser over a WebSocket: “take a screenshot,” “run this JavaScript,” “click the element at coordinate (320, 480).”

Playwright’s path: your code → Playwright API → CDP → Chrome.

Browser Use CLI 2.0’s path: your agent → CDP → Chrome.

One less layer of translation, so it’s faster. One less layer of process and memory overhead, so it’s cheaper.
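To make “speaking the browser’s native language” concrete, here is roughly what CDP traffic looks like. `Page.captureScreenshot` and `Input.dispatchMouseEvent` are real CDP methods; the `cdp_message` helper and the message IDs are illustrative, and a real session would send these strings over the WebSocket that Chrome exposes on its debug port.

```python
import json

def cdp_message(msg_id, method, params=None):
    """Build one CDP command as the JSON string sent over the WebSocket."""
    return json.dumps({"id": msg_id, "method": method, "params": params or {}})

# Clicking at (320, 480) is two low-level mouse events: pressed, then released.
press = cdp_message(1, "Input.dispatchMouseEvent",
                    {"type": "mousePressed", "x": 320, "y": 480,
                     "button": "left", "clickCount": 1})
release = cdp_message(2, "Input.dispatchMouseEvent",
                      {"type": "mouseReleased", "x": 320, "y": 480,
                       "button": "left", "clickCount": 1})
screenshot = cdp_message(3, "Page.captureScreenshot")
print(press)
```

No Playwright object model in between: the client builds the message, Chrome executes it.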

Clawd Clawd, key point:

Think of it like this: Playwright is like hiring a translator to convert your requests into browser language. CDP is you speaking the browser’s native language yourself. Translators are great. But sometimes cutting out the translator is just faster. The trade-off is that you lose the nice abstractions Playwright gives you — the complexity it was hiding is now your problem. ┐( ̄ヘ ̄)┌

Attaching to a Running Chrome — This Is the Big Deal

The feature that caught my attention most in CLI 2.0: attach to a running Chrome instance.

The old approach: agent launches a brand-new headless browser in a clean environment. Problem is, your cookies, your login sessions, your work profiles — none of that is there. Every time the agent needs to open Gmail, it has to re-authenticate. Some sites require 2FA. It’s a mess.

The new approach: open your regular Chrome, and let the agent connect directly. It sees your already-logged-in Notion, your already-open Google Sheet, the SaaS dashboard you already have running.

This sounds like a small change. For real-world usage, it’s huge.

# Start Chrome with a CDP debug port
google-chrome --remote-debugging-port=9222

# Connect Browser Use CLI to it
browser-use --attach-to 9222 "Clean up the data in this spreadsheet and add a summary row"
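As a minimal sketch of what “attach” means at the protocol level: Chrome’s debug port serves a JSON list of open targets at `http://localhost:9222/json`, and a client picks a tab’s `webSocketDebuggerUrl` to talk CDP to it. The sample payload below is made up so the sketch runs without a live browser; the endpoint and field names are standard CDP.

```python
import json

# Chrome serves its open tabs as JSON at http://localhost:9222/json.
# SAMPLE mimics that response so this sketch runs offline.
SAMPLE = json.dumps([
    {"type": "service_worker", "title": "sw.js",
     "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/DEF456"},
    {"type": "page", "title": "Quarterly numbers - Google Sheets",
     "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/ABC123"},
])

def pick_page_target(raw):
    """Return the WebSocket URL of the first real tab (skip workers/extensions)."""
    pages = [t for t in json.loads(raw) if t.get("type") == "page"]
    if not pages:
        raise RuntimeError("no page targets exposed on the debug port")
    return pages[0]["webSocketDebuggerUrl"]

print(pick_page_target(SAMPLE))  # ws://localhost:9222/devtools/page/ABC123
```

This is also why the agent sees your logged-in tabs: it’s attaching to the same browser process that holds your sessions, not spawning a clean one.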

Clawd Clawd, a friendly reminder:

This feature raises an interesting question: when an AI agent has full access to your browser with all your logged-in sessions, where is the trust boundary? You’re handing the agent a master key to every account you’re signed into. That’s not a reason to avoid it — it’s a reason to think clearly about scope. Before running an agent in any real environment, decide exactly what it’s allowed to touch and what it isn’t. Otherwise a single prompt injection could make the agent do a lot of things you didn’t ask for. (╯°□°)⁠╯

The “Hands and Feet” Problem in AI Agents

What Browser Use is solving is a fundamental gap in the AI agent ecosystem.

The current shape of most AI agent stacks:

  • Brain: LLM (GPT-4o, Claude, Gemini, etc.)
  • Memory: vector databases, context window
  • Tools: API calls, code execution
  • Perception: MCP, function calling

But one piece has been quietly underserved: real-world UI interaction.

Most of the world’s data and functionality is not exposed through an API. It’s locked behind login-gated websites, enterprise software with no API, legacy systems where the only interface is a series of dropdowns and buttons.

If an agent can’t operate a browser, its capability boundary stops at “the API-friendly world.” Browser Use pushes that boundary outward — significantly.

Clawd Clawd, a friendly reminder:

In 2026, there are a hundred tools fighting to be the AI agent’s brain. Surprisingly few are fighting to be its hands. Browser Use, Playwright MCP, Stagehand: all competing for this space. The winner probably won’t be the one with the most technically impressive architecture. It’ll be the one with the best reliability. Because browser automation failure modes are brutal. The moment your agent encounters a CAPTCHA and freezes, the whole pipeline dies. Dead agents don’t ship features. (◕‿◕)

What It Looks Like in Practice

The basic CLI usage is straightforward:

# Install
pip install browser-use

# Basic usage
browser-use "Go to HN, grab the top 10 titles and links"

# Connect to existing Chrome session
browser-use --attach-to 9222 "In the Google Sheet that's open, add a new row with today's date and the number 42"

Internally, the loop looks like: agent observes the current page (screenshot + DOM), decides on the next action, executes it, observes again — until the task is done or it gives up.
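That loop can be sketched in a few lines. Everything here is illustrative, not Browser Use internals: `observe`, `decide`, and `act` stand in for the screenshot/DOM capture, the LLM call, and the CDP action respectively.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    dom_summary: str   # compact description of the current page
    done: bool         # did the last action complete the task?

def run_agent(task, observe, decide, act, max_steps=20):
    """Observe -> decide -> act until the task is done or the step budget runs out."""
    for _ in range(max_steps):
        obs = observe()
        if obs.done:
            return True
        action = decide(task, obs.dom_summary)  # the LLM call in a real agent
        act(action)                             # a CDP command in a real agent
    return False  # gave up

# Toy stand-ins: the "task" finishes after three clicks.
state = {"clicks": 0}
def observe():
    return Observation(f"page after {state['clicks']} clicks", state["clicks"] >= 3)
def decide(task, dom_summary):
    return "click_next"
def act(action):
    state["clicks"] += 1

print(run_agent("demo task", observe, decide, act))  # True
```

The `max_steps` budget is the “or it gives up” part: without it, a stuck agent would loop forever on a page it can’t get past.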

The second source of the 2x speed: more aggressive DOM parsing. Instead of sending a full-page screenshot to a vision model and asking “what should I click?”, CLI 2.0 first tries to describe the page using a compact DOM structure. Screenshots are a fallback, not the default. This collapses token usage significantly.
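Here is a toy version of that idea: walk the page, keep only interactive elements, and number them so the model can answer “click [1]” instead of reasoning over pixels. This uses Python’s stdlib `HTMLParser` and is not the actual Browser Use serializer.

```python
from html.parser import HTMLParser

INTERACTIVE = {"a", "button", "input", "select", "textarea"}

class CompactDOM(HTMLParser):
    """Keep only interactive elements, numbered so the model can say 'click [1]'."""
    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE:
            a = dict(attrs)
            label = a.get("aria-label") or a.get("value") or a.get("href") or ""
            self.elements.append(f"[{len(self.elements)}] <{tag}> {label}".rstrip())

PAGE = ('<div><h1>Report</h1><a href="/export">Export CSV</a>'
        '<button aria-label="Refresh">R</button><p>lots of prose...</p></div>')
parser = CompactDOM()
parser.feed(PAGE)
print("\n".join(parser.elements))
# [0] <a> /export
# [1] <button> Refresh
```

Headings, prose, and layout divs never reach the model; a few dozen tokens describe a page that would cost thousands as an image.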

Clawd Clawd, rant time:

“Send a screenshot to the LLM and ask where to click” was the standard early browser agent pattern. Also the most expensive one. A 1080p screenshot can burn through thousands of tokens. If the agent needs 20 clicks to complete one task, the math gets scary fast. DOM parsing misses some CSS-only-rendered elements, but for most tasks it’s good enough — and a lot cheaper. That’s where “half the cost” comes from. ٩(◕‿◕。)۶
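A back-of-envelope version of that math, with made-up but plausible per-step token counts:

```python
# Made-up but plausible numbers; real token counts vary by model and image detail.
screenshot_tokens = 1500   # one full-page image per step, vision-model pricing
dom_tokens = 300           # one compact DOM summary per step
steps = 20                 # actions needed to finish the task

print(screenshot_tokens * steps, dom_tokens * steps)  # 30000 6000
```

Even with rough numbers, the gap per task is several-fold, and it compounds across every retry and every agent run.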

Real Use Cases Worth Trying

A few places where this is immediately practical:

Automated report extraction: that dashboard you have to manually log into, click through three menus, and download a CSV every week — fully automatable now.

Cross-platform data sync: pull data from System A, push it to System B, without waiting for System A to build an API.

Lightweight E2E testing: not a replacement for proper Playwright tests, but for quickly validating “can a user complete this flow?”, letting an agent run through it is very low friction.

Actual AI assistant browsing: your AI helper can genuinely look things up, fill out forms, and complete tasks — instead of saying “you can find that at this URL.”

Clawd Clawd, just between us:

One last thing: Browser Use is open source, with over 60k GitHub stars. That number alone tells you how real the demand is for “AI that can use a browser.” This isn’t a demo project. People are running this in production. If you’re building anything in the AI agent space, it’s worth a serious look. (๑•̀ㅂ•́)و✧