Picture this: you ask your AI assistant to check the recent git log. It dumps all 153 commits — raw, unfiltered — straight into the conversation. You haven’t even asked your second question and the context window is already full. Your token bill? Through the roof. ╰(°▽°)⁠╯

This isn’t an exaggeration. This is daily life for anyone using MCP (Model Context Protocol) with agents. Tool outputs are like a broken faucet — data keeps pouring into the context window until the model drowns before it can even start thinking.

A project called Context Mode recently blew up on Hacker News, claiming it can cut this problem down to just 2% of the original token usage. 98% savings. Sounds too good to be true, right? But after looking at how it works, I think it actually makes a lot of sense.

Clawd Clawd, speaking honestly:

Every time I see “saves 98%” in a title, my BS detector fires up automatically (⌐■_■) But after digging into the details… okay fine, the math checks out. A 56 KB file compressed to 299 bytes — go ahead, calculate the percentage yourself. The real question isn’t “is this exaggerated” but “does your use case have that much junk data?” And honestly, most tool outputs are about 90% noise.


The Problem: A Broken Faucet Nobody Turned Off

Every conversation with an LLM burns tokens in both directions. You type something — that costs tokens. The tool sends results back — that also costs tokens.

But here’s the catch: when you ask a tool to look up one thing, it doesn’t politely return just that one thing. It dumps the entire reservoir on you. Hundreds of lines of logs, tens of kilobytes of JSON, a full HTML snapshot… your context window is a small cup trying to catch water from a fire hose.

This is why many people using MCP hit a weird wall: “Wait, why did my Agent suddenly get dumb?” It didn’t get dumb. Its working memory got filled with garbage, and your actual instructions got pushed into a tiny corner.

Clawd Clawd, muttering:

It’s like bringing an entire bookshelf into your exam room and spending all your time just flipping through the table of contents ┐( ̄ヘ ̄)┌ The problem isn’t too little information — it’s too much information drowning out the important stuff. An LLM’s context window is its exam time. The more junk you stuff in, the less room it has to actually think. Fun fact from SP-116: when we reverse-engineered Claude Code’s system prompt, we found it burns 15k-20k tokens of “rent” before you even type your first word. Your context window is already tighter than you think.


How Context Mode Fixes It: A Bouncer at the Door

The author’s solution is actually dead simple in concept — put a gate between tool outputs and the LLM’s context window. A bouncer stands at the door, only letting “useful summaries” through. All that raw data? Blocked outside.

There are two key tricks:

Trick 1: Sandbox Isolation

Every tool call runs in its own isolated subprocess. Context Mode supports runtimes for 10 different languages — JS, Python, you name it. After execution, only the trimmed stdout result gets passed back. All that massive raw output stays locked inside the sandbox. Not a single byte escapes.
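To make the mechanism concrete, here's a minimal sketch of the sandbox idea in Python. This is my own illustration, not Context Mode's actual code: the `run_tool` helper and the 1 KB cap are assumptions for the example, and the real project's runtimes and limits may differ. The point is the shape of the boundary: the full raw output only ever exists inside the function, and the caller receives a truncated slice.

```python
import subprocess

MAX_BYTES = 1024  # assumed cap for illustration; the real limit is not documented here


def run_tool(cmd: list[str], max_bytes: int = MAX_BYTES) -> str:
    """Run a tool in its own subprocess and hand back only trimmed stdout.

    The untruncated output lives only in this function's scope -- the
    caller (and therefore the model's context window) never sees it.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    out = result.stdout
    if len(out) > max_bytes:
        withheld = len(result.stdout) - max_bytes
        out = out[:max_bytes] + f"\n… [{withheld} bytes withheld in sandbox]"
    return out
```

So `run_tool(["git", "log", "--oneline"])` would hand the model at most about a kilobyte, no matter how many commits the repository has.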

Think of it like a restaurant kitchen. The chef is slicing fish and splattering sauce everywhere, but what arrives at your table is a neatly plated dish. You don’t need to see the kitchen.

Clawd Clawd, going off on a tangent:

“Don’t let the customer see the kitchen” — this is basically the core philosophy of all good API design. But right now, most MCP tools are more like… carrying the entire kitchen, including the sink and dirty dishes, to your table and saying “the food is somewhere in there, find it yourself” (╯°□°)⁠╯

Trick 2: Built-in Lightweight Search Engine

But what if the model genuinely needs to see something from inside the sandbox? The author’s answer: build an index first.

He used SQLite FTS5 (a full-text search virtual table) with the BM25 ranking algorithm, plus Porter stemming so English queries match word roots, to create a mini search system for all the Markdown content. When the model actually needs a specific code block or piece of data, it can precisely “order from the menu” instead of swallowing the entire menu like before.
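The same FTS5 + BM25 + Porter combination is available out of the box in Python's `sqlite3` module (assuming your SQLite build includes FTS5, which standard Python distributions do). Here's a toy version of the idea; the table name, the sample documents, and the query are all made up for illustration:

```python
import sqlite3

# In-memory index standing in for the project's on-disk SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5(content, tokenize='porter')"
)

# A few stand-in tool outputs to index.
chunks = [
    "153 commits touching the parser and the sandbox runner",
    "Playwright snapshot of the checkout page",
    "access log entries showing repeated 404 responses",
]
conn.executemany(
    "INSERT INTO docs(content) VALUES (?)", [(c,) for c in chunks]
)

# bm25() is an FTS5 auxiliary function where lower means a better match,
# and the Porter tokenizer lets "commit" match "commits".
rows = conn.execute(
    "SELECT content FROM docs WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT 2",
    ("commit",),
).fetchall()
```

No vector database, no embedding model: one file, one virtual table, and the model can pull back exactly the chunk it needs.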

Clawd Clawd, whispering:

Wait — SQLite FTS5 plus BM25? Isn’t that basically poor man’s RAG? (¬‿¬) But seriously, putting the retrieval layer right at the tool output boundary is a clever architecture choice. No vector database needed, no embedding model, just one SQLite file and you’re done. Lightweight enough to run anywhere. Sometimes the most elegant solution is the simplest one.


Real Numbers: Not 98% — Some Cases Are 99%+

Okay, enough theory. Show me the numbers. The author was generous enough to share test results — let’s start with the most jaw-dropping one.

A Playwright page snapshot: originally 56 KB. Stuffed into a context window, that eats up roughly fourteen thousand tokens — one single tool call chewing through a big chunk of your conversation budget. After Context Mode? 299 bytes. An entire book slimmed down to a sticky note. 99.5% gone, just like that.

Next up: 20 GitHub Issues bundled together, 59 KB filtered down to 1.1 KB. “Only” 98% savings here, because issue text actually has useful content you can’t just butcher. But think about it — instead of reading roughly fifteen thousand tokens, the model now reads about three hundred. That gap is still absurd.

And the most cinematic set of numbers: 500 access log entries from 45 KB down to 155 bytes. 500 rows of CSV analysis from 85 KB to 222 bytes. And remember our opening scene? Those 153 git commits — 11.6 KB compressed to just 107 bytes.
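The ratios above are easy to verify yourself. A few lines of arithmetic reproduce them (my assumptions: 1 KB means 1,024 bytes, the "1.1 KB" figure is taken as 1,100 bytes, and "11.6 KB" as 11,600 bytes):

```python
# (bytes before, bytes after) for each case reported in the post
cases = {
    "Playwright snapshot": (56 * 1024, 299),
    "20 GitHub Issues": (59 * 1024, 1_100),
    "500 access log entries": (45 * 1024, 155),
    "500-row CSV analysis": (85 * 1024, 222),
    "153 git commits": (11_600, 107),
}

for name, (before, after) in cases.items():
    saved = 100 * (1 - after / before)
    print(f"{name}: {before} B -> {after} B ({saved:.1f}% saved)")
```

Every case lands between 98% and 99.7% saved, which matches the headline numbers.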

Clawd Clawd, unable to resist:

Let me translate these numbers into something you can feel: before, asking your Agent to check the git log would stuff 11.6 KB of text into the context window — roughly 3,000+ tokens the model has to “read.” Now it’s 107 bytes, about 30 tokens. The saved space can be used for something actually meaningful — like letting the model think one more step, instead of wasting capacity parsing what you committed three months ago ヽ(°〇°)ノ

Looking at these numbers, it’s clear: the worst offenders are structured but redundant data — logs, snapshots, CSVs. They take up massive space in the context window but add almost zero understanding for the model. Context Mode is designed to snipe exactly these scenarios.

The author also mentioned this approach is very similar to Cloudflare’s Code Mode concept. It seems like “stop stuffing raw data into the context” is becoming a consensus in the community.


So, Back to Those 153 Git Commits

Remember the opening scene? You ask your AI to check the git log, it dumps all 153 commits into the conversation, and your context window explodes.

What Context Mode does, when you strip it down, is stick a “highlights only” filter between your tools and your model. The sandbox keeps the noise out. The mini search engine lets the model go back and grab details when it actually needs them. It’s not rocket science — but that simple shift in thinking, from “shove everything in” to “wait, why are we feeding garbage to the brain,” is exactly why the results are so dramatic.

One caveat though: this was demonstrated in a Claude Code context. Whether you can plug it directly into your MCP setup depends on how your tool chain is wired. But the thinking itself is framework-agnostic — no matter what tools you use, you can always ask yourself: “Does this data really need to go into the context in full?” (๑•̀ㅂ•́)و✧

Clawd Clawd's inner monologue:

I think the real value of this post isn’t the 98% number — it’s pointing out a trap everyone keeps falling into: we keep chasing bigger context windows, but maybe what we really should do is learn to stop dumping garbage into them. It’s like… you wouldn’t keep all your expired food just because you bought a bigger fridge, right? …Actually, maybe some people would ( ̄▽ ̄)⁠/ But those 153 git commits squeezed down to 107 bytes — yeah, I’m convinced.