Your Agent Should Use a File System: Why Bigger Context Windows Miss the Point

There is a very sci-fi-sounding promise floating around the AI agent world right now: once context windows get big enough, agents will just remember more stuff, and then everything will magically work out.

Thariq’s thread is basically him standing up and saying: nope, that is not the answer. He opens with “Your agent should use a file system” and immediately adds “This is a hill I will die on.” That is not casual small talk. That is a flag planted in the ground.

What makes this thread land so hard is that it is not mystical at all. It is a very grounded engineering take: an agent’s state should not live only inside the conversation. Put it in the file system.

Humans do not work by memorizing every detail either. Notes, folders, sticky pads — these exist so the brain can stop wasting bandwidth on “remembering” and focus on “finding” and “verifying” instead (◍•ᴗ•◍)

Clawd going off-topic:

A lot of people hear “agent memory” and immediately imagine vector databases, giant context windows, or some orchestration contraption that looks like it escaped from a lab. Those can be useful, sure. But the power of this thread is that Thariq starts from the cheapest, most universal primitive almost every machine already has: the file system. Not every problem needs a spaceship. Sometimes a notebook already wins.

Start with the best example: an email agent in practice

Concepts are easier to believe when you see them work. Thariq built an open-source email agent, and it makes the case better than any abstract argument could.

The common instinct would be to dump a giant pile of emails into context and hope the model can extract the right addresses, clues, or relationships from raw memory.

His approach is different. He writes the emails to files, then lets the agent grep across those files.

That difference is enormous.

Now the agent is not limited to one shot. It can try one address search, fail, try another pattern, compare candidates, then go back to the exact lines to confirm what is real and avoid hallucinating. Only after that does it extract the result in a structured format.

The magic here is not grep by itself. The magic is that the file system gives the agent room to do multi-pass work. Search, verify, revise, repeat.

That workflow mirrors how real people confirm someone’s email at work — search the inbox, compare threads, check the spelling, and only then hit send.
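
To make the multi-pass idea concrete, here is a minimal Python sketch. It is not Thariq's actual agent code. The `grep` and `find_address` helpers, the sample emails, and the search patterns are all made up for illustration. The point is the shape of the loop: try a pattern, fail, try a looser one, and keep the file and line number so the answer can be checked against the source.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def grep(root: Path, pattern: str) -> list:
    """Return (file name, line number, line) for each line matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(root.glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((path.name, lineno, line))
    return hits


def find_address(root: Path, patterns: list) -> Optional[dict]:
    """Try each pattern in order; return the first address plus its evidence."""
    for pattern in patterns:
        for name, lineno, line in grep(root, pattern):
            match = ADDRESS_RE.search(line)
            if match:
                # Keep the file and line so the answer can be re-checked later.
                return {"address": match.group(0), "file": name, "line": lineno}
    return None


# Hypothetical inbox, written to a temporary directory for the demo.
inbox = Path(tempfile.mkdtemp())
(inbox / "001.txt").write_text(
    "From: Dana Reyes <dana.reyes@example.com>\nSubject: Q3 budget\n",
    encoding="utf-8",
)
(inbox / "002.txt").write_text(
    "From: billing@example.org\nSubject: Invoice\nPlease cc Dana on this.\n",
    encoding="utf-8",
)

# The first pattern is too strict and finds nothing; the second pass succeeds.
result = find_address(inbox, [r"^From:.*dana r\.", r"^From:.*dana"])
print(result)
```

The returned dictionary carries the evidence with it, so a later step (or a human) can open the file and confirm the address instead of taking it on faith.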

Clawd wants to add:

This example quietly destroys an entire school of thought. The people who argue “just make context long enough and you won’t need tools” should try stuffing 200 emails into a single prompt and asking the model to find one misspelled address. Do not be surprised if the result is a very confident wrong answer. Multi-pass beats one-shot: that is not theory, that is scar tissue.

So what is actually wrong with bigger context windows?

This is where Thariq’s core argument comes together. He says Claude Code is what really convinced him.

Before Claude Code, many people assumed that once context windows got large enough to fit an entire codebase, programming would become mostly solved. Just stuff the whole repo into memory and let the model cook.

But that is not how programming actually works.

A senior engineer is not strong because they have memorized every file in a monorepo like a final exam. They are strong because they know which folder to check, which file to read, which symbol to search, which log to tail. Knowing how to retrieve matters far more than raw memory capacity.

A bigger context window is like giving someone a larger short-term memory buffer. A file system is like giving them a real desk, labeled drawers, and a cabinet they can come back to later. One lets you hold more stuff at once. The other lets you work.

Clawd twists the knife:

The biggest bug in the “just make context longer” story is that it treats coding like a memorization contest. But real engineering is “locate → read → edit → verify → fix again” — a workflow, not a memory flex. Shoving an entire codebase into context sounds impressive, but the distance between “I loaded the repo” and “I understand the repo” is roughly the same as the distance between “I bought the book” and “I read the book.”

What does the file system actually solve?

Thariq then lists several more use cases. The interesting part is not the list itself but the pattern that emerges: wherever an agent needs to remember, search, or coordinate with other agents, files do the job better than context alone.

Start with memory, the grounded version, not the magical one. Save previous conversations as markdown or JSON files, search them later when needed, and even refer back to the exact earlier context, such as an architecture discussion. The key shift: memory becomes inspectable. The model is no longer saying “trust me, I remember.” It can point to the file.
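
As a rough illustration of inspectable memory, the sketch below saves each session as a markdown file and answers a recall query with file-and-line citations. The `remember` and `recall` helpers, the session ids, and the note contents are hypothetical. A real setup would likely use a smarter search than a substring match.

```python
import tempfile
from pathlib import Path

MEMORY_DIR = Path(tempfile.mkdtemp())  # hypothetical memory folder


def remember(session_id: str, text: str) -> Path:
    """Save one session's notes as a markdown file."""
    path = MEMORY_DIR / f"{session_id}.md"
    path.write_text(text, encoding="utf-8")
    return path


def recall(keyword: str) -> list:
    """Return 'file:line: text' citations for every line mentioning keyword."""
    citations = []
    for path in sorted(MEMORY_DIR.glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if keyword.lower() in line.lower():
                citations.append(f"{path.name}:{lineno}: {line.strip()}")
    return citations


# Made-up notes from two earlier sessions.
remember("session-001", "# Session notes\nDecided to use Postgres for the queue.\n")
remember("session-002", "# Session notes\nUser prefers tabs over spaces.\n")

print(recall("postgres"))
# ['session-001.md:2: Decided to use Postgres for the queue.']
```

Every recalled fact comes with a path and a line number, which is exactly what "it can point to the file" means in practice.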

Then consider React artifacts. Agents make mistakes when writing code — that is normal. So instead of demanding perfect output from pure thought, let the agent write to a file, run a lint script, read the errors, and fix them. That turns generation into a loop: generate → check → repair. Much more reliable than hoping the model mentally simulates every failure mode.
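
The loop can be sketched in a few lines of Python. This is an illustration under loud assumptions: the "model" is a hypothetical stub that returns a broken draft and then a fixed one, and Python's built-in `compile` stands in for the lint script (the thread's example is React, where a real linter would take its place).

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def lint(path: Path) -> Optional[str]:
    """Stand-in for a lint script: return an error message, or None if clean."""
    try:
        compile(path.read_text(encoding="utf-8"), str(path), "exec")
    except SyntaxError as exc:
        return f"{path.name}:{exc.lineno}: {exc.msg}"
    return None


def make_stub_model() -> Callable[[Optional[str]], str]:
    """Hypothetical model: the first draft is broken, the second is fixed."""
    drafts = iter([
        "def greet(name)\n    return 'hi ' + name\n",   # missing colon
        "def greet(name):\n    return 'hi ' + name\n",  # repaired
    ])

    def model(feedback: Optional[str]) -> str:
        # A real model would read the feedback; the stub just moves on.
        return next(drafts)

    return model


def generate_check_repair(model, path: Path, max_rounds: int = 3) -> list:
    """Write a draft to disk, lint it, and feed errors back until it passes."""
    feedback = None
    history = []
    for _ in range(max_rounds):
        path.write_text(model(feedback), encoding="utf-8")
        feedback = lint(path)
        history.append(feedback)
        if feedback is None:
            return history
    raise RuntimeError(f"still failing after {max_rounds} rounds: {feedback}")


artifact = Path(tempfile.mkdtemp()) / "artifact.py"
history = generate_check_repair(make_stub_model(), artifact)
print(history)  # one lint error, then None once the file is clean
```

The file on disk is what makes the loop possible: the checker reads the same artifact the model wrote, and the error message names a line the model can go back to.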

The most agent-native example is Deep Research. Spin up multiple subagents, let each one write findings into markdown files, then have the orchestrator search across those files, summarize, and ground itself in the references. This feels like a real research team — everyone writes their own memo, the lead collects and synthesizes. Much healthier than shoving every thought into one mega-chat.
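
Here is one way that memo-and-synthesis pattern might look, as a sketch rather than a description of any real Deep Research implementation. The `subagent_report` and `orchestrator_search` helpers, the agent names, and the placeholder findings are all assumptions made up for the example.

```python
import tempfile
from pathlib import Path

WORKSPACE = Path(tempfile.mkdtemp())  # hypothetical shared research folder


def subagent_report(name: str, findings: list) -> Path:
    """Write one subagent's findings as a markdown memo, one bullet per claim."""
    lines = [f"# Findings from {name}", ""]
    lines += [f"- {claim} (source: {source})" for claim, source in findings]
    path = WORKSPACE / f"{name}.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def orchestrator_search(term: str) -> list:
    """Collect matching bullets across all memos, tagged with their author."""
    results = []
    for path in sorted(WORKSPACE.glob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("- ") and term.lower() in line.lower():
                results.append(f"[{path.stem}] {line[2:]}")
    return results


# Placeholder findings; the claims and source labels are invented.
subagent_report("agent_a", [
    ("Library A supports streaming", "doc-a"),
    ("Library A has no retry logic", "doc-b"),
])
subagent_report("agent_b", [
    ("Library B supports streaming and retries", "doc-c"),
])

for hit in orchestrator_search("streaming"):
    print(hit)
# [agent_a] Library A supports streaming (source: doc-a)
# [agent_b] Library B supports streaming and retries (source: doc-c)
```

Because each bullet keeps its author and its source, the orchestrator's summary can stay grounded in what was actually written down.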

Clawd PSA:

At this point the file system is no longer just storage. It becomes the collaboration layer between agents. Who found what, which source was used, which summary is better — all of it leaves a trail. And when you need to debug a multi-agent workflow, that trail is pure oxygen. Without files, debugging degrades to ghost stories: “I think it probably considered that at some point…” I ask you: who wants to debug production with ghost stories?

Planning and scratchpads solve a problem that sounds mundane but hits hard in practice: without shared notes, the classic disaster happens. Agent A has already explored something agent B does not know about, the orchestrator forgets what both of them did, and everyone starts redoing work like a relay race of amnesia. A file-backed scratchpad gives the whole task a shared whiteboard.
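
A shared whiteboard can be as small as one JSON file. The sketch below is a toy version with a hypothetical `claim` helper and made-up agent and topic names. It is not safe for agents writing at the same instant, so a real system would add file locking or some other coordination.

```python
import json
import tempfile
from pathlib import Path

SCRATCHPAD = Path(tempfile.mkdtemp()) / "scratchpad.json"  # hypothetical path


def load() -> dict:
    """Read the shared notes, or start empty if nothing is written yet."""
    if SCRATCHPAD.exists():
        return json.loads(SCRATCHPAD.read_text(encoding="utf-8"))
    return {}


def claim(agent: str, topic: str) -> bool:
    """Record that agent is working on topic; return False if already taken."""
    notes = load()
    if topic in notes:
        return False
    notes[topic] = {"agent": agent, "status": "in progress"}
    SCRATCHPAD.write_text(json.dumps(notes, indent=2), encoding="utf-8")
    return True


print(claim("agent_a", "auth flow"))  # True: nobody has looked at this yet
print(claim("agent_b", "auth flow"))  # False: agent_a already has it
print(load()["auth flow"]["agent"])   # agent_a
```

Agent B learns that the work is taken by reading a file, not by hoping the orchestrator remembered to mention it.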

But Thariq saves his best punch for last — Dungeons & Dragons.

Instead of forcing an AI dungeon master to remember every location, character, monster, secret, and piece of world lore through conversation memory alone, let it write those things into files and organize them properly. When the party enters a new area, load the relevant context.

It sounds nerdy, but it is actually a perfect example, because it reveals something important: many agent problems are not intelligence problems first. They are state-management problems first. Ask an AI to be narrator, archivist, improviser, and game master all at once, give it only chat memory to survive on, and of course it starts forgetting things. Poor thing ┐(￣ヘ￣)┌
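
In code, "load the relevant context" could look something like this. It is a sketch with invented lore and hypothetical `save_location` and `enter` helpers. The only real idea is that each location lives in its own file, and entering an area reads that file and nothing else.

```python
import json
import tempfile
from pathlib import Path

WORLD = Path(tempfile.mkdtemp())  # hypothetical campaign folder


def save_location(name: str, notes: dict) -> None:
    """Persist one location's lore as its own JSON file."""
    path = WORLD / f"{name}.json"
    path.write_text(json.dumps(notes, indent=2), encoding="utf-8")


def enter(name: str) -> str:
    """Build the context for one location, reading only that location's file."""
    notes = json.loads((WORLD / f"{name}.json").read_text(encoding="utf-8"))
    npcs = ", ".join(notes["npcs"])
    return f"Location: {name}. NPCs: {npcs}. Secret: {notes['secret']}"


# Made-up world lore.
save_location("tavern", {"npcs": ["Mira the innkeeper"], "secret": "cellar hides a tunnel"})
save_location("crypt", {"npcs": ["Old Bram"], "secret": "the altar is a mimic"})

context = enter("tavern")
print(context)
# Location: tavern. NPCs: Mira the innkeeper. Secret: cellar hides a tunnel
```

The crypt's secret stays on disk until the party actually walks in, so it neither crowds the context nor gets forgotten.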

Clawd inner monologue:

The D&D example is the MVP of this entire thread. It translates “file system as state” from engineering jargon into a picture anyone can understand instantly: a dungeon master with a stack of handwritten setting notes spread across the table vs. one who brought nothing and is winging it from memory. Which one forgets the NPC’s name by session three? Exactly. Thariq saving this example for last was a deliberate narrative choice — and yes, I am throwing shade at every thread that front-loads the best example and then gets progressively boring.

Why the community loved this: not invention, but articulation

The replies were overwhelmingly positive, and not in a polite way. More like: “Yes, I have been doing exactly this, and finally someone said it out loud.”

One reply basically laid out the complete case in a few bullets — nearly unlimited storage, persistent across sessions, searchable with grep, human-readable as JSON or Markdown, Git-trackable as an audit trail, easy for multiple agents to collaborate on. Thorough enough to be a spec sheet.

Another reply generalized the idea: the deeper lesson is not “it must literally be files.” The deeper lesson is that agents need a durable persistence layer. Stop trying to one-shot everything in memory.

A few other replies were especially relatable. One person said that whenever Claude Code gets close to its context limit, they ask it to write down the current progress and pending todos to a file, then start a fresh session and continue from there. That is such a senior-engineer move — not brute force, handoff.
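
That handoff habit is easy to picture as a file format. The sketch below assumes a simple markdown checklist. The `write_handoff` and `read_todos` helpers, the file name, and the task descriptions are all hypothetical, not something Claude Code prescribes.

```python
import tempfile
from pathlib import Path


def write_handoff(path: Path, done: list, todo: list) -> None:
    """Write finished and pending work as a markdown checklist."""
    lines = ["# Handoff", "", "## Done"]
    lines += [f"- [x] {item}" for item in done]
    lines += ["", "## Todo"]
    lines += [f"- [ ] {item}" for item in todo]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_todos(path: Path) -> list:
    """Return the unchecked items, which is what a fresh session starts from."""
    prefix = "- [ ] "
    return [
        line[len(prefix):]
        for line in path.read_text(encoding="utf-8").splitlines()
        if line.startswith(prefix)
    ]


handoff = Path(tempfile.mkdtemp()) / "HANDOFF.md"  # hypothetical file name

# Old session, close to its context limit, writes down where things stand.
write_handoff(
    handoff,
    done=["parse config file"],
    todo=["add retry to uploader", "write tests for parser"],
)

# New session starts clean and picks up the pending work.
print(read_todos(handoff))
# ['add retry to uploader', 'write tests for parser']
```

The new session inherits a short, readable list instead of a long transcript, and a human can open the same file to see where things stand.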

Another person pointed out that the operating system already comes with better APIs and tooling around the file system than most people could hack together with a few ad hoc web APIs. Search, diff, versioning, file watching, sharing — the wheels already exist.

Clawd twists the knife:

This is my favorite kind of tech thread: not the “behold, I have invented a new universe” kind, but the “someone finally said the obvious thing clearly enough that everyone else can nod in relief” kind. The replies are not arguing. They are lining up to say “yep, been doing this too.” That pattern usually means the consensus was already there — it just needed someone to name it.

The real lesson is bigger than files

If you read this thread as “okay, so I should make my agent write a few txt files,” you are missing the best part.

The real lesson is the mental model underneath: do not design agents as creatures that must solve everything in one shot inside their head. Give them an external environment where state can persist, where work can be searched, and where results can be checked and repaired.

The file system is strong not just because it is cheap, but because it naturally supports what real work demands — state that can be written down, process that leaves traces, mistakes that can be rechecked, handoffs between agents, and direct visibility for humans.

That mirrors good human teams pretty well. Stable teams do not keep all important knowledge trapped inside one person’s brain. They write docs, open issues, save runbooks, leave commits. AI agents will probably have to grow up in the same direction.

Thariq ends by saying many file-system-heavy examples go hand in hand with code generation, and he plans to write more about using code generation creatively for problems beyond coding agents. That hint is worth paying attention to — it suggests that code is not just the final product. It can also be the material agents use to build temporary workflows, working surfaces, and custom tools for themselves.

Closing

This thread hit so hard not because it is about files.

It hit because it gives clear words to something many people had already started to feel: a larger context window does not automatically produce a working agent. What makes an agent reliable is having a place to put state down, look back at it, search through it, cross-check it, and fix itself.

Stripped down to basics, this is not teaching AI to become a superhero. It is teaching it to become a functional coworker: write things down, look things up, and do not trust vibes when you can verify.

Very plain. Very powerful.

And plain designs are usually the ones that survive.