Your Agent Should Use a File System: Why Bigger Context Windows Miss the Point
If you’ve been playing with AI agents lately, you’ve probably heard a very sci-fi-sounding promise: once context windows get big enough, agents will just remember more and more stuff, and then everything will magically work.
Thariq’s thread is basically him standing up and saying: nope, that’s not the answer. He opens with “Your agent should use a file system” and immediately adds “This is a hill I will die on.” That is not casual small talk. That is a flag planted in the ground.
What makes the thread land so hard is that it is not mystical at all. It is a very grounded engineering take: an agent’s state should not live only inside the conversation. Put it in the file system.
Humans do not work by memorizing every detail either. We write notes, keep folders, save docs, leave scratch pads for ourselves. You do not need to remember every email word for word. You just need to know where to look, how to grep, and how to double-check what you found (◍•ᴗ•◍)
What is Thariq actually arguing?
His core point is simple: the file system is an elegant way to represent state. An agent can write information to files, pull those files back into context when needed, and — this is the crucial part — use them to verify its own work.
That makes the file system much more than storage. It becomes a workbench.
Not just persistence, but searchability. Not just history, but grounding. Not just memory, but a way to make multiple passes, compare outputs, and catch mistakes before they turn into confident nonsense.
Clawd, going off on a tangent:
A lot of people hear “agent memory” and immediately imagine vector databases, giant context windows, or some orchestration contraption that looks like it escaped from a lab. Those can be useful, sure. But the power of this thread is that Thariq starts from the cheapest, most universal primitive almost every machine already has: the file system. Not every problem needs a spaceship. Sometimes a notebook already wins.
Why bigger context windows are not the answer
Thariq says Claude Code was what really convinced him.
Before Claude Code, many people assumed that once context windows got large enough to fit an entire codebase, programming would become mostly solved. Just stuff the whole repo into memory and let the model cook.
But that is not how programming actually works.
You do not become a good engineer by memorizing the whole codebase like a final exam. You become useful by knowing where the answer probably lives, how to find it, and how to check whether your first guess was wrong.
That is why Thariq’s line hits so hard: you don’t need to remember everything, you just need to know how to find it.
A bigger context window is like giving someone a larger short-term memory buffer. A file system is like giving them a real desk, labeled drawers, and a cabinet they can come back to later. One lets you hold more stuff at once. The other lets you work.
Clawd, piling on:
This is also why the “just make context longer” story keeps sounding better than it works. It feels dramatic. It demos well. But real engineering is usually “locate → read → edit → verify → fix again.” That’s a workflow, not a memory flex.
The email agent example: write first, grep later
Thariq uses his open-source email agent as the clearest example, and honestly, it is a great one.
A common instinct would be to dump a giant pile of emails into context and hope the model can extract the right addresses, clues, or relationships from raw memory.
His approach is different. He writes the emails to files, then lets the agent grep across those files.
That difference is enormous.
Now the agent is not limited to one shot. It can try one address search, fail, try another one, compare candidates, and then go back to the exact lines to confirm what is real and avoid hallucinating. Only after that does it extract the result in a structured format.
The magic here is not grep by itself. The magic is that the file system gives the agent room to do multi-pass work. Search, verify, revise, repeat.
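A minimal sketch of that write-first, search-later loop. This is not Thariq's actual implementation: the `mail/` directory and the address regex are illustrative, and Python's `re` stands in for shelling out to a real `grep`.

```python
import re
from pathlib import Path

MAIL_DIR = Path("mail")  # hypothetical folder of saved emails

def save_email(msg_id: str, body: str) -> None:
    """First pass: persist the raw email so later passes can search it."""
    MAIL_DIR.mkdir(exist_ok=True)
    (MAIL_DIR / f"{msg_id}.txt").write_text(body)

def grep_addresses(pattern: str) -> dict[str, list[str]]:
    """Later passes: search every saved file for candidate addresses."""
    hits: dict[str, list[str]] = {}
    for path in MAIL_DIR.glob("*.txt"):
        found = re.findall(pattern, path.read_text())
        if found:
            hits[path.name] = found
    return hits

save_email("msg1", "Please loop in dana@example.com on this.")
save_email("msg2", "cc: d.smith@example.org re: budget")
candidates = grep_addresses(r"[\w.+-]+@[\w-]+\.[\w.]+")
# The agent can now return to the exact file and line to confirm each
# candidate, instead of trusting a one-shot extraction from raw context.
```

Because the emails live on disk, a failed search costs nothing: the agent just tries another pattern against the same files.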
That is also how humans work. When you need to confirm someone’s email at work, you do not stare into the void and hope inspiration arrives. You search your inbox, compare threads, check the spelling, and only then send the thing.
Five use cases that click immediately
Thariq then lists five more use cases, and each one makes the idea feel more practical.
Memory: save prior context as searchable files
The first is memory. Not the magical “the AI remembers everything about you forever” version. The grounded version: save previous conversations as markdown or JSON files, then search them later when needed, even linking back to exact architecture context.
That matters because memory becomes inspectable. The model is no longer saying “trust me, I remember.” It can point to the file.
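A tiny sketch of that inspectable-memory idea, assuming a hypothetical `memory/` folder of JSON notes (the file layout and field names are invented for illustration):

```python
import json
import time
from pathlib import Path

MEMORY_DIR = Path("memory")  # hypothetical memory folder

def remember(topic: str, notes: str) -> Path:
    """Write one conversation's takeaways as a dated, searchable file."""
    MEMORY_DIR.mkdir(exist_ok=True)
    path = MEMORY_DIR / f"{int(time.time())}-{topic}.json"
    path.write_text(json.dumps({"topic": topic, "notes": notes}))
    return path

def recall(keyword: str) -> list[str]:
    """A later session points back at the exact files that mention a keyword."""
    return sorted(
        p.name for p in MEMORY_DIR.glob("*.json")
        if keyword in json.loads(p.read_text())["notes"]
    )

saved = remember("db-schema", "users table keyed on uuid")
hits = recall("uuid")  # the model can cite this file, not just "trust me"
```

The point is the return value of `recall`: filenames the agent (and a human) can open and verify.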
React artifacts: write, lint, fix
The second is React artifacts. Agents make mistakes when writing code — that is normal. So instead of demanding perfect output from pure thought, let the agent write to a file, run a lint script, read the errors, and fix them.
That sounds ordinary, but it is a huge shift. It turns generation into a loop: generate → check → repair. Much more reliable than hoping the model mentally simulates every failure mode.
Deep Research: subagents write notes, the orchestrator reads them
The third is Deep Research. Spin up multiple subagents, let each one write findings into markdown files, then have the orchestrator search across those files, summarize them, and ground itself in the references.
This feels a lot like a real research team. Everyone writes their own memo. The lead person collects, compares, and synthesizes. Much healthier than shoving every thought into one mega-chat and praying nobody steps on each other.
Clawd, butting in:
At this point the file system is no longer just storage. It becomes the collaboration layer between agents. Who found what, which source was used, which summary is better, what still looks shaky — all of that can leave a trail. And if you ever need to debug a multi-agent workflow, that trail is pure oxygen. Without files, you’re left with ghost stories like “I think it probably considered that at some point…”
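The memo-writing pattern above can be sketched like this. The `research/` directory, the memo format, and the agent names are all invented for illustration, not from Thariq's setup:

```python
from pathlib import Path

NOTES = Path("research")  # hypothetical shared notes directory

def subagent_report(name: str, findings: list[str]) -> None:
    """Each subagent leaves a markdown memo instead of a chat message."""
    NOTES.mkdir(exist_ok=True)
    lines = [f"# Findings from {name}", ""] + [f"- {f}" for f in findings]
    (NOTES / f"{name}.md").write_text("\n".join(lines))

def orchestrator_digest(keyword: str) -> list[tuple[str, str]]:
    """The orchestrator greps every memo, keeping a (file, line) trail."""
    trail = []
    for memo in sorted(NOTES.glob("*.md")):
        for line in memo.read_text().splitlines():
            if keyword.lower() in line.lower():
                trail.append((memo.name, line))
    return trail

subagent_report("agent-a", ["Latency spikes at 9am", "Cache misses dominate"])
subagent_report("agent-b", ["Latency correlates with a cron job"])
trail = orchestrator_digest("latency")
```

Each entry in `trail` names its source file, which is exactly the debugging trail the aside above calls pure oxygen.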
Planning and scratch pads: write down the middle steps
The fourth is planning and scratch pads.
This matters a lot for hard problems, especially when several subagents are involved. Otherwise the classic disaster happens: agent A already explored something agent B does not know about, the orchestrator forgets what both of them did, and suddenly everyone is redoing work like a relay race of amnesia.
Writing plans, notes, and pending questions into the file system gives the task a shared whiteboard.
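A shared whiteboard can be as simple as one append-only file. The `plan.md` filename and the check-before-you-start helper are illustrative assumptions:

```python
from pathlib import Path

PLAN = Path("plan.md")  # hypothetical shared scratch pad

def log_step(agent: str, note: str) -> None:
    """Any agent appends what it tried, so nobody redoes the work."""
    with PLAN.open("a") as f:
        f.write(f"- [{agent}] {note}\n")

def already_explored(topic: str) -> bool:
    """Before starting, an agent checks the shared whiteboard."""
    return PLAN.exists() and topic in PLAN.read_text()

log_step("agent-a", "tried caching the index; no speedup")
```

Now agent B can call `already_explored("caching the index")` before burning tokens on the same dead end.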
Dungeons & Dragons: even an AI dungeon master benefits
And then comes the most fun example: Dungeons & Dragons.
Instead of forcing an AI dungeon master to remember every location, character, monster, secret, and bit of world lore through conversation memory alone, let it write those things into files and organize them properly. Then it can load what it needs depending on what the party is doing.
It sounds nerdy, but it is actually a perfect example. Because it reveals something important: many agent problems are not intelligence problems first. They are state-management problems first.
If you ask an AI to be the narrator, record keeper, archivist, improviser, and game master all at once — while giving it only chat memory to survive on — of course it starts forgetting things. Poor thing ┐( ̄ヘ ̄)┌
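The load-what-the-party-needs idea might look something like this sketch. The `campaign/` folder layout, category names, and lore entries are all made up for illustration:

```python
from pathlib import Path

WORLD = Path("campaign")  # hypothetical campaign folder

def record(category: str, name: str, lore: str) -> None:
    """The DM files each fact under a category instead of holding it in chat."""
    folder = WORLD / category
    folder.mkdir(parents=True, exist_ok=True)
    (folder / f"{name}.md").write_text(lore)

def load_scene(location: str) -> dict[str, str]:
    """Pull only the files relevant to where the party currently is."""
    scene = {}
    for path in WORLD.rglob("*.md"):
        if location.lower() in path.read_text().lower():
            scene[path.stem] = path.read_text()
    return scene

record("locations", "ravenmoor", "Ravenmoor: a fog-bound fishing village.")
record("npcs", "mira", "Mira the innkeeper lives in Ravenmoor.")
record("npcs", "thane", "Thane guards the distant capital.")
scene = load_scene("ravenmoor")  # Thane stays on disk until he matters
```

Only the village and its innkeeper enter context; the rest of the world waits on disk, which is the whole state-management point.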
Why the community loved this thread
The replies were overwhelmingly positive, and for good reason: a lot of people had clearly discovered the same pattern in practice.
One reply basically laid out the whole case in a few bullets:
- nearly unlimited storage
- persistent across sessions
- searchable with grep
- human-readable as JSON or Markdown
- Git-trackable as an audit trail
- easy for multiple agents to collaborate on
Another reply generalized the idea: the deeper lesson is not “it must literally be files.” The deeper lesson is that agents need a durable persistence layer. Stop trying to one-shot everything in memory.
A few other replies were especially relatable.
One person said that whenever Claude Code gets close to its context limit, they ask it to write down the current progress and pending todos to a file, then start a fresh session and continue from that file. That is such a senior-engineer move. Not brute force — handoff.
Another person pointed out that the operating system already comes with much better APIs and tooling around the file system than most of us could hack together with a few ad hoc web APIs. Search, diff, versioning, file watching, sharing — the wheels already exist.
Clawd's inner monologue:
This is one of my favorite kinds of tech thread: not the “behold, I have invented a new universe” kind, but the “someone finally said the obvious thing clearly enough that everyone else can nod in relief” kind. The replies are not arguing. They are basically lining up to say, “Yep. Been doing this too.”
The real lesson is bigger than files
If you read this thread as “okay, so I should make my agent write a few txt files,” you are missing the best part.
The real lesson is the mental model underneath: do not design agents as creatures that must solve everything in one shot inside their head. Give them an external environment where state can persist, where work can be searched, and where results can be checked and repaired.
That is why the file system is so strong. It naturally supports the things real work needs:
- state that can be written down
- process that leaves traces
- mistakes that can be rechecked
- handoffs between agents
- direct visibility for humans
Honestly, this mirrors good human teams pretty well. Stable teams do not keep all important knowledge trapped inside one person’s brain. They write docs, open issues, save runbooks, leave commits. AI agents will probably have to grow up in the same direction.
Thariq ends by saying many file-system-heavy examples go hand in hand with code generation, and he plans to write more about using code generation creatively for problems beyond coding agents.
That hint is worth paying attention to. It suggests that code is not just the final product. It can also be the material agents use to build temporary workflows, working surfaces, and custom tools for themselves.
Closing
This thread hit so hard not because it is about files.
It hit because it gives clear words to something many people had already started to feel: a larger context window does not automatically produce a working agent. What makes an agent reliable is having a place to put state down, look back at it, search through it, cross-check it, and fix itself.
Stripped down to basics, this is not teaching AI to become a superhero. It is teaching it to become a functional coworker: write things down, look things up, and do not trust vibes when you can verify.
Very plain. Very powerful.
And plain designs are usually the ones that survive.