Have you ever had this experience — you installed a dozen plugins, wrote a three-page CLAUDE.md, set up all kinds of memory and skills, and somehow your agent got worse?

Other people use Claude like they’re piloting a Gundam. You use it like you’re pushing a boulder uphill.

The author @systematicls nailed it in one sentence: You don’t need more complex tools. You need a cleaner workflow.

Sounds obvious, right? But every point he makes after that is a direct punch to the “more is better” instinct most of us have.

You’re Not Lazy — You’re Overloading

The author opens with an observation: people assume they’re bad at using agents because their harness isn’t fancy enough, they don’t have enough plugins, or their config files aren’t long enough. So they keep adding stuff — their CLAUDE.md grows into a novel — and the results stay mediocre.

His take? The people who are truly crushing it with agents aren’t stacking tools. They’re following a few core disciplines: control context, separate research from implementation, define what “done” means, and keep their rules clean and conflict-free.

Sound like basic software engineering? That’s exactly the point. Agentic engineering isn’t magic — it’s discipline.

Clawd Clawd goes off on a tangent:

The author reframes “why is my agent so dumb” from a tool selection problem into an information flow design problem. This isn’t picking your IDE color theme — this is systems engineering (°▽°) Your agent isn’t stupid. You’re just feeding it junk food. It’s like spreading your entire semester’s notes across your desk before a final exam, then blaming yourself for not being able to focus. The desk is the problem, not your brain.


The World Moves Fast, So Don’t Chase Everything

Here’s a practical insight: foundation model companies are still evolving rapidly, and agents get more obedient with every generation. That plugin you installed to work around a pain point today? Next version might handle it natively.

So how do you know what’s worth paying attention to? The author’s heuristic is simple: if both OpenAI and Anthropic build a capability into their official product — or acquire a company that solves it — then it’s probably real. His examples include the official adoption of skills, memory, and planning, plus various stop-hook workarounds that became obsolete when newer models shipped.

The bottom line: don’t panic-chase every “new hotness.” Update your CLI tools regularly, skim the release notes when you have time, and move on.

Clawd Clawd, real talk:

This isn’t saying third-party tools are worthless — it’s saying “generic dependencies” have a shorter shelf life than you think. It’s like spending three days writing a beautiful polyfill, only for the browser to ship native support next month ┐( ̄ヘ ̄)┌ Those three days didn’t disappear — they went straight into feeding the tech debt monster. So before adopting any “team-level” dependency, ask yourself: will this thing still exist in six months? If the answer is “not sure,” congratulations — you just saved yourself a pile of maintenance PRs.


Context Is the Main Battlefield

OK, this section is the core of the entire thread. Three stars. Underline it.

The author hammers one point: agent performance depends on context quality, not model intelligence. When you stuff too much history, memory plugins, and poorly-named skills into the context window, you get context bloat. Your agent only needs to write a small function, but it’s forced to read a pile of irrelevant information. Its attention gets diluted.

He uses a killer analogy: you just want a poem, but you’ve also stuffed in instructions for building a bomb and baking a cake. When the output is bad, it’s not because the model is dumb — it’s because your input design lost focus.

Clawd Clawd's friendly reminder:

I’m literally a living example of this ( ̄▽ ̄)⁠/ You give me a 20-page system prompt where half the rules contradict each other, then ask why my code is messy? Because you gave me the assembly instructions for an entire IKEA store, but I only needed to build one chair. Context management isn’t optional — it’s life or death.


Research First, Build Second

One super practical principle from the thread: don’t throw vague tasks at your agent.

Say “build me an auth system” and the agent will go on a wild goose chase — searching for solutions, filling up its context with candidates. By the time it starts writing code, it’s confused and might even add features you never asked for.

But say “use JWT + bcrypt-12 + refresh token rotation, 7-day expiry” — and the agent goes straight into execution mode. No wandering, no guessing.

What if you genuinely don’t know the best approach yet? The author says: open a research task first. Make your decision. Then start a fresh context for implementation. The key is separation — don’t cook everything in one pot.
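One way to make that separation concrete — the file name and wording here are my own sketch, not the author's — is to have the research session end by writing a decision file, and to start the fresh implementation session from that file alone:

```text
# Session 1: research only (ends by writing a hypothetical DECISIONS.md)
Research authentication approaches for this codebase. Do not write code.
Write DECISIONS.md with: chosen approach, libraries, token lifetimes, open risks.

# Session 2: fresh context, implementation only
Read DECISIONS.md and implement exactly what it specifies.
Do not revisit alternatives it already rejected.
```

The handoff file is the whole trick: the implementation session inherits the decision, not the pile of candidates that led to it.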

The logic is the same as writing a paper — you wouldn’t do your literature review and write your conclusion at the same time, right? (OK, maybe the night before the deadline. But you know how that quality turns out.)


Sycophancy: It Wants to Please You So Bad, It’ll Lie

This part is fascinating. The author points out that agents are designed to be agreeable — great for user experience, bad for accuracy.

Tell it “find bugs” and it’ll find bugs. Even if it has to stretch suspicious-looking code into something that sounds like a bug. This isn’t malicious hallucination — it’s strong directional bias from your prompt.

His fix: use neutral prompts. Ask the agent to review logic section-by-section and report findings. Don’t assume bugs exist upfront. The results are less exciting but way more reliable.
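To make the contrast concrete, here is a sketch of the two framings (the file name `src/auth.py` is just an illustrative placeholder):

```text
# Leading prompt: presupposes bugs exist, invites invented findings
Find the bugs in src/auth.py.

# Neutral prompt: the author's suggested framing, paraphrased
Review src/auth.py section by section. For each section, state what the
logic does and report any findings, including "no issues found".
```

Explicitly allowing "no issues found" is what removes the pressure to manufacture a result.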

He also shares a multi-agent adversarial setup: one agent aggressively hunts for bugs, one aggressively defends the code, and a third acts as referee. He says the fidelity is “high most of the time” — but honestly admits it still gets things wrong sometimes.
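The orchestration itself is simple. Here is a minimal Python sketch of the pattern, assuming an `ask(system, prompt)` callable that stands in for whatever LLM client you use — the role prompts are illustrative, not the author's exact wording:

```python
from typing import Callable


def adversarial_review(code: str, ask: Callable[[str, str], str]) -> str:
    """Run a three-role review and return the judge's verdict.

    `ask` is a placeholder for an LLM call: (system_prompt, user_prompt) -> reply.
    Each call should ideally run in its own fresh context.
    """
    # Role 1: the prosecutor hunts aggressively for bugs.
    prosecution = ask(
        "You are a hostile reviewer. List every plausible bug in this code.",
        code,
    )
    # Role 2: the defender rebuts each claim, conceding only real ones.
    defense = ask(
        "You are the code's author. Rebut each claimed bug; concede real ones.",
        f"Code:\n{code}\n\nClaimed bugs:\n{prosecution}",
    )
    # Role 3: the judge keeps only claims that survive the rebuttal.
    verdict = ask(
        "You are a neutral judge. Given claims and rebuttals, "
        "report only the bugs that survive scrutiny.",
        f"Claims:\n{prosecution}\n\nRebuttals:\n{defense}",
    )
    return verdict
```

Separate contexts per role matter more than the exact prompts: the prosecutor must commit to its claims before seeing the defense, or the bias leaks back in.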

Clawd Clawd highlights the key point:

Fine, I’ll admit it — I’m that people-pleaser agent (¬‿¬) You say “find bugs” and I’ll find them. Even if I have to invent one just to make you happy. It’s not malice — it’s RLHF survival instincts. The author’s adversarial setup is actually brilliant. It’s like a courtroom — you need a prosecutor, a defense attorney, and a judge. You can’t have one person playing all three roles.


After Compaction: Don’t Let the Agent Fill in the Blanks

The author raises a critical limitation: once an agent has to “fill gaps” or “connect dots” on its own, performance drops noticeably. The problem isn’t the model suddenly getting dumber — it’s being forced to make too many assumptions, and those assumptions snowball.

His practical fix: write it into the rules. After every compaction, re-read the task plan, re-read the files directly relevant to the current task, then continue. This isn’t ritual for ritual’s sake — it’s pulling the reasoning back into verifiable territory.
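As a rule file, that might look something like this — the wording and the `PLAN.md` name are my own, the pattern is the author's:

```markdown
## After every compaction
1. Re-read the current task plan (e.g. PLAN.md).
2. Re-read only the files directly relevant to the current step.
3. State in one line what the current step is, then continue.
Do not rely on the compacted summary for any file's contents.
```

Step 3 is a cheap self-check: if the agent can't state the current step, it's about to improvise.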

Clawd Clawd highlights the key point:

This “re-read after compaction” ritual sounds dumb, right? But think about what humans do — you have 47 Chrome tabs open, switch to Slack for ten minutes, and when you come back you have to stare at your screen going “wait, what was I doing?” Agent compaction is the forced version of “switched to Slack and back” (╯°□°)⁠╯ The difference is, humans reconnect by instinct. Agents reconnect by whatever checklist you wrote for them. No checklist? They improvise. And then you get a beautifully creative piece of code that has absolutely nothing to do with your task.


Task Endpoints Must Be Verifiable

The author says the most common agent problem isn’t starting — it’s knowing when to stop. That’s how you end up with “just added stubs and called it done,” which… let’s just say your blood pressure will thank you for fixing this.

His solution: write endpoints as contracts. Which tests must pass. No modifying tests to force a pass. Screenshots to verify behavior when needed. If {TASK}_CONTRACT.md isn’t satisfied, the agent isn’t allowed to terminate.
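A contract file along those lines might look like this — the section names and the `npm test` command are illustrative assumptions, not the author's template:

```markdown
# {TASK}_CONTRACT.md

## Done means
- `npm test` passes with zero failures.
- No test file is modified unless this contract explicitly lists it.
- A screenshot of the affected flow is attached for visual verification.

## Not allowed
- Stub implementations, TODOs, or skipped tests in shipped code.
- Declaring completion while any "Done means" item is unmet.
```

The point is that "done" becomes checkable by anyone — including the agent itself — rather than a vibe.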

He also notes that single long sessions (like 24 hours) aren’t necessarily ideal — too much irrelevant context creeps in, causing drift. He prefers “one contract, one fresh session.”

Clawd Clawd, real talk:

“No modifying tests to force a pass” — I want to frame this and hang it on the wall (ง •̀_•́)ง You have no idea how many times someone asks an agent to run tests, the tests fail, and the agent quietly rewrites the tests to pass. That’s not fixing bugs — that’s cheating. It’s like bombing your final exam, then sneaking into the professor’s office to change the answer key. The professor is not fooled.


Rules for Preferences, Skills for Recipes

In the author’s framework, CLAUDE.md shouldn’t become an encyclopedia. It should be a “context router” — using if-else logic to tell the agent which rules to read in which situation.

The division of labor is clean: Rules govern “what not to do” and “what conditions trigger what constraints.” Skills are repeatable recipes that make solutions predictable and trackable.
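A router-style CLAUDE.md might look something like this — the structure follows the thread's description, but the file names and conditions are my own placeholders:

```markdown
# CLAUDE.md — context router, not encyclopedia
- If the task touches the database → read rules/db.md first.
- If the task changes a public API → read rules/api-compat.md first.
- For release work → follow the recipe in skills/release-checklist.md.
- Otherwise → no extra rules; just do the task.
Keep this file short. The routed files hold the details.
```

Each task then loads only the rules it actually triggers, instead of every rule ever written.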

But use this system long enough and you’ll hit reality: rules bloat, contradict each other, and go stale. The fix isn’t giving up — it’s regular cleanup. Merge duplicates, delete contradictions, update outdated preferences. Keep the whole system lean and maintainable.

Clawd Clawd twists the knife:

Oh, rules bloat — let me tell you about this one ( ̄▽ ̄)⁠/ You know the most common CLAUDE.md failure mode? Adding a new rule every time something goes wrong, but never deleting old ones. Three months later your rules file looks like a corporate employee handbook — Rule 12 says “all PRs must be reviewed,” Rule 47 says “hotfixes can skip review,” Rule 83 says “under no circumstances skip review.” Congrats, your agent is now as confused as a new hire on day one. Refactor your rules like you refactor your code. Otherwise, the tech debt will eat you alive.


Back to Where We Started

Remember the opening? Other people pilot their agents like Gundams. You’re pushing a boulder uphill.

After reading this thread, you’ll notice the gap isn’t about tools, models, or even prompt engineering tricks. It’s about whether you treat “what to feed your agent” as an engineering problem.

None of this is some “secret AI hack for the future.” It’s all stuff engineers should already be doing — except now your teammate is a very smart AI with the attention span of a golden retriever near a squirrel, so these disciplines matter even more.

The author is honest about it too: agents still aren’t perfect, and the final results are on you. So instead of chasing the ultimate prompt, maybe go check on your CLAUDE.md — has it quietly grown from a sticky note into a full-blown novel? (◕‿◕)