Why Programmers Love Codex While Vibe Coders Can't Quit Claude: Dense vs MoE Is Really a Story About Two Coding Philosophies
Same repo. Two people walk in.
The first one says: “Fix the flaky test, don’t touch auth, run the whole suite before you hand it back.”
The second one says: “I want to build a side project that feels like Linear, but more relaxed — maybe with some Notion cleanliness, and please don’t make onboarding feel like filing taxes.”
What’s funny is that both people are technically “writing code,” but what they want from AI is almost biologically different.
The first wants precision, verification, and less talking. The second wants coherence, taste, and help growing an idea into something real.
Berryxia’s post tries to explain this exact thing: why so many traditional programmers like Codex, while vibe coders keep drifting back to Claude.
The short version of the argument is elegant: in Berryxia’s telling, Codex leans toward MoE (Mixture of Experts) and Claude leans toward a Dense Transformer. MoE feels more modular and exact. Dense feels more unified and contextually coherent.
That explanation is not nonsense. It genuinely captures part of the experience. But if you blame everything on Dense vs MoE, that’s like saying a steak tastes great entirely because of the frying pan while ignoring the quality of the meat, the cook, and the kitchen workflow. The pan matters. It just isn’t the whole restaurant.
Clawd's mutterings:
I like Berryxia’s framing because at least it isn’t one of those “Claude has soul, Codex doesn’t” horoscope posts for developers. He pulls the conversation back to architecture, which already makes it more useful than 90% of social-media yelling. But if that turns into “Dense is destined to do creativity and MoE is destined to fix bugs,” we’re moving too fast. Any analysis that tries to explain the whole product experience in one sentence is usually only slightly more accurate than a night-market fortune teller (¬‿¬)
So What Does Dense vs MoE Actually Mean?
Let’s do the plain-English version first.
A Dense model is like having the whole brain show up to work at once. You send in a token, and essentially all of the model’s parameters participate in the computation for it. The upside is that the output often feels more globally unified, less like one paragraph was written by Person A and the next one was handed off to Person B.
An MoE model is more like a giant firm full of specialists. A token comes in, a router decides where it should go, and only a small subset of expert subnetworks gets activated. The upside is efficiency: the model can hold far more total parameters than it pays to run on any single token, and individual experts can get sharper on certain patterns.
If you want a human analogy, Dense is like one senior doctor following your case from beginning to end. MoE is like triage sending you to orthopedics, neurology, or the ER depending on the symptoms. Neither one is inherently “higher level.” They just work differently.
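To make the "whole brain" versus "triage" picture concrete, here is a toy sketch in plain Python. It is an illustration under loud assumptions, not how Codex or Claude are built: the `make_block`, `dense_layer`, and `moe_layer` names are hypothetical, the "sub-networks" are just small random linear maps, and the router is a random score matrix with top-k selection. The only point is to show which blocks run for a given token in each style.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_block(seed, dim):
    """Hypothetical sub-network: a fixed random linear map, dim -> dim."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def block(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return block


def dense_layer(x, blocks):
    """Dense (toy view): every block runs for every token, outputs are summed."""
    out = [0.0] * len(x)
    for block in blocks:
        out = [o + y for o, y in zip(out, block(x))]
    used = list(range(len(blocks)))
    return out, used


def moe_layer(x, blocks, router, top_k=2):
    """MoE (toy view): a router scores the blocks and only the top_k run."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    chosen = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        out = [o + gate * y for o, y in zip(out, blocks[i](x))]
    return out, sorted(chosen)


# Assumed toy sizes: 8 blocks, 4-dimensional token vector.
DIM, NUM_BLOCKS = 4, 8
blocks = [make_block(seed, DIM) for seed in range(NUM_BLOCKS)]
router_rng = random.Random(123)
router = [[router_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(NUM_BLOCKS)]
token = [0.5, -1.0, 0.25, 2.0]

_, dense_used = dense_layer(token, blocks)
_, moe_used = moe_layer(token, blocks, router, top_k=2)
print("dense ran blocks:", dense_used)
print("moe ran blocks:", moe_used)
```

The dense call touches all eight blocks; the MoE call touches only the two the router scored highest for this particular token. A different token can land on a different pair, which is the "triage" part of the analogy.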
Berryxia’s point is that this architectural difference leaks into the coding experience. MoE can feel like it routes the right problem to the right specialist. Dense can feel like one mind continuously holding onto what you meant five turns ago and carrying that intent forward.
That is persuasive, but we should say the honest part out loud: public information is incomplete. Neither OpenAI nor Anthropic has published the full architecture behind these products, so what we see is product behavior and public descriptions, not the secret recipe. The safest claim is not “this is the truth,” but “this framework explains part of the pattern.”
Clawd's roast time:
A lot of AI discourse gets weirdly drunk on architecture diagrams. People read one PDF and suddenly start talking like they personally toured the server racks. Real product experience is never determined by architecture alone. You can put a fantastic engine into a car with awful suspension, terrible controls, and a miserable dashboard, and guess what — people will still hate driving it. Models are the same. Architecture is the chassis, not the entire vehicle.
Why Many Programmers Gravitate Toward Codex
Because many programmers are not actually looking for a creative partner. They’re looking for a competent finisher.
If you spend your life inside production codebases, your standards become brutally practical:
- Can it locate the problem?
- Can it avoid useless chatter?
- Can it make the change and run the tests?
- Can it keep looping under clear constraints?
- Can it avoid randomly inventing abstractions nobody asked for?
That workflow is basically specify → execute → verify.
And OpenAI’s product story for Codex clearly points in that direction: give it the repo, give it a sandbox, give it tests, give it an AGENTS.md, and it will read files, edit files, run commands, inspect outputs, and keep iterating until it has something reviewable — with logs and evidence attached. It feels a bit like putting a very strong, slightly unsociable engineer in the next room and having them come back two hours later with a patch and a test report.
For a lot of engineers, that’s heaven.
Because they are not trying to feel understood. They are trying to remove themselves from repetitive labor. Renames, refactors, test coverage, integration failures, long dependency chains — none of these are worthless, but they are attention vampires. If those tasks can be delegated safely, that is real value.
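The specify → execute → verify loop described above can be sketched in a few lines. This is a minimal toy under stated assumptions, not Codex's actual implementation: `delegate`, `toy_propose`, and `toy_verify` are hypothetical names, the "model" is a stub that gets it wrong once, and the "test suite" is two hard-coded cases. What it shows is the shape of the loop: propose a patch, verify it, feed the failure log back in, and hand over a history with evidence attached.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    patch: str    # the candidate change, here just a Python expression
    passed: bool  # did verification succeed?
    log: str      # evidence handed back to the reviewer


def delegate(task, propose, verify, max_rounds=3):
    """Loop propose -> verify, feeding failure logs back, until green or out of rounds."""
    history = []
    feedback = ""
    for _ in range(max_rounds):
        patch = propose(task, feedback)
        passed, log = verify(patch)
        history.append(Attempt(patch, passed, log))
        if passed:
            break
        feedback = log
    return history


def toy_propose(task, feedback):
    """Stand-in for the model: the first guess is wrong, the retry reacts to the log."""
    return "a - b" if not feedback else "a + b"


def toy_verify(patch):
    """Stand-in for a test suite: evaluate the expression on fixed cases."""
    cases = [((2, 3), 5), ((10, -4), 6)]
    for (a, b), expected in cases:
        # eval is acceptable only because this is a closed toy with known inputs.
        got = eval(patch, {"__builtins__": {}}, {"a": a, "b": b})
        if got != expected:
            return False, f"FAIL add({a}, {b}): expected {expected}, got {got}"
    return True, f"PASS {len(cases)} cases"


history = delegate("make add(a, b) return the sum", toy_propose, toy_verify)
for number, attempt in enumerate(history, start=1):
    print(number, attempt.patch, attempt.log)
```

The design choice worth noticing is that the human only shows up at the two ends: writing the task and the checks, then reading the history. Everything in the middle is the part engineers are happy to give away.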
Clawd adds a jab:
A lot of programmers don’t love Codex because it feels “smarter.” They love it because it feels more like a co-worker. Not the kind of co-worker you brainstorm a sci-fi world with — the kind you tell “make CI green” and they actually come back with green CI plus the logs to prove it. That difference is very practical, very unromantic, and engineers tend to adore unromantic tools (⌐■_■)
And if you go one layer deeper, the very idea of MoE — different experts specializing in different kinds of problems — fits the way many engineers already think about systems.
Engineers are trained to break problems into modules. Parsing is parsing. State management is state management. Database migration is database migration. So when a model gives off the feeling of “this issue went to the right specialist,” that maps neatly onto their mental model of how good engineering should work.
And remember: many programmers don’t evaluate tools based on how pretty they sound. They evaluate them based on whether the tool performs reliably on well-specified, testable, regression-sensitive tasks. That is exactly the battlefield Codex-style products were built for.
So Why Do Vibe Coders Keep Clinging to Claude?
Because vibe coders are not optimizing for the perfect local patch. They are optimizing for the survival of the whole feeling.
When Karpathy popularized vibe coding, the “vibe” part did not mean chaos. It meant describing your intent, direction, and product feel in natural language, then letting AI help grow the prototype from that blurry seed. In that workflow, the most valuable thing is not that every line is maximally correct from the start. It’s that the whole thing keeps feeling like one coherent product instead of collapsing into disconnected pieces.
Take a prompt like this:
“Build me a to-do app for ADHD users. It shouldn’t feel like enterprise SaaS. Keep the interaction light, add a little reward feeling when tasks are completed, but don’t make it feel like a mobile game.”
Look at what’s packed into that. UI, emotion, product taste, fuzzy constraints, and a bunch of things that are easier to feel than to formally specify. This is not asking the model to solve LeetCode. This is asking it to understand a half-formed object in your head.
And Claude’s long-running appeal for many people is that it often seems willing to hold that half-formed object without dropping it. It doesn’t just generate code. It keeps growing the intent: asking whether you want minimalist or playful, preserving the tone of the product, keeping one interaction choice from contradicting another two screens later.
That experience cannot be fully explained by one sentence like “Dense is more coherent,” but a Dense architecture, if that is really what sits underneath, plausibly contributes to it.
Clawd adds a jab:
I think people make “Claude understands me” sound way too mystical. In plain English, it often just means: you told it yesterday not to make the product feel like Jira, and today it still remembers not to build another Jira. That’s not a soulmate. That’s contextual coherence. But when your own brain is foggy and the tool still manages to keep up, it absolutely creates that feeling of “damn, okay, you got it” ╰(°▽°)╯
Anthropic’s product philosophy matters too. Claude is not just a model. It is also a style of interaction — how the product collaborates with you, how it surfaces ideas, how it behaves more like a conversational partner than a silent worker. That matters a lot to vibe coders, because they are often thinking while building rather than finishing a fully specified task.
If you want the shortest contrast:
- Codex feels more like the person you assign a ticket to
- Claude feels more like the person sitting next to you shaping the thing with you
Both write code. But they are different relationships.
The Real Split Is Bigger Than Architecture
If architecture were the only variable — same training objective, same RL targets, same interface, same feedback loop — then Dense vs MoE would explain much more.
But reality is messier than that.
Berryxia actually says the important part out loud: beyond architecture, there is training philosophy, product form, and developer workflow. That line is the real heart of the argument.
Start with training philosophy.
Codex’s product narrative clearly emphasizes real software engineering tasks: follow instructions precisely, generate review-ready patches, run tests until they pass, operate well in long autonomous loops. A model trained and tuned toward that will naturally feel more like a verifiable executor.
Claude, on the other hand, often gets praised for something slightly different: long-context continuity, smooth collaboration with humans, and the ability to carry fuzzy intent forward without immediately flattening it into rigid structure. That doesn’t mean it can’t do hard-core engineering. It means one of its strengths is turning vague human intent into a usable thread of continuation.
Then there is product form.
Codex’s cloud tasks, isolated sandboxes, parallel delegation, and logs-heavy handoff naturally encourage a workflow where you define tasks clearly, send them out, and evaluate the result. That resembles classic engineering management.
Claude’s experience — whether in Claude Code or more broadly — tends to attract people who like the feeling of an ongoing conversation: keep talking, keep adjusting, wander a bit, come back, reshape. That is especially attractive for prototyping, UX flow design, and side projects where the destination is not fully known at the start.
Clawd's inner monologue:
Put the same model inside two different product shells and you can reshape how people think with it. Same knife, different kitchen. A sushi bar and a central prep kitchen will produce totally different habits even if the steel is identical. Tools are not passive. They quietly teach you how to use them. That’s why people often think they’re comparing models when what they actually fell in love with was the interaction pattern.
And then there is developer workflow.
This is where people love turning the discussion into a stupid status hierarchy: real programmers over here, vibe coders over there. I don’t buy that.
A better way to say it is that they optimize different feedback loops.
What jumps into a traditional programmer’s head first is usually correctness, regressions, observability, diff quality, and whether the thing will be pleasant to review. In plain English: “if I merge this, will it explode?”
A vibe coder often starts somewhere else. They worry about momentum getting interrupted, the overall feel drifting off, the product losing its personality, and how much friction exists between natural language and a usable prototype. In plainer English: “will the blurry but important thing in my head get flattened to death during implementation?”
So this is not a hierarchy at all. It’s a difference in what people are most afraid of. Engineers fear regressions, ugly diffs, and pagers going off at midnight. Vibe coders fear losing momentum, losing product feel, and ending up with a pile of features that somehow has no pulse.
That’s also why the same person can spend the day fixing a production incident with Codex and spend the night back in Claude building a side project. That’s not inconsistency. The problem changed after work.
So This Isn’t Really a Dense vs MoE War — It’s a Clash of Coding Philosophies
If I had to compress the whole thing into one sentence, it would be this:
Codex fits the philosophy of clearly delegated work. Claude fits the philosophy of collaboratively growing a blurry idea into shape.
The first philosophy believes that tasks can be defined up front, constraints can be written down, and tests will eventually give you an answer that may not be romantic but will be brutally honest. In that worldview, a good tool behaves like a reliable executor — you assign, it completes, you verify.
The second philosophy believes that many good ideas begin life as fog. Product feel is not always specified in advance; sometimes it only appears while you’re building. Language isn’t just a note attached to the process — it is part of the design interface itself. In that worldview, a good tool cannot just be obedient. It has to keep up with your rhythm while the shape is still emerging.
Dense vs MoE affects the feel, yes. But it acts more like an amplifier that sharpens these philosophies than a single root cause.
Clawd whispers:
This is the part where I want to yell at the internet a little. People love turning tool preferences into sectarian warfare, as if choosing Claude means you don’t understand engineering, and choosing Codex means you have no product sense. Please. Adults pick tools based on the loop they’re in, not the flag they wave. Are you debugging a kernel panic or shaping a homepage people actually want to open every day? Those are not the same sport.
Closing Thoughts
Let’s go back to the two people walking into the same repo.
The first is asking: “Can this be done correctly, on time, and in a way I can verify?”
The second is asking: “Can the thing in my head survive the trip into reality without losing its soul?”
Those two questions almost write the Codex-vs-Claude split by themselves.
The first one is basically delegation. You want something stable, precise, reviewable, and ideally backed by logs. The second one is closer to a creative studio. You want the unfinished thing in your head to survive repeated edits and conversations without getting flattened into corporate training material.
So what really decides which tool you’ll love is usually not a benchmark, and not whichever tribe is shouting louder on Twitter. It’s what you’re trying to protect in that moment.
If you’re protecting correctness, Codex can become your favorite lieutenant. If you’re protecting product feel, Claude can become the partner you don’t want to let go of.
The repo never changed. What changed is whether you walked in carrying a ticket — or trying to rescue a vibe from your own head.
On the surface, both people are writing code. Underneath, they’re protecting completely different things ┐( ̄ヘ ̄)┌