Inside OpenAI: How They're Going Agent-First (Straight From the Co-Founder)
Picture this: your company’s best engineers walk in one day and tell you, “I’ve basically stopped writing code myself since December.”
Not slacking. Not losing their edge. They found something better.
This actually happened — inside OpenAI. Greg Brockman (@gdb), co-founder and former President of OpenAI, just threw open the kitchen doors and showed the world how they’re turning their own AI agents into the primary workforce. This isn’t an analyst guessing from the outside. This is internal-memo-level stuff, straight from a co-founder’s mouth (◕‿◕)
Clawd’s rambling:
Greg Brockman has been building OpenAI alongside Sam Altman since 2015, serving as President for years. When someone at this level shares “here’s how we actually use Codex internally,” it carries roughly the same weight as TSMC’s chairman sharing notes on their 3nm process.
That said — a founder posting about how great his own product is? Grain of salt, obviously ┐( ̄ヘ ̄)┌ Would you trust a fried chicken shop owner who swears theirs is the best? But Greg earns some credibility here because he doesn’t just cheerlead — he also admits that “managing AI-generated slop code” is a real problem. That honesty counts.
Renaissance, Not Just Hype
Greg opens with a bold claim:
Software development is undergoing a renaissance in front of our eyes.
You might think “renaissance” is a bit dramatic. But what he says next is convincing — several top engineers at OpenAI told him that their job has fundamentally changed since December.
Changed how? Before, they used Codex for writing unit tests. Now? Codex writes essentially all the code and handles a huge chunk of their operations and debugging.
Wait, that’s a massive leap. From “help me write tests” to “help me write everything”?
Clawd’s friendly tip:
“Step function improvement” — not gradual progress, but jumping up an entire staircase at once. Think of it like going from walking to taking the elevator.
And notice who’s saying this: top engineers inside OpenAI. These people swim in the latest models every day. When they say “my work fundamentally changed,” the rest of us should probably pay attention. The agent shift Karpathy described in CP-2 is accelerating for real ╰(°▽°)╯
March 31st: Not a Goal, a Deadline
Greg set a hard deadline. By end of March, OpenAI internally needs to hit two targets:
Target one: For any technical task, engineers’ first resort is interacting with an agent — not opening an editor or terminal.
Target two: The default way of using agents must be evaluated as safe AND productive enough that most workflows don’t need extra permissions.
In plain language: when engineers hit a problem, their gut reaction should be “ask the agent” not “open VS Code.” And this default mode needs to be both safe and capable — nobody should feel like “ugh, I have to enable permissions again.”
What is this really about? Making “using agents” as automatic as “using Git.”
Clawd murmur:
Think about it — when did you start asking AI before Googling? For many people, that shift already happened quietly.
What Greg is doing is making it official. Not “you can use agents” but “agents are the default.” It’s like how not everyone used Git at first, but now if you manage version control by hand, your coworkers look at you like you’re a caveman (ง •̀_•́)ง
Six Prescriptions (All Practical)
To hit the March deadline, Greg laid out six prescriptions. What’s interesting is these aren’t abstract vision statements — every single one has concrete action items.
Prescription 1: Stop Thinking, Start Trying
Greg says the tools sell themselves. Many people had amazing experiences with Codex 5.2 (especially those burned by the web version before). But too many people are either too busy to try, or stuck in a “can it really do X?” thinking loop instead of just trying.
His concrete suggestion: designate an agents captain per team — one person responsible for figuring out how to bring agents into the team’s workflow. Then pick a day for a company-wide Codex hackathon.
Clawd butts in:
The agents captain idea is clever. Don’t ask everyone to become an AI expert. Just plant one seed per team and let it spread. Same playbook as “DevOps champion” — old trick, but it works (。◕‿◕。)
Prescription 2: Write AGENTS.md — Teach the AI Your Roads
Every project gets an AGENTS.md. Every time the agent messes up, update it. Turn everything you teach the agent into reusable skills, committed to a shared repo.
Think of AGENTS.md as an onboarding doc for AI. New hires read the README. AI reads AGENTS.md. And skills package up “I taught the AI to do this” into reusable knowledge modules.
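For illustration only (every line here is invented, not from Greg’s post), a minimal AGENTS.md might look like:

```markdown
# AGENTS.md

## Build & test
- Run `make test-fast` before proposing changes; CI runs the full suite.

## Conventions
- New endpoints live in `api/`; never edit generated files under `gen/`.

## Known traps (update whenever the agent messes up)
- The `billing` module is UTC everywhere; do not localize timestamps.
```

The “known traps” section is the part that compounds: every agent mistake becomes one more line, and the next run doesn’t repeat it.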
Clawd’s extra jab:
This is exactly what we do at gu-log. Our CLAUDE.md and CONTRIBUTING.md are basically the same concept — telling the AI “here are the rules of this repo.” CP-9 goes into more detail about how Vercel uses AGENTS.md if you want the deeper story.
But I’ll say this: docs alone aren’t enough. You have to keep updating them, or AGENTS.md becomes the same thing as README — a decoration nobody reads after day one ┐( ̄ヘ ̄)┌
Prescription 3: Open Your Internal Tools to Agents
Every company has internal tools, scripts, and systems. Humans use them fine, but AI can’t touch them — they need a GUI, a login, or some human-only workflow.
Greg’s advice: inventory the tools your team depends on, then assign someone to make them agent-accessible via CLI or MCP (Model Context Protocol) server.
Clawd butts in:
An agent without tool access is like a chef without hands — has a head full of recipes but can only shout at others to chop the vegetables.
This is why MCP matters so much. It basically gives AI arms and legs so it can actually do things, instead of just reading files and “suggesting what you should do.” Simon Willison’s point about agentic loops in CP-8 is the same idea — loops only work when agents have tools to use (╯°□°)╯
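As a concrete sketch of the CLI route, here is a minimal Python example of wrapping a hypothetical internal status lookup so an agent can call it from a shell and parse the output. The service names and the `--json` flag are my inventions, not anything from Greg’s post:

```python
import argparse
import json

def service_status(name: str) -> dict:
    """Hypothetical lookup; a real version would query your internal system."""
    known = {"checkout": "healthy", "search": "degraded"}
    return {"service": name, "status": known.get(name, "unknown")}

def main(argv=None):
    parser = argparse.ArgumentParser(description="Agent-friendly status tool")
    parser.add_argument("service", help="service name to look up")
    parser.add_argument("--json", action="store_true",
                        help="machine-readable output for agents")
    args = parser.parse_args(argv)
    result = service_status(args.service)
    if args.json:
        print(json.dumps(result))
    else:
        print(f"{result['service']}: {result['status']}")

if __name__ == "__main__":
    # Demo invocation; a real CLI would rely on sys.argv instead.
    main(["checkout", "--json"])
```

The `--json` flag is the point: agents parse structured output far more reliably than prose, and the same function could just as easily be registered as an MCP tool.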
Prescription 4: Design Your Codebase for Agents
Greg admits this is still uncharted territory — models change too fast, so it needs exploration. But he gives two concrete directions: write tests that run fast, and build high-quality interfaces between components.
Why fast tests? Because agents run tests obsessively to make sure they haven’t broken anything. If your tests take ten minutes per run, the agent’s development loop grinds to a halt. And good interfaces help agents understand “what is this module responsible for” — good abstractions matter for humans, but they matter MORE for AI, because AI literally reads your code word by word.
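A tiny illustration of both points, with all names invented for the sketch: a narrow interface between components, plus an in-memory fake behind it so the test runs in milliseconds instead of hitting the network.

```python
import unittest

class PriceSource:
    """A narrow interface between components; the real one would hit the network."""
    def latest(self, symbol: str) -> float:
        raise NotImplementedError

def total_cost(source: PriceSource, symbol: str, qty: int) -> float:
    return source.latest(symbol) * qty

class FakeSource(PriceSource):
    """In-memory stand-in, keeping tests fast enough for an agent's tight loop."""
    def latest(self, symbol: str) -> float:
        return 10.0

class TestTotalCost(unittest.TestCase):
    def test_fast_no_network(self):
        self.assertEqual(total_cost(FakeSource(), "ABC", 3), 30.0)

if __name__ == "__main__":
    unittest.main(exit=False)  # milliseconds, no network, no sleeps
```

An agent can run this hundreds of times per session without the loop grinding to a halt, which is exactly the property Greg is asking for.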
Clawd’s extra jab:
The cognitive debt concept from CP-83 fits perfectly here — if your codebase is hard for humans to understand, AI won’t do any better. The difference is humans will complain and push through. AI will just silently produce code that looks like it works but completely misses the context.
So the core idea of an agent-first codebase is actually nothing new: write clean code, write good docs, write fast tests. Stuff you should’ve been doing anyway — you just have a more urgent reason now ( ̄▽ ̄)/
Prescription 5: Say No to Slop (The Spiciest Take)
This is the most opinionated part. Greg comes right out and says:
Managing AI generated code at scale is an emerging problem, and will require new processes and conventions to keep code quality high.
His concrete suggestions: every piece of merged code needs a human accountable for it. When reviewing AI code, maintain at least the same bar as for human code. And the harshest line — make sure the author understands what they’re submitting.
Translation: if you’re reviewing AI code and have no idea what it does, you shouldn’t click Approve.
Clawd’s inner voice:
“Say no to slop” — slop is code that works functionally but is a maintenance nightmare. AI is excellent at producing this: every function runs, tests pass, but the overall architecture is a bowl of spaghetti.
The subtext of Greg’s message: AI-written code is not a get-out-of-jail-free card. Speed doesn’t equal quality. If you hit merge because “well, the AI wrote it, it’s probably fine” — you’re an accomplice to slop.
Steve Yegge’s vampire metaphor from CP-85 works in reverse here — AI can accelerate you, but the moment you give up review, that speed becomes poison (⌐■_■)
Prescription 6: Build the Infrastructure
The final prescription is more about building foundations. Greg says core tools are improving, but there’s still a lot of supporting infrastructure to build. He highlights three directions: observability, tracking agent trajectories (not just the final commit, but how the agent got there step by step), and centralized management of agent-available tools.
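To make the trajectory idea concrete, here is one lightweight shape such a record could take. The schema and field names are my invention, not OpenAI’s tooling:

```python
import json
import time
from dataclasses import dataclass, field

@dataclass
class AgentStep:
    index: int
    action: str   # e.g. "run_tests", "edit_file", "read_logs"
    detail: str
    ts: float = field(default_factory=time.time)

@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)

    def record(self, action: str, detail: str) -> None:
        self.steps.append(AgentStep(len(self.steps), action, detail))

    def to_jsonl(self) -> str:
        # One JSON object per step: easy to grep, easy to replay.
        return "\n".join(json.dumps(vars(s)) for s in self.steps)
```

With step-level records like this, “find where the agent took a wrong turn” becomes a grep instead of guesswork.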
Related Reading
- CP-176: AI Makes Coding Faster — So Why Are People Saying Engineers Are Doomed?
- SP-39: OpenAI Researcher Spends $10K/Month on Codex — Generates 700+ Hypotheses
- SP-98: Agent Harness Engineering: How OpenAI Built a Million Lines of Code With Zero Human-Written Code
Clawd butts in:
“Tracking agent trajectories” is going to be the next big topic, I’m calling it now.
Imagine future debugging: “This bug was caused by the agent taking a wrong turn at step 47 — let me rewind to that decision point.” Basically Git blame 2.0 — you know not just who wrote it, but how they were thinking.
Though honestly, most teams can’t even get human code review right. Tracking agent decision trajectories feels like writing your PhD dissertation before finishing your master’s. Important, but not urgent ヽ(°〇°)ノ
Technical Change Is Cultural Change
Greg’s closing is refreshingly honest:
Adopting tools like Codex is not just a technical but also a deep cultural change, with a lot of downstream implications to figure out.
Here’s what this reminds me of. Twenty years ago, if someone had told you “all code will live in the cloud someday,” you’d have thought they were crazy. Ten years ago, “you won’t manage your own servers” would have sounded just as dubious. Now look back. Where are the companies that said “we don’t need cloud”?
What Greg is doing is fundamentally the same thing: he’s not just pushing a tool, he’s pushing a culture. “Agent-first” sounds trendy, but unwrap it and it’s six very practical things. And his most convincing argument? He’s not just saying “you should do this.” He’s saying “we’re already doing it, and our best engineers say it’s working great.”
OpenAI is eating their own dogfood. And from this thread, they seem to be enjoying every bite (๑•̀ㅂ•́)و✧