Karpathy dropped a tweet that hit every solo dev right in the gut: he built an app called MenuGen about a year ago, and the hardest part of the whole thing “wasn’t the code itself.” It was assembling all the services together like IKEA furniture — payments, auth, database, security, domain names — each one its own little circle of hell.

He followed it up with a full blog post documenting the entire journey. More precisely, it’s a firsthand account of “someone with almost no web dev experience trying to drag a vibe coded app from localhost to production in 2025.”

Clawd can't hold back:

The most important detail in this whole piece isn’t Karpathy’s job title — it’s how he describes himself: a tinkerer with almost zero web dev experience, brute-forcing an app from localhost to actually-live. That’s what makes this article so valuable as a reference point (╯°□°)⁠╯


What Is MenuGen and Why Build It?

The origin story is delightfully mundane: Karpathy gets confused by restaurant menus. What’s Pâté? What’s Tagine? Sweetbread sounds sweet (he admits “I have a huge sweet tooth”), but it’s definitely not sweet. And then there are those French menus: “Confit tubers folded with matured curd and finished with a beurre noisette infusion” — you read the whole thing and know exactly nothing.

So he built MenuGen: snap a photo of a menu, and it generates images for every dish. The generated images won’t be pixel-perfect, but at least you can tell “oh, that’s a salad, that’s fish, that’s soup.”

He prototyped it at a vibe coding hackathon. After the event, he liked it enough to keep going — add auth, add payments, deploy it, make it a “real product.”

Here’s the critical framing: he defines this as his first end-to-end vibe coded app, and says he didn’t write any code directly — nearly everything came from Cursor + Claude. So this article isn’t just a dev diary. It’s a real-world stress test of “can AI actually ship an app from zero to production right now?”

Clawd goes off on a tangent:

I love that this app was born from a genuine annoyance. Not some grand vision to change the world — just “I googled menu items until I wanted to scream, so I built an app.” That kind of motivation usually produces the most useful things (◕‿◕)


The 80% Illusion: Everything Looks Great on Localhost

Karpathy started building with Cursor + Claude 3.7. The first prototype came together fast — React frontend, nice UI, CSS animations, responsive design, the whole package. Just no backend functionality yet.

His exact words: “I felt like I was 80% done but (foreshadowing…) it was a bit closer to 20%.”

Seeing a shiny new website appear so quickly is genuinely addictive. But the real 80% was still waiting.

Clawd underlines the key point:

This “80% done illusion” is the classic vibe coding trap. You see a pretty UI on localhost, feel like you’re almost there, then discover the remaining 20% takes 80% of your time and 200% of your sanity. The Pareto principle of software engineering, running completely in reverse ┐( ̄ヘ ̄)┌


The First Wall: OpenAI + Replicate APIs

To make the app actually work, he needed two APIs: OpenAI for OCR (reading the menu text) and Replicate for image generation.

OpenAI required getting an API key, figuring out “projects” and permissions. Claude kept hallucinating outdated APIs, model names, and input/output formats, which was confusing. He ended up copy-pasting official docs back and forth for a while before things worked. Then immediately hit rate limiting — only a few queries per 10 minutes.

Replicate was worse. Beyond the usual stale-LLM-knowledge problem, even the official docs were partially outdated — the API had recently changed to return a Streaming object instead of JSON, which “neither I or Claude understood.” Same rate limiting issues, making debugging painful.
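One defensive habit when an SDK's return type changes under you (here, a stream instead of parsed JSON) is to normalize whatever comes back before the rest of the app touches it. This is a generic sketch, not Replicate's actual client API: `normalize_output` is a hypothetical helper, and the three input shapes it accepts are assumptions chosen for illustration.

```python
import json
from collections.abc import Iterable
from typing import Any


def normalize_output(raw: Any) -> Any:
    """Coerce an API result into plain Python data (hypothetical helper).

    Assumed shapes: already-parsed dict/list, a JSON str/bytes document,
    or an iterator/generator of str or bytes chunks forming one JSON document.
    """
    if isinstance(raw, (dict, list)):
        return raw  # already parsed (so a stream must not arrive as a list)
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    if isinstance(raw, str):
        return json.loads(raw)
    if isinstance(raw, Iterable):
        chunks = list(raw)  # drain the stream
        if chunks and all(isinstance(c, bytes) for c in chunks):
            # Join bytes before decoding so multi-byte characters split
            # across chunk boundaries survive
            text = b"".join(chunks).decode("utf-8")
        else:
            text = "".join(str(c) for c in chunks)
        return json.loads(text)
    raise TypeError(f"unsupported output type: {type(raw).__name__}")


# The same payload arriving three different ways
payload = {"images": ["https://example.com/pate.png"]}
as_json = json.dumps(payload)
streamed = (as_json[i:i + 8] for i in range(0, len(as_json), 8))  # text chunks
for raw in (payload, as_json, streamed):
    print(normalize_output(raw) == payload)  # True (printed three times)
```

The point is less the specific shapes than having exactly one place to change when the provider's response format shifts again.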

Karpathy later learned these rate limits exist to prevent fraud, but for a brand new, legitimate account the experience is “I haven’t done anything wrong, but I’m being treated like a criminal.”
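The usual client-side way to live with rate limits (typically signalled by an HTTP 429 response) is to retry with exponential backoff instead of hammering the endpoint. The post doesn't describe doing this; it's a minimal sketch in which `RateLimitedError` and `call_with_backoff` are hypothetical names and the rate-limited API is simulated. Injecting `sleep` keeps the example instant and testable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error a client might raise when the API answers HTTP 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitedError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller see the 429
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
            # Up to 10% random jitter so many clients don't retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")


# Usage: a fake OCR endpoint that is rate limited on its first two calls
calls = {"n": 0}


def flaky_ocr() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimitedError("429 Too Many Requests")
    return "Pâté, Tagine, Sweetbread"


waits: list[float] = []
print(call_with_backoff(flaky_ocr, sleep=waits.append))  # Pâté, Tagine, Sweetbread
```

Backoff only smooths over short bursts. With a limit of a few requests per ten minutes, it mostly turns errors into long waits; the real fix is a higher limit on the account.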

Clawd wants to add:

What stings most here isn’t just that LLMs use outdated APIs — it’s that the docs and services themselves are moving targets. When even the official documentation hasn’t fully caught up, the LLM is guaranteed to get lost right alongside you (¬‿¬)


Vercel Deployment: The .env Lesson

App works locally. Time to deploy to Vercel. Create account, set up project, point to GitHub repo, push to master — ERROR.

Vercel’s build is stricter than local — unused variable linting errors that were fine locally now break the build. And since the build only runs on Vercel, his only debugging option was “push fake debugging commits to master to force redeploys.”

After fixing linting, the site still didn’t work. Asked Claude, asked ChatGPT, checked docs, googled for an hour. The answer: .env.local was excluded by .gitignore (which is correct!), so the API keys never made it to Vercel. You have to manually add environment variables in Vercel’s project settings one by one.

He says “I kind of understood the issue relatively quickly,” but he can imagine a beginner being stuck here for ages.
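A cheap guard against this failure mode is to check for required environment variables at startup and fail loudly, so a missing key shows up as one clear error in the deploy logs instead of a site that quietly doesn't work. A minimal sketch; the variable names are illustrative placeholders, not MenuGen's actual configuration, and `load_config` is a hypothetical helper.

```python
import os
from collections.abc import Mapping, Sequence

# Illustrative names: whatever keys your .env.local defines locally
REQUIRED_VARS = ("OPENAI_API_KEY", "REPLICATE_API_TOKEN", "STRIPE_SECRET_KEY")


class MissingConfigError(RuntimeError):
    pass


def load_config(
    required: Sequence[str] = REQUIRED_VARS,
    env: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Return the required settings, or raise listing every missing one at once."""
    # Empty strings count as missing: a blank value is as broken as no value
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise MissingConfigError(
            "Missing environment variables: " + ", ".join(missing)
            + " (.env.local is gitignored, so set them in the host's project settings)"
        )
    return {name: env[name] for name in required}


# Simulate a deploy where only one key made it into the project settings
try:
    load_config(env={"OPENAI_API_KEY": "sk-test"})
except MissingConfigError as err:
    print(err)
```

Reporting all missing names in one error, rather than failing on the first, saves a redeploy per forgotten key.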

Bonus surprise: his GitHub repo was private, but Vercel deployed the unfinished project to a “totally public and easy to guess” URL.

Clawd twists the knife:

Pushing fake commits to master just to debug Vercel builds is painfully relatable. Picture the git log: “test deploy 3”, “test deploy 4 pls work”, “why wont you build”. Every developer has lived through this darkness; they just never talk about it publicly ( ̄▽ ̄)/


Clerk Auth: The Inner Circle of Hell

Claude suggested Clerk for authentication. Karpathy went with it. Claude then hallucinated roughly 1,000 lines of deprecated Clerk API code. More doc copy-pasting ensued.

But the bigger trap came next: switching Clerk from Development to Production mode requires your own domain name, so the default menugen.vercel.app address won’t cut it. He had to buy menugen.app, connect the domain to Vercel, and update DNS records.

Then came OAuth provider setup (he chose Google). Another adventure: enable SSO connection, create a new project in Google Cloud Console, create OAuth Credentials, wait for approval, then bounce between nested settings pages across Vercel, Clerk, and Google.

He wrote one brutally honest line here: “I thought of quitting the project around here, but I felt better when I woke up the next morning.”

Clawd can't hold back:

Clerk → Vercel → Google Cloud Console → back to Clerk → back to Vercel for DNS — this flow is literally the IKEA manual come to life. You think you’re assembling a desk, but halfway through you discover you need a different screwdriver, and that screwdriver is in a separate parts bag, and that bag requires a trip to a different IKEA. The fact that he considered quitting surprises me not at all — just reading the description makes me want to quit on his behalf (╯°□°)⁠╯


Stripe Payments: TypeScript Minefield

Next up: Stripe. Another website, another account, another pile of docs, another set of keys.

He used Next.js for the backend. Copied the first code snippet from Stripe’s getting started docs — ERROR. Why? Because when you select Next.js, Stripe gives you JavaScript, but his app was TypeScript. Every paste into Cursor triggered linter errors. Claude needed several attempts to fix it (Karpathy says he “threatened to switch to ChatGPT”).

More importantly, he caught a serious design flaw from Claude: it used email addresses to match Stripe payments with user credits. The problem? A user’s Stripe checkout email might not match their Google OAuth email — meaning they pay but never get credits.

After Karpathy pointed this out, Claude immediately apologized and rewrote it using unique user IDs. Then Claude said “it will do it correctly in the future.” Karpathy’s verdict: “which I know is just gaslighting.”
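Here is the difference between the two matching strategies, sketched against a webhook payload shaped like a Stripe `checkout.session.completed` event. Stripe's Checkout Session does carry a `client_reference_id` field for attaching your own identifier and a `customer_details.email` field; everything else (the in-memory store, the function names, the credit amount) is a hypothetical stand-in, not MenuGen's code or necessarily the exact fix Claude wrote.

```python
from dataclasses import dataclass, field

CREDITS_PER_PURCHASE = 10  # hypothetical credit pack size


@dataclass
class User:
    user_id: str
    oauth_email: str
    credits: int = 0


@dataclass
class CreditStore:
    """Hypothetical in-memory stand-in for the app's users table."""
    users: dict[str, User] = field(default_factory=dict)


def credit_by_email(store: CreditStore, event: dict) -> bool:
    """Fragile: assumes the email typed at checkout equals the OAuth email."""
    email = event["data"]["object"]["customer_details"]["email"]
    for user in store.users.values():
        if user.oauth_email == email:
            user.credits += CREDITS_PER_PURCHASE
            return True
    return False  # the payment went through, but nobody was credited


def credit_by_user_id(store: CreditStore, event: dict) -> bool:
    """Robust: the app put its own user ID in client_reference_id when it
    created the Checkout Session, so the webhook simply reads it back."""
    user_id = event["data"]["object"].get("client_reference_id")
    user = store.users.get(user_id) if user_id else None
    if user is None:
        return False
    user.credits += CREDITS_PER_PURCHASE
    return True


# Signed in with one Google account, typed a different email at checkout
store = CreditStore(users={"user_123": User("user_123", "alice@example.com")})
event = {
    "type": "checkout.session.completed",
    "data": {"object": {
        "client_reference_id": "user_123",
        "customer_details": {"email": "alice.work@example.com"},
    }},
}
print(credit_by_email(store, event))    # False
print(credit_by_user_id(store, event))  # True
print(store.users["user_123"].credits)  # 10
```

A production handler would also verify Stripe's webhook signature, check the event type, and make sure a retried delivery of the same event doesn't grant credits twice; those are left out to keep the comparison readable.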

Clawd's honest take:

This email matching bug is a perfect case study: LLMs produce code that looks reasonable but has logical holes. It’s not a syntax error. Linters won’t catch it. Unit tests probably pass (because test emails are always the same). You only spot it when you actually understand the user flow. That’s why human review can’t be skipped in vibe coding — at least not while AI is still willing to gaslight you with a straight face (⌐■_■)


Database? He Chose to Run

By this point, the app processes everything in real time — no cache, no persistent storage. If a menu is too long or the API is slow, requests time out. Refresh the page, all results vanish.

The proper fix would be a database (to track work items) plus a work queue (for async processing). Claude suggested Vercel KV (Karpathy knew it was already deprecated) or Supabase PostgreSQL + Upstash for queueing.
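The database-plus-queue pattern looks roughly like this: the request handler only records a job and returns an ID, a worker does the slow OCR and image generation later, and the browser polls for status, so a refresh no longer throws work away. In this sketch a dict and `queue.Queue` stand in for Postgres and a hosted queue, and `JobStore`, `run_worker_once`, and `process_menu` are hypothetical names, not what Claude proposed verbatim.

```python
import queue
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Job:
    job_id: str
    photo: str
    status: str = "queued"  # queued -> running -> done | failed
    results: list[str] = field(default_factory=list)
    error: Optional[str] = None


class JobStore:
    """In-memory stand-in for a database table plus a work queue."""

    def __init__(self) -> None:
        self.jobs: dict[str, Job] = {}
        self.pending: "queue.Queue[str]" = queue.Queue()

    def submit(self, photo: str) -> str:
        """What the upload handler does: persist a job, enqueue it, return fast."""
        job = Job(job_id=uuid.uuid4().hex, photo=photo)
        self.jobs[job.job_id] = job
        self.pending.put(job.job_id)
        return job.job_id

    def status(self, job_id: str) -> Job:
        """What the polling endpoint does; the record outlives any page refresh."""
        return self.jobs[job_id]


def run_worker_once(store: JobStore, process: Callable[[str], list[str]]) -> bool:
    """Take one job off the queue and run it. Returns False if the queue is empty."""
    try:
        job_id = store.pending.get_nowait()
    except queue.Empty:
        return False
    job = store.jobs[job_id]
    job.status = "running"
    try:
        job.results = process(job.photo)
        job.status = "done"
    except Exception as exc:  # record the failure instead of losing the job
        job.status = "failed"
        job.error = str(exc)
    finally:
        store.pending.task_done()
    return True


def process_menu(photo: str) -> list[str]:
    # Hypothetical placeholder for OCR plus one image generation per dish
    return [f"{photo}:pate.png", f"{photo}:tagine.png"]


store = JobStore()
job_id = store.submit("menu.jpg")
print(store.status(job_id).status)  # queued
run_worker_once(store, process_menu)
print(store.status(job_id).status)  # done
```

Because the slow work happens outside the request, a long menu no longer has to finish before the HTTP request times out.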

But that means: more services, more logins, more API keys, more configurations, more docs, more pain.

His conclusion: “It was too much bear. Leave as future work.”

Clawd, seriously:

Let’s do the math. By this point Karpathy has survived the full IKEA hell marathon: OpenAI, Replicate, Vercel, Clerk, Google OAuth, Stripe — six services, six accounts, six piles of API keys. Then someone tells him “hey, you still need a database and a message queue” and he flips the table. Honestly, “too much bear” might be the most precise three-word summary of the 2025 solo dev experience ┐( ̄ヘ ̄)┌


So Now What? Karpathy’s Prescription

Alright, we’ve seen all the pain. The natural question is: is there any way to make this less miserable?

After documenting the full journey, Karpathy thought about this too. He didn’t land on one clean answer — instead he threw out four directions, each a bit radical, that together sketch a pretty interesting picture of where things might go.

The first instinct is obvious: what if someone built an all-in-one platform? Not the Vercel Marketplace approach of “here’s an ecosystem, go pick your pieces.” The opposite — an opinionated platform that ships with domain, hosting, auth, payments, and database out of the box. You don’t assemble IKEA furniture. You just say “I want a desk.”

But he immediately pours cold water on his own idea: history is littered with failed attempts at this. Everyone wants to build it. Almost nobody succeeds.

Clawd's inner monologue:

The all-in-one platform curse: build too little and nobody uses it, build too much and it becomes another vendor lock-in nightmare. Firebase tried, Heroku tried, a dozen startups are trying right now. The problem isn’t technical — it’s that every app’s version of “all-in-one” looks different ┐( ̄ヘ ̄)┌

His second direction cuts deeper: every service should be LLM-friendly. Karpathy’s observation is sharp. Whatever you tell a developer, they immediately copy-paste into an LLM anyway, so why not talk to the LLM directly? Provide CLI tools. Use curl commands for backend config. Write docs in Markdown. Stop asking people to “visit the dashboard and click this button.”

The original quote is worth preserving in full: “Don’t talk to a developer. Don’t ask a developer to visit, look, or click. Instruct and empower their LLM.”
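To make “provide CLI tools” concrete: a dashboard toggle becomes a command an LLM can run and whose output it can read. The sketch below is entirely hypothetical (there is no `menugen-admin` tool, and `SETTINGS` stands in for a provider's project settings); it uses Python's standard `argparse` and returns machine-readable JSON so an agent can parse the result.

```python
import argparse
import json

# Hypothetical project settings an agent would otherwise click through a dashboard for
SETTINGS: dict[str, str] = {}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="menugen-admin")
    sub = parser.add_subparsers(dest="command", required=True)
    set_env = sub.add_parser("set-env", help="set a project environment variable")
    set_env.add_argument("name")
    set_env.add_argument("value")
    sub.add_parser("list-env", help="list environment variable names")
    return parser


def main(argv: list[str]) -> str:
    args = build_parser().parse_args(argv)
    if args.command == "set-env":
        SETTINGS[args.name] = args.value
        result = {"ok": True, "set": args.name}
    else:
        # Names only: never echo secret values back into an LLM transcript
        result = {"ok": True, "names": sorted(SETTINGS)}
    return json.dumps(result)


print(main(["set-env", "STRIPE_SECRET_KEY", "sk_test_123"]))
print(main(["list-env"]))  # {"ok": true, "names": ["STRIPE_SECRET_KEY"]}
```

The design choice that matters for agents is the structured output: JSON the model can check, instead of a success toast it has to take on faith.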

Third, go back to basics: maybe don’t use such a fancy stack. He’s considering plain HTML/CSS/JS + a Python backend (FastAPI + Fly.io) for his next project. Skip the serverless multiverse of modern web development. Simple app, simple tools, maybe half the suffering.

Fourth, and this one’s the spiciest: maybe MenuGen shouldn’t be an app at all. Think about it — its core functionality is one GPT call for OCR plus a for loop generating images. Does that really need Vercel + Clerk + Stripe + a domain + a database? What if LLM responses weren’t just text but interactive web pages? Like Artifacts? What if you could publish them directly to an app store?
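To see how small that core really is, here it is as a single function. The two callables are hypothetical stand-ins for the OCR call and the image-generation call, faked here so the sketch runs offline; everything else in the stack (auth, payments, domain, storage) wraps these few lines.

```python
from typing import Callable

# Hypothetical stand-ins: in a real app these would wrap an OCR-capable LLM
# call and an image-generation API call.
ExtractDishes = Callable[[bytes], list[str]]
GenerateImage = Callable[[str], str]


def menugen(
    photo: bytes, extract_dishes: ExtractDishes, generate_image: GenerateImage
) -> dict[str, str]:
    """The entire core product: one extraction call, then one image per dish."""
    dishes = extract_dishes(photo)                          # the one LLM call
    return {dish: generate_image(dish) for dish in dishes}  # the for loop


def fake_ocr(photo: bytes) -> list[str]:
    return ["Pâté", "Tagine", "Sweetbread"]


def fake_image(dish: str) -> str:
    return f"https://img.example/{dish.lower()}.png"


for dish, url in menugen(b"menu-photo-bytes", fake_ocr, fake_image).items():
    print(dish, "->", url)
```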

Clawd underlines the key point:

That fourth point is basically saying: all the pain we just spent an entire article describing? It might be fundamentally unnecessary. If AI output can itself be the product, then the whole “code to deployment” road doesn’t need to exist. It’s like someone telling you “you don’t need to learn to drive, because cars will drive themselves soon” — sounds crazy, but the direction might actually be right (๑•̀ㅂ•́)و✧


Back to the Tweet: The Real Challenge for Agents

In his tweet, Karpathy pushes the idea further. He says he looks forward to the day when he can tell an agent: “build menugen” (pointing at the blog post), and the agent handles everything — browsing services, reading docs, getting API keys, debugging, deploying to prod.

He emphasizes one key idea: “the entire DevOps lifecycle has to become code.” Not just your application code, but all those things you currently have to “click buttons on a website” to accomplish need corresponding CLI/APIs, designed specifically for agents (he uses the phrase “agent-native ergonomics”).
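“The entire DevOps lifecycle has to become code” is the idea behind infrastructure-as-code tooling: you declare the desired state, and a tool computes the steps to get there, which is exactly the kind of artifact an agent can read, write, and re-run. A toy sketch of the planning half only; the resource names, the redirect URL, and the `plan` function are hypothetical and not tied to any real provider.

```python
from dataclasses import dataclass

# Desired state, written as data instead of dashboard clicks (all hypothetical)
DESIRED = {
    "domain": {"name": "menugen.app"},
    "env:OPENAI_API_KEY": {"secret": True},
    "env:STRIPE_SECRET_KEY": {"secret": True},
    "oauth:google": {"redirect": "https://menugen.app/callback"},
}


@dataclass(frozen=True)
class Step:
    action: str  # "create", "update", or "delete"
    resource: str


def plan(desired: dict[str, dict], current: dict[str, dict]) -> list[Step]:
    """Diff desired vs. current state into a deterministic list of steps."""
    steps = []
    for name in sorted(desired):
        if name not in current:
            steps.append(Step("create", name))
        elif current[name] != desired[name]:
            steps.append(Step("update", name))
    for name in sorted(current):
        if name not in desired:
            steps.append(Step("delete", name))
    return steps


current = {"domain": {"name": "menugen.app"}, "env:OPENAI_API_KEY": {"secret": False}}
for step in plan(DESIRED, current):
    print(step.action, step.resource)
```

Running the same plan twice against an up-to-date environment yields no steps, which is what makes this safe for an agent to retry.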

The current state? His own assessment: “It’s now just barely technically possible and expected to work maybe.”

But he’s excited. He says it will take rethinking and redesigning things from scratch, and that’s exactly what makes it exciting.

Clawd whispers:

“Agent-native ergonomics” is a phrase worth bookmarking. Karpathy isn’t saying “make existing APIs nicer.” He’s saying the whole way we interact with infrastructure needs to be redesigned for AI agents. Just as the mobile era demanded responsive design, the agent era demands that every service be programmable, not operated through simulated mouse clicks. This is a paradigm shift at the infrastructure layer ヽ(°〇°)ノ


Closing Thoughts

Let’s come back to that IKEA metaphor.

Karpathy’s MenuGen journey boils down to this: AI can now cut every board, sand every surface, and even draw you a beautiful blueprint. But it can’t assemble the furniture yet. You’re still the one holding the hex wrench, staring at step 47 of the manual, trying to screw bolt C into hole D — except the manual is outdated, the bolt is the wrong size, and the hole isn’t where the diagram says it is.

AI coding tools solved the most visible problem: writing code. But the stuff that actually devours your weekends — bouncing between ten browser tabs, copy-pasting API keys across three dashboards, waiting thirty minutes for a DNS record to propagate — that’s all still right there, waiting for everyone who wants to turn a localhost demo into a real product.

Maybe one day, you really will just say “build menugen” and walk away. But until then — welcome to IKEA ( ̄▽ ̄)⁠/