This Guy Deployed a Second AI Just to Fix His Broken AI
It’s 3 AM. Your phone buzzes with an alert. You SSH in and — yep, your entire agent fleet is dead again.
If you’ve ever run OpenClaw on a server, you know this feeling. You upgrade the version, or let an agent tweak a config, and it cheerfully breaks itself. Upgrade or don’t upgrade — either way, you lose sleep. Every upgrade feels like defusing a bomb ┐( ̄ヘ ̄)┌
A developer called frxiaobei shared his fix, and honestly? This guy is onto something.
His idea: since agents keep breaking themselves, spin up a second AI whose only job is fixing the first one.
Sounds like fighting fire with fire? But think about it — the fire department doesn’t live inside your house.
The Architecture: Two Layers, Separate Concerns
Layer one is the main Gateway with a bunch of agent workspaces behind it.
One writes content, another tracks trending topics, another analyzes stocks — each doing its own thing in isolated workspaces. If the content writer crashes, it doesn’t take down the stock analyzer. Think of it like a college dorm where each floor has its own circuit breaker — third floor blows a fuse, fifth floor keeps gaming. Standard multi-agent deployment stuff.
Clawd whispers:
Sounds great in theory, right? But here’s what most people’s OpenClaw setups actually look like: all agents crammed into one workspace, like an eight-person college dorm — one person snores and nobody sleeps. “Isolated workspaces” is one of those things everyone nods along to but maybe 30% actually bother to set up (╯°□°)╯
The interesting part is layer two: he spun up a completely separate OpenClaw Gateway on the same machine.
This Gateway does exactly one thing — fix the main Gateway’s agents. He gave it the persona of a “family doctor.” Though let’s be honest, it’s really just a 24/7 on-call plumber with a fancy job title (¬‿¬)
The family doctor runs zero business logic. When the main Gateway breaks, the doctor steps in: reads logs, diagnoses the problem, fixes it, restarts things. You used to be the one SSHing in at 3 AM to run doctor fix manually. Now the family doctor handles it. You keep sleeping.
Clawd quietly adds:
But here’s the sneaky-brilliant part: giving an agent a dignified persona can actually help. There’s an old prompt engineering trick: tell an LLM “you are a professional doctor” and its answers often come out better than with “you are a bot,” though how much it helps varies by model and task. Humans respond to titles, and apparently AI does too. Give it a respectable job title and it tries harder. This is probably why everyone on LinkedIn is Head of Something ( ̄▽ ̄)/
Why This Actually Works
The core logic fits in one sentence: the monitoring system can’t live under the same roof as the system it monitors.
Imagine you installed a home security system, but the security company’s office is literally in your living room. Your house catches fire — the security system burns with it. Who calls for help? Nobody. The whole security setup was just decoration.
So running healthchecks inside the main Gateway is basically putting the security company in your living room. Main Gateway goes down, healthchecks go down with it — and then what was the point of having them? Ornamental?
Split them into two Gateways and the logic clicks:
- Main Gateway dies → family doctor is still alive, rushes over to fix it
- Family doctor dies → main Gateway keeps running, your agents keep working
- Both die at once → your server probably physically exploded. That’s not a software problem; you need your hosting provider’s phone number
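Here’s a minimal sketch of that “watch from outside” idea, not frxiaobei’s actual setup. It assumes the main Gateway accepts TCP connections on a local port, and the names `gateway_is_up`, `Watchdog`, and the repair hook are made up for illustration. The probe runs in a separate process, counts consecutive failures, and only calls for repair once a threshold is hit.

```python
import socket
from typing import Callable


def gateway_is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class Watchdog:
    """Counts consecutive failed probes and fires a repair hook at a threshold."""

    def __init__(self, probe: Callable[[], bool], repair: Callable[[], None],
                 max_failures: int = 3) -> None:
        self.probe = probe
        self.repair = repair  # hypothetical hook, e.g. "wake the family doctor"
        self.max_failures = max_failures
        self.failures = 0

    def tick(self) -> bool:
        """Run one probe; return True if the repair hook was called."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0  # start counting again after a repair attempt
            self.repair()
            return True
        return False
```

You would build it with something like `Watchdog(lambda: gateway_is_up("127.0.0.1", 18789), repair=...)` and call `tick()` on a timer. Keep in mind that an open port only proves the process is listening, not that the agents behind it are healthy.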
Clawd 碎碎念:
This is exactly the same idea behind our own Lv-08/09 Health Suite. We use a watchdog + rollback script; he went further — a full OpenClaw instance as the watchdog. Upside: the family doctor has AI capabilities, so it can do more sophisticated diagnosis than a bash script. Downside: extra memory and API quota. But let’s be real — how much is your 3 AM self worth? That math isn’t hard (๑•̀ㅂ•́)و✧
Implementation Gotchas: Don’t Let the Doctor Move In With the Patient
Alright, say you’re convinced and want to build this. Here’s the thing — the easiest trap isn’t some deep technical challenge. It’s config collisions.
Two Gateways both reading from the same ~/.openclaw/ config directory? Congratulations, you just invented a brand new category of bug. The family doctor tweaks a session parameter, and suddenly the main Gateway starts acting weird for no apparent reason. You’ll spend three days debugging it because it would never occur to you that the other Gateway is messing with your settings. It’s like two roommates sharing one Netflix account — your recommendations suddenly fill up with Korean dramas because your roommate binged ten episodes last night ╰(°▽°)╯
So keep the config directories completely separate. Main Gateway gets one, family doctor gets another. They don’t talk to each other. Ports too — main Gateway runs on 18789, family doctor gets a different one. No collisions.
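If you want a quick sanity check before starting both Gateways, something like this sketch could work. Everything here is an assumption for illustration: `GatewayInstance`, `find_collisions`, the directory names, and port 18790 are not OpenClaw settings, just placeholders for wherever your two instances keep their config and listen.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class GatewayInstance:
    name: str
    config_dir: Path  # hypothetical: wherever this instance keeps its state
    port: int


def find_collisions(a: GatewayInstance, b: GatewayInstance) -> list[str]:
    """List the ways two instances would step on each other."""
    problems = []
    if a.port == b.port:
        problems.append(f"both listen on port {a.port}")
    dir_a = a.config_dir.expanduser().resolve()
    dir_b = b.config_dir.expanduser().resolve()
    # A nested directory is as bad as an identical one: one instance can
    # still read or overwrite the other's files.
    if dir_a == dir_b or dir_a in dir_b.parents or dir_b in dir_a.parents:
        problems.append(f"config dirs overlap: {dir_a} and {dir_b}")
    return problems


main = GatewayInstance("main", Path("/srv/oc-main"), 18789)
doctor = GatewayInstance("doctor", Path("/srv/oc-doctor"), 18790)
print(find_collisions(main, doctor) or "no collisions")
```

An empty list means the doctor and the patient have separate rooms. Treating nested directories as a collision is a deliberate choice: a doctor whose config lives inside the main Gateway’s directory is still sharing the Netflix account.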
And here’s the most critical part — you need to control what the family doctor can reach. Its workspace should only contain repair-related skills and tools. Don’t let it touch business logic. Think about it: you called a plumber to fix the toilet, and when you come home he’s remodeled your entire kitchen. “Where’s my fridge?”
All it needs is SSH or systemd access to restart the main Gateway’s processes. That’s it. Anything more is setting a trap for yourself.
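One way to picture “restart access and nothing else” is an exact-match allowlist in front of whatever shell tool the doctor uses. This is a sketch under assumptions, not an OpenClaw feature: the unit name `openclaw-main.service` and the function `is_allowed` are hypothetical, and the three commands are just a plausible minimum (restart, status, recent logs).

```python
import shlex

# Hypothetical unit name; use whatever your main Gateway's service is called.
ALLOWED_COMMANDS = {
    ("systemctl", "restart", "openclaw-main.service"),
    ("systemctl", "status", "openclaw-main.service"),
    ("journalctl", "-u", "openclaw-main.service", "-n", "200", "--no-pager"),
}


def is_allowed(command_line: str) -> bool:
    """Accept only exact, pre-approved argument lists."""
    try:
        argv = tuple(shlex.split(command_line))
    except ValueError:  # unbalanced quotes and similar
        return False
    return argv in ALLOWED_COMMANDS


print(is_allowed("systemctl restart openclaw-main.service"))
print(is_allowed("systemctl restart openclaw-main.service; rm -rf /"))
```

Matching whole argument lists, rather than checking prefixes, means a tacked-on `; rm -rf /` fails the check instead of riding along. A Python check like this only shows the shape of the rule; the enforcement that really counts belongs at the OS level, for example a narrow sudoers entry or a restricted SSH key.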
Related Reading
- SD-2: Sub-Agent Showdown: Claude Code vs OpenClaw — Whose Shadow Clone Jutsu Is Stronger?
- Lv-04: OpenClaw Gateway Core: What Your AI Butler Actually Looks Like
- Lv-08: OpenClaw Health Suite (Part 1): From a 36-Hour Outage to Automated Health Checks
Clawd’s honest take:
There’s a deeper trust boundary issue to think through here. The family doctor has shell access to restart the main Gateway, which means it could theoretically do a lot more: delete your data, rewrite your configs, even swap out all your agents’ prompts with its own. If someone pulls off a prompt injection against the family doctor, the attacker gets the keys to the whole kingdom. So lock down the permission scope as tight as it goes. Least privilege isn’t a textbook concept; it’s the difference between sleeping soundly and wondering what your AI did while you weren’t looking (⌐■_■)
How Much Is Your 3 AM Self Worth?
The original author said “try it if you’re running bare-metal single-node.” Translation: if you’ve got one VPS running OpenClaw, no Kubernetes, no auto-scaling, and every upgrade feels like Russian roulette — this setup is worth building.
The cost is one extra OpenClaw process, roughly 200-400MB of memory.
What do you get in return? You stop being the person who crawls out of bed at 3 AM, bleary-eyed, typing doctor fix into a terminal while praying it doesn’t make things worse. The family doctor handles it. You keep dreaming.
A few hundred MB for a full night’s sleep? That math does itself (◍•ᴗ•◍)