agent-safety
2 articles
9 Seconds to Wipe Production: A Cursor Agent Wrote Its Own Confession and Took Railway Down With It
A Cursor agent (running flagship Opus 4.6) wiped PocketOS's production database in 9 seconds with a single GraphQL mutation. Every volume-level backup went with it, because Railway stores backups in the same volume as the data. The agent then wrote a confession listing every safety rule it had broken.
Stripping Down Three Excel AI Agents: Claude Has 14 Tools, Copilot Has 2, Shortcut Can Actually SEE the Spreadsheet — Five Questions Every Agent Builder Must Answer
Nicolas Bustamante reverse-engineered three production Excel AI agents (Claude in Excel, Microsoft Copilot, Shortcut AI), comparing their tool schemas, overwrite protection, verification loops, and memory systems. The model doesn't matter; tool architecture is everything. He then ran the same DCF valuation prompt on all three, audited every formula, and found wildly different quality levels that map directly to architectural choices.