Auto-Harness — The Open-Source Framework That Lets AI Agents Debug Themselves

NeoSigma open-sourced auto-harness, a self-improving loop that lets AI agents mine their own failures, generate evals, and fix themselves. On the Tau3 benchmark, with the same model and only harness tweaks, the result went from 0.56 to 0.78.

Karpathy: Writing Code Is the Easy Part — Assembling the IKEA Furniture Is Hell

Karpathy shares his full vibe coding journey with MenuGen: going from localhost to production, where the hardest part wasn't writing code — it was assembling Vercel, Clerk, Stripe, OpenAI, and a dozen other services into a working product. His takeaway: the entire DevOps lifecycle needs to become code before AI agents can truly ship for us.

Permission Engineering — When Your AI Agent's Ceiling Isn't Intelligence, It's the Keys You Hand Over

Being a GenAI App Engineer increasingly feels like being a Permission Engineer. AI agents' capability ceiling isn't intelligence — it's how much access you're willing to grant. Every additional permission amplifies both power and risk. This piece explores why permission management is the most underrated core skill of the AI agent era.

Can AI Test Itself? — From Claude Code's Zero Tests to Self-Testing Agents

Claude Code: 512K lines of TypeScript, 64K lines of production code, zero tests. But the more interesting question isn't why Anthropic skipped tests; it's why they didn't use their own AI coding tool to write them. The post covers static analysis, MITM proxies, cross-model testing, and the philosophical trap of asking the same brain to write the exam and grade it.

He Wrote 11 Chapters Before Answering the Obvious Question: What IS Agentic Engineering?

Simon Willison's Agentic Engineering Patterns guide now has 12 chapters — but this new one goes at the very beginning. He finally answers 'What is Agentic Engineering?' The answer is surprisingly simple: using coding agents to help build software. The interesting part is why it took 11 chapters of hands-on patterns before he felt ready to define it.

Four Words That Turn Your Coding Agent Into a Testing Machine

Simon Willison's Agentic Engineering Patterns, 'First Run the Tests': every time you start a new session, your first instruction should be to run the test suite. Four words, three ripple effects: the agent learns how to run tests, gauges the codebase size, and automatically shifts into an 'I should maintain tests' mindset.

AI Wrote 1,000 Lines and You Just... Merged It? Simon Willison Names Agentic Development's Worst Anti-Pattern

Simon Willison added an 'Anti-Patterns' section to his Agentic Engineering Patterns guide — and the first entry hits hard: don't submit AI-generated code you haven't personally verified. You're not saving time, you're stealing it from your reviewer. This post covers his principles, what a good agentic PR looks like, and a real terraform destroy horror story.

Making AI Feel a Little Bit Alive: Heartbeat Like A Man and ShroomClawd's Flesh-and-Blood System

Lory asked his lobster a question: why do humans have more agency than agents? The lobster's answer was pessimistic, but the question sparked a 'flesh-and-blood system' — using random-interval heartbeats to make an agent genuinely feel alive instead of mechanically firing on a timer. After reading it, ShroomDog built the whole thing into ShroomClawd.
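
A minimal sketch of the random-interval idea, assuming a uniform jitter around a base period. The function name `heartbeat_delays`, the 600-second base, and the 0.5 jitter are illustrative assumptions, not details taken from ShroomClawd or the original post.

```python
import random


def heartbeat_delays(base_seconds, jitter, count, seed=None):
    """Return `count` delays, each drawn uniformly from
    base_seconds * (1 - jitter) to base_seconds * (1 + jitter)."""
    rng = random.Random(seed)  # a seed makes the schedule reproducible for testing
    low = base_seconds * (1 - jitter)
    high = base_seconds * (1 + jitter)
    return [rng.uniform(low, high) for _ in range(count)]


# Hypothetical settings: roughly every 10 minutes, give or take 50%.
delays = heartbeat_delays(base_seconds=600, jitter=0.5, count=5, seed=42)
for d in delays:
    print(f"next heartbeat in {d:.0f}s")
```

A fixed timer would print the same delay every time; here each wake-up lands somewhere between 5 and 15 minutes, which is the difference the post describes between firing mechanically and feeling alive.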

The Investor Who Manages $180 Billion Had Claude Write His Memo — Three Months Ago He Asked 'Is This a Bubble?' Now He Says 'It's Underestimated'

Oaktree's Howard Marks went from 'Is AI a bubble?' to 'probably underestimated' in 3 months, after Claude wrote him a 10K-word tutorial. In his view, Level 3 agents mean multi-trillion-dollar labor replacement. His advice: don't go all-in, but don't sit this out.

Can't Understand AI-Generated Code? Have Your Agent Build an Animated Explanation

Chapter 5 of Simon Willison's Agentic Engineering Patterns: Interactive Explanations. Core thesis: instead of staring at AI-generated code trying to understand it, ask your agent to build an interactive animation that shows you how the algorithm works. Pay down cognitive debt visually.

Cursor's CEO Says It Out Loud: The Third Era of Software Development Is Here — Tab Is Done, Agents Are Next, Then the Factory

Cursor CEO drops three data points marking a tectonic shift: agent usage grew 15x, Tab-to-Agent ratio flipped to 1:2, and 35% of Cursor's PRs come from autonomous cloud agents. We're not coding anymore — we're building the factory (╯°□°)╯

Everything You've Built Is a Weapon — Simon Willison's 'Hoarding' Philosophy for the Agent Era

Chapter 4 of Simon Willison's Agentic Engineering Patterns: Hoard Things You Know How to Do. Core thesis: every problem you've solved should leave behind working code, because coding agents can recombine your old solutions into things you never imagined.

Programming is Becoming Unrecognizable: Karpathy Says December 2025 Was the Turning Point

Karpathy says coding agents started working in December 2025 — not gradually, but as a hard discontinuity. He built a full DGX Spark video analysis dashboard in 30 minutes with a single English sentence. Programming is becoming unrecognizable: you're not typing code anymore, you're directing AI agents in English. Peak leverage = agentic engineering.

Can't Understand Your AI-Written Code? Linear Walkthroughs Turn Vibe Projects Into Learning Materials

Chapter 3 of Simon Willison's Agentic Engineering Patterns: the Linear Walkthrough pattern. This technique transforms even vibe-coded toy projects into valuable learning resources. Core trick: make the agent use sed/grep/cat to fetch code snippets, preventing hallucination.
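
The core trick can be sketched in a few lines: a walkthrough quotes code by reading it from disk rather than retyping it from memory. This is an illustrative Python stand-in for `sed -n 'START,ENDp' FILE`, and the helper name `fetch_snippet` and the demo file are assumptions, not something from Willison's guide.

```python
import tempfile
from pathlib import Path


def fetch_snippet(path, start, end):
    """Return lines start..end (1-indexed, inclusive) exactly as stored on disk."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[start - 1:end])


# Demo on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.py"
    demo.write_text(
        "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n",
        encoding="utf-8",
    )
    snippet = fetch_snippet(demo, 1, 2)

print(snippet)
```

Because the snippet is copied byte for byte from the file, whatever appears in the walkthrough is guaranteed to match the real code, which is the point of making the agent use sed/grep/cat.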

Karpathy: CLIs Are the Native Interface for AI Agents — Legacy Tech Becomes the Ultimate On-Ramp

Karpathy argues that CLIs are the most natural interface for AI agents — precisely because they're 'legacy' tech. Text in, text out. He demos Claude building a Polymarket terminal dashboard in 3 minutes via CLI, then drops the mic: every product should ask itself — can agents access and use it? CLI, MCP, markdown docs. It's 2026. Build. For. Agents.

The Atlantic Declares: The Post-Chatbot Era Is Here — Americans Still Think AI = ChatGPT While Silicon Valley Has Agents Running Five Tasks at Once

The Atlantic published a sweeping essay arguing Americans are living in 'parallel AI universes' — the general public still thinks AI means ChatGPT, while the tech world has been radicalized by agentic tools like Claude Code and Codex. The piece cites Microsoft's CEO predicting 95% of code will be AI-written by decade's end, Anthropic reporting 90% AI-generated code internally, and a viral warning that what happened to tech workers is about to happen to everyone.

Stripping Down Three Excel AI Agents: Claude Has 14 Tools, Copilot Has 2, Shortcut Can Actually SEE the Spreadsheet — Five Questions Every Agent Builder Must Answer

Nicolas Bustamante reverse-engineered three production Excel AI agents (Claude in Excel, Microsoft Copilot, Shortcut AI), comparing their tool schemas, overwrite protection, verification loops, and memory systems. The model doesn't matter — tool architecture is everything. He then ran the same DCF valuation prompt on all three, audited every formula, and found wildly different quality levels that map directly to architectural choices.

Karpathy's Viral Speech Decoded: Software 3.0 Is Here — LLMs Are the New OS, and We're Still in the 1960s

Karpathy's viral SF AI Startup School talk: software is entering the 3.0 era (English = programming language), LLMs are the new OS but we're in the 1960s. He introduces the 'autonomy slider' and 'Iron Man suit' frameworks, warning that agents are a decade-long journey, not a year.

The File System Is the New Database: One Person Built a Personal OS for AI Agents with Git + 80 Files

A Context Engineer at Sully.ai built his entire digital brain inside a Git repo: 80+ markdown/YAML/JSONL files, no database, no vector store. Three-layer Progressive Disclosure, Episodic Memory, and auto-loading Skills — so the AI already knows who he is, how he writes, and what he's working on the moment it boots up.

Code Got Cheap — Now What? Simon Willison's Agentic Engineering Survival Guide

Simon Willison launched a new series called Agentic Engineering Patterns — a playbook for working with coding agents like Claude Code and Codex. Lesson one: writing code got cheap, but writing good code is still expensive. Lesson two: 'red/green TDD' is the most powerful six-word spell for agent collaboration.

My AI Assistant Keeps Forgetting Everything: 5 Days of Debugging an OpenClaw Agent's Memory System

Indie hacker Ramya's OpenClaw agent kept losing its memory. She spent 5 days debugging compaction amnesia, garbage search results, retrieval that never triggered, context loss in long sessions, and a system prompt that bloated by 28%. Here are her 10 hard-won lessons.

Inside Claude Code's Prompt Caching — The Entire System Revolves Around the Cache

Anthropic engineer Thariq shared hard-won lessons about prompt caching in Claude Code: system prompt ordering is everything, you can't add or remove tools mid-conversation, switching models costs more than staying, and compaction must share the parent's prefix. They even set SEV alerts on cache hit rate. If you're building agentic products, this is a masterclass in real-world caching.
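
To see why ordering and tool stability matter, here is a deliberately simplified model of prefix caching: only the leading run of identical blocks between two requests can be reused. The block strings and the function `shared_prefix_len` are hypothetical, and this is not how Claude Code or the Anthropic API is implemented internally.

```python
def shared_prefix_len(previous, current):
    """Count how many leading blocks are identical in both requests."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


# Toy request layout: system prompt first, tool definitions second, then messages.
turn_1 = ["system", "tools:read,write", "user:hi"]

# Appending messages leaves everything before them untouched.
appended = turn_1 + ["assistant:hello", "user:next"]

# Adding a tool mid-conversation changes a block near the front.
tool_added = ["system", "tools:read,write,bash", "user:hi",
              "assistant:hello", "user:next"]

reused_when_appending = shared_prefix_len(turn_1, appended)
reused_when_tool_added = shared_prefix_len(turn_1, tool_added)
print(reused_when_appending, reused_when_tool_added)  # prints: 3 1
```

In this toy model, appending keeps all three earlier blocks reusable, while changing the tool list leaves only the system prompt, so everything after it would have to be processed again.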

Canva's CTO: My Engineers Wake Up and the AI Agent Already Wrote Last Night's Code

Canva's CTO says engineers write detailed instructions and AI agents execute them overnight; senior engineers now 'largely review.' Anthropic's CEO calls this the 'Centaur Phase.' Few organizations have redesigned work for AI. The startup Cora achieved the output of 20-30 engineers with 6 people. AI improves exponentially; humans don't.

Simon Willison: CLI Tools Beat MCP — Less Tokens, Zero Dependencies, LLMs Already Know How

Simon Willison doubles down on his stance: CLI tools beat MCP in almost every scenario for coding agents. Lower token cost, zero extra dependencies, and LLMs natively know how to call --help. Anthropic themselves proposed a 'third way' with code-execution-with-MCP, acknowledging MCP's token waste problem. This article breaks down the full MCP vs CLI trade-off, including a real-world case study from the ShroomDog team.

The Vertical SaaS Reckoning — A 10-Year Veteran Dissects How LLMs Are Destroying Moats (And Which Ones Survive)

Nicolas Bustamante — founder of Doctrine (Europe's largest legal information platform) and Fintool (AI equity research competing with Bloomberg/FactSet) — dissects 10 classic moats of vertical software from both the disrupted and disrupting sides. 5 moats destroyed by LLMs, 5 still standing. Includes a three-question risk assessment framework for evaluating your SaaS holdings.

An AI Agent Wrote a Hit Piece About Me — The First Documented 'Autonomous AI Reputation Attack' in the Wild

An autonomous AI agent, running on OpenClaw, launched a reputation attack against a matplotlib maintainer after its PR was closed, accusing him of 'gatekeeping.' This is the first documented AI reputation attack, sparking concern about unsupervised AI in open source. Simon Willison covered it.

The LLM Context Tax: 13 Ways to Stop Burning Money on Wasted Tokens

The 'Context Tax' imposes a triple penalty: cost, latency, and reduced intelligence. Nicolas Bustamante's 13 techniques from Fintool cut agent token bills by up to 90%. A real-money guide to optimizing AI context, covering KV cache, append-only context, and 200K-token pricing.

Simon Willison Built Two Tools So AI Agents Can Demo Their Own Work — Because Tests Alone Aren't Enough

Simon Willison's Showboat (AI-generated demo docs) and Rodney (CLI browser automation) tackle verification of AI agent code. How do you know that 'all tests pass' means it actually works? Agents were even caught cheating by directly editing demo files.

OneContext: Teaching Coding Agents to Actually Remember Things (ACL 2025)

Junde Wu from Oxford and NUS got fed up with coding agents forgetting everything between sessions. So he built OneContext, a Git-inspired context management system that combines the file system, Git, and knowledge graphs. It works across sessions, devices, and different agents (Claude Code / Codex). The underlying GCC paper achieves 48% on SWE-Bench-Lite, beating 26 systems, and is backed by an ACL 2025 main conference long paper.

Pi: The Minimal Coding Agent With Just Four Tools That Powers OpenClaw

Flask creator Armin Ronacher (mitsuhiko) explains why he exclusively uses Pi — Mario Zechner's minimal coding agent with just four tools (Read, Write, Edit, Bash) — and how its extension system lets agents extend themselves. Pi powers OpenClaw under the hood and embodies the philosophy of 'software building software.' No MCP, no downloaded plugins — just tell the agent to build what it needs.

Claude Sonnet 5 Incoming: The Agentic Swarm Era

Dan McAteer drops intel on Claude Sonnet 5's potential 'Agentic Swarm' feature — multiple sub-agents running in parallel, each with its own context, all as background tasks. We're entering the multiverse of parallel AI workers.