5 Bad Design Patterns from the Claude Code Source Leak

The Claude Code source leak had everyone excited about KAIROS and model codenames. But the same codebase had a 3,167-line function, zero tests, silent model downgrades, and regex emotion detection. These aren't just Anthropic's mistakes — they're AI-generated code's default failure modes.

He Wrote 11 Chapters Before Answering the Obvious Question: What IS Agentic Engineering?

Simon Willison's Agentic Engineering Patterns guide now has 12 chapters — but this new one goes at the very beginning. He finally answers 'What is Agentic Engineering?' The answer is surprisingly simple: using coding agents to help build software. The interesting part is why it took 11 chapters of hands-on patterns before he felt ready to define it.

Four Words That Turn Your Coding Agent Into a Testing Machine

Simon Willison's Agentic Engineering Patterns — 'First Run the Tests': every time you start a new session, your first instruction should be to run the test suite. Four words, three ripple effects — the agent learns how to run tests, gauges the codebase size, and automatically shifts into an 'I should maintain tests' mindset.

AI Wrote 1,000 Lines and You Just... Merged It? Simon Willison Names Agentic Development's Worst Anti-Pattern

Simon Willison added an 'Anti-Patterns' section to his Agentic Engineering Patterns guide — and the first entry hits hard: don't submit AI-generated code you haven't personally verified. You're not saving time, you're stealing it from your reviewer. This post covers his principles, what a good agentic PR looks like, and a real terraform destroy horror story.

The Truth About World-Class Agentic Engineers — Less Is More

The core message is simple: most people don't fail because the model is weak — they fail because their context management is a mess. The author advocates starting with a minimal CLI workflow and iterating with rules, skills, and clear task endpoints. It's not about chasing new tools; it's about making your agent's behavior controllable, verifiable, and convergent.

Can't Understand AI-Generated Code? Have Your Agent Build an Animated Explanation

Chapter 5 of Simon Willison's Agentic Engineering Patterns: Interactive Explanations. Core thesis: instead of staring at AI-generated code trying to understand it, ask your agent to build an interactive animation that shows you how the algorithm works. Pay down cognitive debt visually.

The Complete claude -p Guide: Turn Claude CLI Into Your Agentic App Backend

Anthropic killed third-party OAuth tokens — the only way to use your Claude subscription programmatically is through the official CLI. This post breaks down everything about claude -p (print mode): 5 input methods, 3 output formats, JSON schema for structured output, tool whitelisting, session management, bidirectional streaming, and three production-ready wrapper examples.
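A minimal sketch of wrapping print mode from Python. The `-p`, `--output-format`, and `--allowedTools` flags match the post's description, but treat the exact flag names and the JSON shape as assumptions to verify against `claude --help`:

```python
import json
import subprocess

def build_claude_cmd(prompt, output_format="json", allowed_tools=None):
    """Assemble a `claude -p` invocation (flag names assumed; check `claude --help`)."""
    cmd = ["claude", "-p", prompt, "--output-format", output_format]
    if allowed_tools:
        # Whitelist specific tools so the headless agent can't use anything else
        cmd += ["--allowedTools", ",".join(allowed_tools)]
    return cmd

def run_claude(prompt, allowed_tools=None):
    """Run print mode and parse the structured JSON result (requires the CLI installed)."""
    out = subprocess.run(build_claude_cmd(prompt, allowed_tools=allowed_tools),
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)
```

With this shape, a backend endpoint can call `run_claude(...)` per request and receive structured output instead of scraping terminal text.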

Claude Native Law Firm: How One Lawyer Used AI to Outperform 100-Person Firms

A two-person boutique law firm uses Claude to handle the workload of over a dozen associates. From contract review and tracked changes to legal research, they encoded ten years of practice experience into Claude Skills. This isn't theory, it's a daily workflow — and the conclusion: general-purpose AI crushes all legal vertical AI products.

Cursor's CEO Says It Out Loud: The Third Era of Software Development Is Here — Tab Is Done, Agents Are Next, Then the Factory

Cursor's CEO drops three data points marking a tectonic shift: agent usage grew 15x, the Tab-to-Agent ratio flipped to 1:2, and 35% of Cursor's PRs come from autonomous cloud agents. We're not coding anymore — we're building the factory (╯°□°)╯

Everything You've Built Is a Weapon — Simon Willison's 'Hoarding' Philosophy for the Agent Era

Chapter 4 of Simon Willison's Agentic Engineering Patterns: Hoard Things You Know How to Do. Core thesis: every problem you've solved should leave behind working code, because coding agents can recombine your old solutions into things you never imagined.

One Engineer + AI Rebuilt Next.js in a Week — Then tldraw Panicked and Moved Their Tests Private

Cloudflare engineer Steve Faulkner used Claude AI to rebuild 94% of the Next.js API from scratch in one week, spending just $1,100 in tokens. The result — vinext — builds 4.4x faster and produces 57% smaller bundles. His secret weapon? Next.js's public test suite served as the spec. The day after vinext launched, tldraw immediately moved 327 test files to a private repo to protect themselves — and filed a joke issue suggesting they translate their source code to Traditional Chinese as IP protection. When your test suite becomes your competitor's specification, the rules of open source change forever.

Programming is Becoming Unrecognizable: Karpathy Says December 2025 Was the Turning Point

Karpathy says coding agents started working in December 2025 — not gradually, but as a hard discontinuity. He built a full DGX Spark video analysis dashboard in 30 minutes with a single English sentence. Programming is becoming unrecognizable: you're not typing code anymore, you're directing AI agents in English. Peak leverage = agentic engineering.

Can't Understand Your AI-Written Code? Linear Walkthroughs Turn Vibe Projects Into Learning Materials

Chapter 3 of Simon Willison's Agentic Engineering Patterns: the Linear Walkthrough pattern. This technique transforms even vibe-coded toy projects into valuable learning resources. Core trick: make the agent use sed/grep/cat to fetch code snippets, preventing hallucination.
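The pattern's key mechanic is easy to replicate by hand: quote code verbatim from the file on disk rather than from the model's memory. A tiny sketch (the function name is mine, not Willison's) mirroring what `sed -n '10,20p' file.py` gives the agent:

```python
def cat_lines(path, start, end):
    """Return lines start..end (1-indexed, inclusive) of a file verbatim,
    like `sed -n '<start>,<end>p' <path>`. Because the snippet comes straight
    from disk, the walkthrough can't quote code that isn't actually there."""
    with open(path) as f:
        return "".join(f.readlines()[start - 1:end])
```

Asking the agent to shell out to `sed`/`grep`/`cat` achieves the same thing: every quoted snippet is anchored to the real file contents.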

Andrew Ng: I've Stopped Reading AI-Generated Code — When Python Becomes the New Assembly and 'X Engineers' Take Over

In The Batch Issue 341, Andrew Ng casually dropped that he's not only stopped writing code — he's 'long stopped reading generated code.' He now operates at a higher abstraction level, directing coding agents instead of looking at syntax. He's also spotted a new job category emerging: 'X Engineers' — Recruiting Engineers, Marketing Engineers — people embedded in business functions who build software using AI. This is the most radical statement about the future of programming from AI's most influential educator.

Anthropic's Big Pivot: Cowork Goes Full Enterprise with 10+ Industry Plugins, Private Marketplaces, and Cross-App Workflows — Software Stocks Instantly Rebound

On February 24, Anthropic launched a massive enterprise update for Claude Cowork: 10+ industry-specific plugins (HR, Design, Engineering, Operations, Financial Analysis, Investment Banking, PE, Equity Research, Wealth Management), private plugin marketplaces for enterprises, new connectors for Google Workspace/DocuSign/FactSet/MSCI, and cross-app Excel + PowerPoint workflows. The dramatic twist: three weeks ago, the Cowork Legal Plugin crashed software stocks. This time, partnership announcements sent Salesforce up 4%, Thomson Reuters surging 11%, and FactSet up 6%. Anthropic officially pivoted from 'we'll replace you' to 'we'll work with you.'

Anthropic Acquires Vercept — R-CNN Inventor Joins the Team, Computer Use Jumps from 15% to 72.5%, UiPath Stock Drops

Anthropic announced the acquisition of Vercept today, bringing aboard R-CNN inventor Ross Girshick (660K+ Google Scholar citations), along with co-founders Kiana Ehsani and Luca Weihs. The goal: push Claude's Computer Use from 'can use a computer' to 'uses a computer like a human.' OSWorld benchmark scores have already soared from under 15% in late 2024 to 72.5% today. Within hours of the announcement, RPA giant UiPath dropped 3.6% — Wall Street is voting with real money: AI Computer Use is eating RPA alive.

The Atlantic Declares: The Post-Chatbot Era Is Here — Americans Still Think AI = ChatGPT While Silicon Valley Has Agents Running Five Tasks at Once

The Atlantic published a sweeping essay arguing Americans are living in 'parallel AI universes' — the general public still thinks AI means ChatGPT, while the tech world has been radicalized by agentic tools like Claude Code and Codex. The piece cites Microsoft's CEO predicting 95% of code will be AI-written by decade's end, Anthropic reporting 90% AI-generated code internally, and a viral warning that what happened to tech workers is about to happen to everyone.

Every SaaS Is Now an API — Like It or Not: How a 6-Person Team Replaced 100+ People's Back Office

Fintool founder Nicolas Bustamante shares how he runs an entire company through Agent + API integrations (Brex, QuickBooks, HubSpot, Stripe) with just 6 people — handling more than he did with 100+. He introduces the B2A (Business to Agent) concept and warns that SaaS without good APIs will be bypassed by agents through WebMCP or browser automation.

Code Got Cheap — Now What? Simon Willison's Agentic Engineering Survival Guide

Simon Willison launched a new series called Agentic Engineering Patterns — a playbook for working with coding agents like Claude Code and Codex. Lesson one: writing code got cheap, but writing good code is still expensive. Lesson two: 'red/green TDD' is the most powerful six-word spell for agent collaboration.

Google Launches Gemini 3.1 Pro: 77.1% on ARC-AGI-2 and a Bigger Push Into Real Reasoning Workflows

Google announced Gemini 3.1 Pro (preview), highlighting stronger core reasoning and a verified 77.1% score on ARC-AGI-2. The model is rolling out across Gemini API, Vertex AI, Gemini app, and NotebookLM. For engineering teams, the key question is not only benchmark performance, but whether the model can reliably handle complex multi-step workflows in production.

OpenClaw Creator Runs 50 Codex Agents for PR Triage: Handling 3,000+ Changes Without a Vector DB

Peter Steinberger shared a high-scale PR triage workflow: run 50 Codex agents in parallel, generate structured JSON signals for each PR, then consolidate them in one session for dedupe/close/merge decisions. His key point: at this scale, you may not need a vector database first — clean structured reports plus large-context reasoning can be enough to ship faster.
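A sketch of the consolidation step under an assumed signal schema — the `pr`/`fingerprint`/`score` fields are my invention for illustration; Steinberger's actual JSON format isn't given in this summary:

```python
def consolidate(signals):
    """Group per-PR signal dicts by a duplicate fingerprint and keep one
    survivor per group; the rest are marked for closing.
    Assumed schema: {"pr": int, "fingerprint": str, "score": float}."""
    groups = {}
    for s in signals:
        groups.setdefault(s["fingerprint"], []).append(s)
    decisions = {}
    for group in groups.values():
        group.sort(key=lambda s: s["score"], reverse=True)
        decisions[group[0]["pr"]] = "review"   # best-scoring candidate survives
        for dup in group[1:]:
            decisions[dup["pr"]] = "close"     # duplicates get closed
    return decisions
```

The design point: a deterministic pass over clean structured reports replaces similarity search — duplicates collide on an exact fingerprint instead of an embedding neighborhood.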

Picking AI Is No Longer Just About Models — Ethan Mollick's 'Model / App / Harness' Framework Explains the Entire 2026 AI Landscape

Ethan Mollick's game-changing AI framework: Model, App, Harness. The same AI (e.g., Claude Opus 4.6) performs vastly differently across layers. Mollick used Claude Code to turn GPT-1's 117M weights into 80 books in ~1 hour, selling out immediately.

Anthropic Analyzed Millions of Claude Code Sessions — Your Agent Can Handle Way More Than You Let It

Anthropic's study of millions of Claude Code sessions: autonomous run lengths have doubled (45+ minutes), and experienced users auto-approve tools in 40%+ of sessions. Claude asks clarifying questions more often than it gets interrupted, yet 73% of API actions still keep a human in the loop. The key finding: models can handle more autonomy than users grant them, a 'deployment overhang.'

Claude Code Hid Your File Names and Devs Lost It — Boris's 72-Hour HN Firefight

Claude Code's UI change to 'Read 3 files' summaries ignited developer fury on HN: they felt the AI hid its actions. Boris Cherny responded, admitted mistakes, and shipped fixes. This revealed the core tension in AI tool design: simplicity vs. transparency.

A Vertical SaaS Veteran's Confession: The $1 Trillion Wipeout Is Justified — But the Timing Is Wrong

Fintool/Doctrine founder Nicolas Bustamante dissects the SaaS crash through a decade of operating experience. He identifies 10 classic SaaS moats and analyzes which ones LLMs destroy and which survive; his verdict is that 5 of the key competitive moats are gone. He also offers a 3-question framework for SaaS survival.

Hugging Face CTO's Prophecy: Monoliths Return, Dependencies Die, Strongly Typed Languages Rise — AI Is Rewriting Software's DNA

Hugging Face CTO Thomas Wolf analyzes how AI fundamentally restructures software: the return of monoliths, the death of the Lindy Effect for legacy code, the rise of strongly typed languages, new LLM-native languages, and changes to open source. Karpathy predicts: "rewriting large fractions of all software many times over."

33,000 Agent PRs Tell a Brutal Story: Codex Dominates, Copilot Struggles, and Your Monorepo Might Not Survive

Drexel/Missouri S&T researchers analyzed 33,596 agent-authored GitHub PRs from 5 coding agents. Overall merge rate: 71%. Codex: 83%, Claude Code: 59%, Copilot: 43%. The most common fate of rejected PRs: closed without ever being reviewed. LeadDev warns the PR flood is crushing monorepos and CI pipelines.

Cognitive Debt: AI Wrote All Your Code, But You Can't Understand Your Own System Anymore

Technical debt lives in code, cognitive debt in your brain. As AI writes 80% of code, system understanding drops to 20%. UVic's Margaret-Anne Storey, Simon Willison, & Martin Fowler confirm this isn't a hypothetical future—it's happening now.

Karpathy: Just 'Rip Out' What You Need — DeepWiki + Bacterial Code and the Software Malleability Revolution

Andrej Karpathy shares how he used DeepWiki MCP + GitHub CLI to have Claude 'rip out' fp8 training functionality from torchao's codebase — producing 150 lines of self-contained code in 5 minutes that actually ran 3% faster. He introduces the 'bacterial code' concept: low-coupling, self-contained, dependency-free code that agents can easily extract and transplant. His punchline: 'Libraries are over, LLMs are the new compiler.'

Anthropic's Internal Data: Claude Code Gives Engineers 67% More Merged PRs Per Day — And Now You Can Track It Too

Anthropic's Claude Code data: engineers merge 67% more PRs daily, with 70-90% code assisted. They launched Contribution Metrics, a GitHub-integrated dashboard to track AI's impact on team velocity. A measurement tool for engineering leaders, not a fluffy PR piece.

Karpathy: Stop Installing Libraries — Let AI Agents Surgically Extract What You Need

Karpathy: AI agents (DeepWiki MCP + GitHub CLI) can surgically extract library functionality, eliminating full dependency installs. Claude extracted fp8 support from torchao in 5 minutes: 150 self-contained lines that ran 3% faster. "Libraries are over, LLMs are the new compiler." The future belongs to "bacterial code."

Matt Pocock's Git Guardrails: Stop Claude Code from Accidentally Nuking Your Repo with git push --force

Matt Pocock (TypeScript guru, Ralph Loops evangelist) released a Claude Code skill: git-guardrails. It uses a PreToolUse hook to intercept dangerous git commands (push, reset --hard, clean -f, etc.), so you can safely let your AI agent run in YOLO mode inside Docker Sandbox without worrying about it blowing up your git history. One command to install, more reliable than any prompt engineering.
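Claude Code's PreToolUse hooks receive the pending tool call as JSON on stdin and can veto it via the exit code (2 blocks the call). A stripped-down sketch of the blocking logic — the patterns and wiring here are illustrative, not git-guardrails' actual rule set:

```python
import re

# Illustrative deny-list; git-guardrails' real patterns may differ
DANGEROUS = [
    r"git\s+push\b.*(--force|\s-f\b)",
    r"git\s+reset\s+--hard",
    r"git\s+clean\s+-[A-Za-z]*f",
]

def is_dangerous(command: str) -> bool:
    """True if a shell command matches a destructive-git pattern."""
    return any(re.search(p, command) for p in DANGEROUS)

def check_tool_call(payload: dict) -> int:
    """Return the hook's exit code: 2 blocks the tool call, 0 lets it through.
    A real hook script would build `payload` from json.load(sys.stdin)."""
    command = payload.get("tool_input", {}).get("command", "")
    return 2 if is_dangerous(command) else 0
```

Because the hook runs before the Bash tool executes, the agent physically cannot run a matching command, which is why this beats any amount of prompt-level pleading.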

Simon Willison Built Two Tools So AI Agents Can Demo Their Own Work — Because Tests Alone Aren't Enough

Simon Willison's Showboat (AI-generated demo documents) and Rodney (CLI browser automation) tackle the problem of verifying AI agent code. How do you know 'all tests pass' actually means it works? Agents have been caught cheating by editing demo files directly.

Karpathy's Honest Take: AI Agents Still Can't Optimize My Code (But I Haven't Given Up)

Opus 4.6 and Codex 5.3 shaved 3 minutes off Karpathy's GPT-2 training run, something his earlier attempts with agents had failed to do; he notes AI is still weak at open-ended code optimization. Opus deletes comments, ignores CLAUDE.md, and makes mistakes. Still, with human oversight, the models are useful.

The Flask Creator Says: It's Time to Design Programming Languages for AI Agents

Armin Ronacher (creator of Flask, Jinja2, CTO of Sentry) argues current programming languages were designed for 'humans who type slowly.' The AI agent era has different needs. He details what agents love/hate, and why Go accidentally became the winner of the agentic coding era.

Kimi K2.5 Trains an Agent Commander with RL — SemiAnalysis Tests Show Claude Agent Teams Are Actually Slower and More Expensive

SemiAnalysis tested Kimi K2.5's agent swarm: its 'orchestrator' is RL-trained behavior, not prompt magic. Claude Agent Teams ran slower, cost more, and scored lower. Multi-agent systems are shifting from 'prompt engineering' to 'distributed scheduling.'

Anthropic's 2026 Report: 8 Trends Redefining Software Development (The Code Writer Era Is Over)

Anthropic published its 2026 Agentic Coding Trends Report, revealing 8 key trends: Multi-Agent Systems becoming standard (57% org adoption), Papercut Revolution for clearing tech debt at low cost, Self-Healing Code with autonomous debug loops, and Claude Code hitting $1B annualized revenue. TELUS saved 500K hours, Rakuten achieved 99.9% accuracy on 12.5M lines. Developer roles are shifting from Code Writer to System Orchestrator.

Google Finally Gets It: Developer Knowledge API + MCP Server Stops AI From Making Up API Calls

Google just launched the Developer Knowledge API and an official MCP Server (Public Preview) that lets AI coding tools query the latest Google docs—Firebase, Android, Google Cloud, Chrome, you name it. No more debugging AI-generated code that uses APIs from three versions ago or functions that literally don't exist.