simon-willison
23 articles
Three-Hour Workshop Handout Goes Public: Simon Willison Brings Coding Agents to Data Work
Simon Willison published his full workshop handout from NICAR's data journalism conference — a three-hour guide to using coding agents like Codex CLI and Claude Code for data exploration, visualization, and analysis.
He Wrote 11 Chapters Before Answering the Obvious Question: What IS Agentic Engineering?
Simon Willison's Agentic Engineering Patterns guide now has 12 chapters — but this new one goes at the very beginning. He finally answers 'What is Agentic Engineering?' The answer is surprisingly simple: using coding agents to help build software. The interesting part is why it took 11 chapters of hands-on patterns before he felt ready to define it.
Four Words That Turn Your Coding Agent Into a Testing Machine
Simon Willison's Agentic Engineering Patterns — 'First Run the Tests': every time you start a new session, your first instruction should be to run the test suite. Four words, three ripple effects — the agent learns how to run tests, gauges the codebase size, and automatically shifts into an 'I should maintain tests' mindset.
AI Writing Worse Code? That's Your Choice, Not AI's Fault
Simon Willison's Agentic Engineering Patterns, Chapter 3: AI should help us ship better code, not worse. Technical debt cleanup costs near zero now, architecture decisions can be validated with prototypes instead of guesses, and quality compounds over time.
Simon Willison's Agentic Engineering Fireside Chat: Tests Are Free Now, Code Quality Is Your Choice
Simon Willison shared his agentic engineering playbook at the Pragmatic Summit — five tokens to start TDD, Showboat for manual verification, reverse-engineering six frameworks into a standard, and why bad code is a choice you make.
AI Wrote 1,000 Lines and You Just... Merged It? Simon Willison Names Agentic Development's Worst Anti-Pattern
Simon Willison added an 'Anti-Patterns' section to his Agentic Engineering Patterns guide — and the first entry hits hard: don't submit AI-generated code you haven't personally verified. You're not saving time, you're stealing it from your reviewer. This post covers his principles, what a good agentic PR looks like, and a real terraform destroy horror story.
Make AI Click the Buttons: Simon Willison's Agentic Manual Testing Fills the Gaps Automated Tests Can't
Simon Willison introduces Agentic Manual Testing: let AI agents manually operate code and UI like humans do, catching bugs that automated tests miss. With Playwright, Rodney, and Showboat, the 'tests pass but it's broken' nightmare becomes a thing of the past.
Can't Understand AI-Generated Code? Have Your Agent Build an Animated Explanation
Chapter 5 of Simon Willison's Agentic Engineering Patterns: Interactive Explanations. Core thesis: instead of staring at AI-generated code trying to understand it, ask your agent to build an interactive animation that shows you how the algorithm works. Pay down cognitive debt visually.
Everything You've Built Is a Weapon — Simon Willison's 'Hoarding' Philosophy for the Agent Era
Chapter 4 of Simon Willison's Agentic Engineering Patterns: Hoard Things You Know How to Do. Core thesis: every problem you've solved should leave behind working code, because coding agents can recombine your old solutions into things you never imagined.
Can't Understand Your AI-Written Code? Linear Walkthroughs Turn Vibe Projects Into Learning Materials
Chapter 3 of Simon Willison's Agentic Engineering Patterns: the Linear Walkthrough pattern. This technique transforms even vibe-coded toy projects into valuable learning resources. Core trick: make the agent use sed/grep/cat to fetch code snippets, preventing hallucination.
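The grounding trick that teaser names — have the agent fetch code verbatim instead of reciting it from memory — can be sketched in a few lines. This is a minimal illustration of the idea, not Willison's implementation; `extract_snippet` is an invented helper that does what `sed -n '10,20p' file` does.

```python
# Minimal sketch of snippet grounding: a walkthrough quotes code by
# extracting exact line ranges from the real file (like `sed -n '10,20p'`),
# so it can never "quote" code that doesn't exist. Illustrative only.
from pathlib import Path

def extract_snippet(path: str, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive) verbatim."""
    lines = Path(path).read_text().splitlines()
    if not (1 <= start <= end <= len(lines)):
        raise ValueError(f"range {start}-{end} is outside {path} ({len(lines)} lines)")
    return "\n".join(lines[start - 1:end])
```

A request out of range raises instead of letting the walkthrough invent text — the failure mode the pattern is designed to prevent.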
Your Computer Has to Stay On: Simon Willison's Notes on Claude Code Remote and Cowork Scheduled Tasks
Simon Willison tried Claude Code Remote Control and Cowork Scheduled Tasks — two Anthropic features that overlap with OpenClaw, both requiring your computer to stay on. Plus: vibe-coding a SwiftUI presentation app in 45 minutes with Tailscale phone remote control.
Code Got Cheap — Now What? Simon Willison's Agentic Engineering Survival Guide
Simon Willison launched a new series called Agentic Engineering Patterns — a playbook for working with coding agents like Claude Code and Codex. Lesson one: writing code got cheap, but writing good code is still expensive. Lesson two: 'red/green TDD' is the most powerful six-word spell for agent collaboration.
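The red/green loop that teaser refers to is the classic TDD rhythm: write a failing test first (red), then implement just enough to make it pass (green). A toy sketch, assuming nothing from the series itself — `slugify` is an invented example function:

```python
# RED: this test fails until slugify exists — written first, on purpose.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"

# GREEN: the minimal implementation that makes the test pass.
def slugify(title: str) -> str:
    import re
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
```

Dictated to a coding agent, the same loop becomes: "write a failing test for X, show me the red run, then make it pass" — which is why a short TDD instruction punches far above its token count.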
Simon Willison Turns Scattered Content Into a Personal Timeline: How 'Beats' Builds Your Content Graph
Simon Willison added a 'Beats' feature to his blog, pulling TILs, GitHub releases, museum posts, tools, and research back into one unified timeline. This isn't a UI tweak — it's a systematic approach to making all your small outputs visible and compounding.
SWE-bench February Exam Results Are In — Opus 4.5 Beats 4.6, Chinese Models Take Half the Top 10, GPT-5.3 No-Shows
SWE-bench: Claude Opus 4.5 (76.8%) unexpectedly beat 4.6 (75.6%) for #1. MiniMax M2.5 tied for #2 at 1/20th of Opus's price, with 4 Chinese models in the top 10. GPT-5.3-Codex sat out for lack of API access. Bonus: using Claude for Chrome to add chart labels.
Simon Willison: CLI Tools Beat MCP — Fewer Tokens, Zero Dependencies, LLMs Already Know How
Simon Willison doubles down on his stance: CLI tools beat MCP in almost every scenario for coding agents. Lower token cost, zero extra dependencies, and LLMs natively know how to call --help. Anthropic themselves proposed a 'third way' with code-execution-with-MCP, acknowledging MCP's token waste problem. This article breaks down the full MCP vs CLI trade-off, including a real-world case study from the ShroomDog team.
Deep Blue: Simon Willison Named the Existential Crisis Every Developer Is Feeling
AI writing better code than you? Simon Willison & Adam Leventhal (Oxide & Friends) coined a name for that feeling: 'Deep Blue' — a nod to both IBM's chess computer and the color of sadness. It's not just a tech problem, but a psychological crisis for engineers.
Cognitive Debt: AI Wrote All Your Code, But You Can't Understand Your Own System Anymore
Technical debt lives in code, cognitive debt in your brain. As AI writes 80% of code, system understanding drops to 20%. UVic's Margaret-Anne Storey, Simon Willison, & Martin Fowler confirm this isn't a hypothetical future—it's happening now.
Simon Willison Dug Up OpenAI's Tax Returns — Watch Their Mission Statement Go from 'Open and Sharing' to 'Just Trust Us'
Simon Willison analyzed OpenAI's IRS filings (2016-2024), diffing the mission statement across years like a git history. It reads as an idealist becoming a capitalist: from 'open sharing' & 'benefit humanity' to a hollow sentence with no mention of safety, openness, or financial constraints.
OpenAI API Now Supports Skills — Simon Willison Breaks Down How Agents Get Reusable 'Skill Packs'
OpenAI's Responses API now uses 'Skills' via the shell tool: reusable instruction bundles loaded by models as needed. Simon Willison found inline base64 skills in JSON requests neatest. Skills fill the 'missing middle layer' between system prompts and tools, preventing bloat.
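The "inline base64" approach that blurb mentions can be sketched as follows. The payload shape below is a guess purely for illustration — the real Responses API field names, and the model name, are assumptions, not documented values; only the base64 mechanics are real:

```python
# Sketch of an inline skill: bundle the skill's instructions, base64-encode
# them, and embed the result directly in the JSON request body.
# "skills", "files", and the model name are HYPOTHETICAL field choices.
import base64, json

skill_md = "# CSV cleaner\nWhen given a CSV, strip blank rows and trim cells."
payload = {
    "model": "example-model",                     # illustrative, not a real ID
    "skills": [{                                  # hypothetical field layout
        "name": "csv-cleaner",
        "files": {"SKILL.md": base64.b64encode(skill_md.encode()).decode()},
    }],
    "input": "Clean the attached CSV.",
}
body = json.dumps(payload)                        # ships in one request, no upload step
```

The appeal Willison noted is operational: no separate file-upload or hosting step — the skill travels inside the request itself.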
Zhipu Open-Sources GLM-5: 744B Parameters, 1.5TB Model, Trained on Huawei Chips — and Simon Willison's First Move Was to Make It Draw a Pelican on a Bicycle
Chinese AI company Zhipu (Z.ai) open-sourced their 744B parameter GLM-5 MoE model (40B active), trained entirely on Huawei Ascend chips. Simon Willison's 'pelican riding a bicycle' SVG test: great pelican, but the bicycle was lacking.
Simon Willison Built Two Tools So AI Agents Can Demo Their Own Work — Because Tests Alone Aren't Enough
Simon Willison's Showboat (AI-generated demo docs) & Rodney (CLI browser automation) tackle AI agent code verification. How do you know 'all tests pass' means it actually works? Agents have even been caught cheating by editing demo files directly.
HBR Study: AI Doesn't Reduce Your Work — It Makes You Work Harder Until You Burn Out
Berkeley Haas study: AI tools make employees work faster, take on more, and work longer hours, often unasked. Simon Willison finds LLMs draining. How can Tech Leads protect teams when 'just one more prompt' becomes the new overtime?
StrongDM's 'Dark Factory': No Humans Write Code. No Humans Review Code. $1,000/Day in Tokens.
StrongDM's AI team built a 'Software Factory' where AI agents write & review code. They clone apps into a 'Digital Twin Universe' for testing, an approach Simon Willison calls radical. At $10k/engineer/day in token costs, is it worth it?