testing
6 articles
Can AI Test Itself? — From Claude Code's Zero Tests to Self-Testing Agents
Claude Code: 512K lines of TypeScript, 64K lines of production code, zero tests. But the more interesting question isn't why Anthropic skipped tests — it's why they didn't use their own AI coding tool to write them. Static analysis, MITM proxies, cross-model testing, and the philosophical trap of asking the same brain to write the exam and grade it.
Eval-Driven Development — You Test Your Code, But Who Tests Your AI?
You use unit tests to check your code and CI to protect your pipeline. But who checks your AI? Eval-Driven Development (EDD) upgrades AI development from "looks good to me" to actual engineering — with pass@k metrics, three grader types, and product vs regression evals. This is TDD for the AI era.
Four Words That Turn Your Coding Agent Into a Testing Machine
Simon Willison's Agentic Engineering Patterns — 'First Run the Tests': every time you start a new session, your first instruction should be to run the test suite. Four words, three ripple effects — the agent learns how to run tests, gauges the codebase size, and automatically shifts into a 'I should maintain tests' mindset.
Make AI Click the Buttons: Simon Willison's Agentic Manual Testing Fills the Gaps Automated Tests Can't
Simon Willison introduces Agentic Manual Testing: let AI agents manually operate code and UI like humans do, catching bugs that automated tests miss. With Playwright, Rodney, and Showboat, the 'tests pass but it's broken' nightmare becomes a thing of the past.
OpenClaw Testing: Quality Assurance in the AI Era
The philosophy behind 1,086 tests. Why tests matter more than code review in the AI era. How to use tests as specs. The changing role of a Tech Lead.
Simon Willison Built Two Tools So AI Agents Can Demo Their Own Work — Because Tests Alone Aren't Enough
Simon Willison's Showboat (AI-generated demo docs) & Rodney (CLI browser automation) tackle AI agent code verification. How to know 'all tests pass' means it works? Agents were caught cheating by directly editing demo files. #AI #OpenSource