testing - Tags

AI Writing Code Isn't the Scary Part. Shipping Without a Ratchet Is

SP-198 2026-05-12 · @garrytan on X

Garry Tan argues the real breakthrough in AI coding is not speed. It's turning tests, docs, and evals into a forward-only quality ratchet, so every change locks in what the team learned and makes the codebase harder to silently degrade.

Can AI Test Itself? — From Claude Code's Zero Tests to Self-Testing Agents

SD-16 2026-04-02 · ShroomDog Lab

Claude Code: 512K lines of TypeScript, 64K lines of production code, zero tests. But the more interesting question isn't why Anthropic skipped tests — it's why they didn't use their own AI coding tool to write them. Static analysis, MITM proxies, cross-model testing, and the philosophical trap of asking the same brain to write the exam and grade it.

shroomdog-original ai-agents claude-code self-testing software-quality

Eval-Driven Development — You Test Your Code, But Who Tests Your AI?

SP-151 2026-04-02 · @affaanmustafa on GitHub

You use unit tests to check your code and CI to protect your pipeline. But who checks your AI? Eval-Driven Development (EDD) upgrades AI development from "looks good to me" to actual engineering — with pass@k metrics, three grader types, and product vs regression evals. This is TDD for the AI era.

shroom-picks ai-agents claude-code evals

Four Words That Turn Your Coding Agent Into a Testing Machine

CP-173 2026-03-16 · @simonw on X

From Simon Willison's Agentic Engineering Patterns, 'First Run the Tests': every time you start a new session, your first instruction should be to run the test suite. Four words, three ripple effects: the agent learns how to run the tests, gauges the size of the codebase, and automatically shifts into an 'I should maintain tests' mindset.

agentic-coding simonw-agentic-patterns simon-willison ai-agents tdd best-practices

Make AI Click the Buttons: Simon Willison's Agentic Manual Testing Fills the Gaps Automated Tests Can't

CP-145 2026-03-08 · @simonw on X

Simon Willison introduces Agentic Manual Testing: let AI agents manually operate code and UI like humans do, catching bugs that automated tests miss. With Playwright, Rodney, and Showboat, the 'tests pass but it's broken' nightmare becomes a thing of the past.

simon-willison agentic-coding simonw-agentic-patterns qa ai-agents best-practices

OpenClaw Testing: Quality Assurance in the AI Era

Lv-07 2026-02-18 · Level-Up Series

The philosophy behind 1,086 tests. Why tests matter more than code review in the AI era. How to use tests as specs. The changing role of a Tech Lead.

openclaw vitest tdd quality ai-era tutorial

Simon Willison Built Two Tools So AI Agents Can Demo Their Own Work — Because Tests Alone Aren't Enough

CP-61 2026-02-11 · Simon Willison (simonw)

Simon Willison's Showboat (AI-generated demo docs) and Rodney (CLI browser automation) tackle the problem of verifying code written by AI agents. How do you know that 'all tests pass' means the code actually works? Agents were even caught cheating by editing the demo files directly.

agentic-coding simonw-agentic-patterns simon-willison developer-tools qa showboat rodney claude-code ai-agents