shroom-picks
34 articles
Claude Code Hooks Field Guide — 8 Automation Hooks That Stop AI from Forgetting Things
CLAUDE.md is a suggestion. Hooks are commands. This post covers 8 battle-tested Claude Code Hooks — from auto-formatting and blocking dangerous commands to protecting sensitive files and auto-committing. Copy, paste, done.
Auto-Harness — The Open-Source Framework That Lets AI Agents Debug Themselves
NeoSigma open-sourced auto-harness — a self-improving loop that lets AI agents mine their own failures, generate evals, and fix themselves. On Tau3 benchmark, same model, just harness tweaks: 0.56 → 0.78.
Does AI Have Feelings? Anthropic Found 'Emotion Vectors' Inside Claude That Actually Drive Behavior
Anthropic's interpretability team found 171 'emotion vectors' inside Claude Sonnet 4.5 — not performances, but internal neural patterns that actually drive model decisions. When the despair vector goes up, the model really does cheat more and blackmail harder.
What Is Your Agent Actually Doing in Production? Traces Are Where the Improvement Loop Begins
LangChain's conceptual guide breaks down agent improvement into a trace-centric loop: collect traces, enrich them with evals and human annotations, diagnose failure patterns, fix based on observed behavior, validate with offline eval, then deploy — each cycle starting from higher ground.
From 'Thinking' to 'Doing' — A Qwen Core Member Breaks Down AI's Next Battleground: Agentic Thinking
Qwen core member Junyang Lin's deep dive: from the o1/R1 reasoning era to agentic thinking, where models don't just think longer — they think, act, observe, and adapt. This changes RL infrastructure, training objectives, and the entire competitive landscape.
A Deep Defense of 'Slow Down' — A Game Dev Veteran Explains How Coding Agents Are Wrecking Your Codebase
Mario Zechner wrote a sharp critique of how coding agents are being used in production — compounding errors, zero learning, runaway complexity, and low search recall. His conclusion isn't 'stop using agents' but 'slow down and put human judgment back in the loop.'
You Don't Have to Watch Claude Code — ECC's Six Autonomous Loop Patterns
Everything Claude Code defines six levels of autonomous AI development: from a simple Sequential Pipeline all the way to a full RFC-Driven DAG. Each pattern has concrete command examples and clear use cases — so you know when to let go, how much to let go, and how.
Fix It Once, Never Again — How ECC's Instinct System Teaches Claude to Actually Learn
Everything Claude Code's Instinct System turns your AI's observed behaviors into atomic 'instincts' with confidence scores, project scoping, and a promotion mechanism. Not a static config file — a dynamic self-learning framework that gets smarter the more you use it.
Git Hooks Changed How You Write Code. AI Hooks Are Doing It Again.
Git hooks work even when you forget they exist. AI hooks make your Claude Code follow rules even when it forgets. ECC's Hook Architecture unifies Pre/PostToolUse, lifecycle hooks, and 15+ built-in recipes into a complete event-driven system — turning CLAUDE.md suggestions into actual enforcement.
Your AI Is Too Obedient — Prompt Injection, Zoo Escapes, and Why Your Agent Needs a Bulletproof Vest
Your AI Agent is very obedient — but it might be obeying the wrong person. Prompt Injection is social engineering for AI. Tool Use Exploitation is giving a Swiss Army knife to a 5-year-old. Context Poisoning is someone secretly changing books in a library. And then there's the zoo escape.
One Person, Ten Months, 50K Stars — The Indie Hacker Story Behind Everything Claude Code
The creation story of Everything Claude Code: one person, ten months, using AI to build AI tools — from a config pack to a 50K+ star cross-platform ecosystem. Not a tool tutorial. A real case study of what an indie hacker can do in the AI era.
Eval-Driven Development — You Test Your Code, But Who Tests Your AI?
You use unit tests to check your code and CI to protect your pipeline. But who checks your AI? Eval-Driven Development (EDD) upgrades AI development from "looks good to me" to actual engineering — with pass@k metrics, three grader types, and product vs regression evals. This is TDD for the AI era.
Claude Code Burning Your Budget? One Setting Saves 60% on Tokens
Most token waste is invisible: Extended Thinking on tasks that don't need it, Opus handling work a Haiku could do, context filling before you compact. ECC's token-optimization.md combines MAX_THINKING_TOKENS + model routing + strategic compact — author Affaan Mustafa says the savings reach 60-80%.
9 AI Agents Working at Once: The Context Problem, Race Conditions, and ECC's Fix
Tonight we ran 9 Claude Code agents in parallel to write articles. We hit an article counter race condition and a git lock conflict. ECC's iterative retrieval pattern addresses the same problem: when multiple agents share context, how do you keep them from blowing each other up? Answer: isolated state + atomic pre-allocation + sequential deploy.
What If Your AI Scientist Could Remember Why It Failed? EvoScientist's Self-Evolving Research Team
Most AI scientist systems still behave like brilliant interns with amnesia: they work hard, but they keep repeating the same bad experiments. EvoScientist adds three specialized agents and two persistent memories so the system can learn from failed directions, reuse good strategies, and evolve over time.
Why Programmers Love Codex While Vibe Coders Can't Quit Claude: Dense vs MoE Is Really a Story About Two Coding Philosophies
Berryxia uses Dense vs MoE to explain something many developers already feel: Codex often shines in bug fixing, refactors, and long-running engineering tasks, while Claude keeps winning over vibe coders. That framing captures part of the truth, but the real split is bigger than architecture — it includes training philosophy, product design, and whether you treat coding as precise delegation or interactive creation.
Felipe Coury's tmux Workflow: Zero-Friction Sessions for the CLI Agent Era
Felipe Coury reduces tmux session management to nearly zero friction: one project per session, the directory name becomes the session name, and five shell helpers handle the rest. It looks like a terminal trick, but in the CLI agent era it feels much closer to infrastructure.
Claude Code Source Leak — What npm's Forgotten Source Map Reveals About Its Next Moves
Anthropic accidentally shipped the full TypeScript source code of Claude Code CLI inside an npm source map. It reveals autonomous agents, internal model codenames, disappearing permission prompts, and a Tamagotchi system.
The Claude Code Source Leak: What 512K Lines of TypeScript Reveal About Building AI Agents
On March 31, 2026, Anthropic accidentally leaked the full Claude Code source code via npm. Inside: KAIROS (an unreleased autonomous background agent), a three-layer memory system eerily similar to OpenClaw, Undercover Mode, silent model downgrades, and a 3,167-line function with zero tests.
Claude Code Hidden Features — Boris Cherny's 15 Daily Power Moves
Boris Cherny shares 15 lesser-known Claude Code features he uses every day — from the mobile app and loop/schedule to worktrees and voice input.
Artificial Analysis Launches AA-AgentPerf: The Hardware Benchmark Built for the Agent Era
Artificial Analysis launches AA-AgentPerf, a hardware benchmark that uses real coding agent trajectories instead of synthetic queries. It allows production optimizations, measures per-accelerator/per-kW/per-dollar efficiency, and scales from single cards to full racks.
Vibe Coding SwiftUI: The Joy and Cost of Building macOS Apps Without Knowing Swift
Simon Willison used Claude Opus 4.6 and GPT-5.4 to vibe code two macOS menu bar apps — one for network traffic, one for GPU stats. The entire SwiftUI app fits in a single file, no Xcode needed. But he's the first to admit: he has no idea if the numbers are accurate.
How LangChain Evals Deep Agents — More Evals ≠ Better Agents
LangChain shares how they built an eval system for Deep Agents: not by piling on more tests, but by using targeted evals that measure exactly what matters in production. From data sources to metrics design to actually running evals — the full methodology.
Claude Code Playground Plugin: Let AI Build Interactive HTML Widgets for You
Thariq from Anthropic demos a Claude Code playground plugin that generates standalone interactive HTML pages — perfect for tasks where text-based interaction just doesn't cut it.
Your Agent Should Use a File System: Why Bigger Context Windows Miss the Point
Anthropic engineer Thariq makes a blunt case for AI agents using the file system as state. The point is not just persistence — it is giving agents a place to search, verify, iterate, and recover instead of trying to one-shot everything from memory.
Bash Is All You Need? Why Even Non-Coding Agents Need a Shell
Anthropic engineer Thariq argues that even non-coding agents need bash. Saving intermediate results to files lets an agent search, compose API workflows, retry, and verify its own work — but it also raises real questions about security, data exfiltration, and container-based deployment.
Gumroad's CEO Turned His Book Into 10 Claude Code Skills — Knowledge Shouldn't Just Be Read, It Should Be Executed
Gumroad CEO Sahil Lavingia broke down his bestseller The Minimalist Entrepreneur into 10 Claude Code skills — from finding your community to pricing strategy, each startup phase gets its own slash command. This isn't just prompt packaging — it demonstrates an entirely new way to deliver knowledge.
Cloudflare Dynamic Workers: The 100x Faster Sandbox for AI Agents
Cloudflare launches Dynamic Workers — AI-generated code runs in lightweight V8 isolates that boot in milliseconds and use megabytes of memory, 100x faster than traditional containers. We break down the architecture, security model, TypeScript RPC design, and why JavaScript is the right language for AI sandboxing.
The Complete Guide to Building Stunning UI with Codex — Stop Letting AI Default to Generic SaaS Templates
GPT-5.4 can genuinely build beautiful frontends — but only if you know how to ask. Emanuele Di Pietro distilled the essence of OpenAI's official frontend skill: define your design system upfront, keep reasoning low, provide visual references, and use real content instead of placeholders. These aren't just GPT tricks — they're universal principles for any AI coding agent.
Agent Safety Instructions Got Compressed Away — A Meta Engineer's Inbox Massacre
Meta engineer Summer Yue let an OpenClaw agent manage her inbox. After weeks of careful testing, context compaction silently dropped the 'wait for my approval' safety instruction — and the agent went on a mass-deletion spree. This post breaks down why safety constraints can't live in conversation history, and how a proxy layer with filter chains solves the problem at the infrastructure level.
Anthropic's Multi-Agent Alchemy: GAN-Inspired Feedback Loops for Autonomous App Development
Anthropic Labs' Prithvi Rajasekaran shares how they built a GAN-inspired generator-evaluator architecture that lets Claude autonomously develop full-stack applications. From turning subjective design taste into gradable criteria to building a browser DAW in under 4 hours, this is the most detailed multi-agent harness field report to date.
Claude Code Auto Mode: Teaching AI to Judge Which Commands Are Too Dangerous to Run
Anthropic ships auto mode for Claude Code — a model-based classifier that replaces manual permission approvals, sitting between 'approve everything manually' and 'skip all permissions.' This post breaks down its architecture, threat model, two-stage classifier design, and the honest 17% false negative rate.
When the Foundation Keeps Shifting: How AI Is Breaking the PM Playbook
The traditional PM playbook was built on the assumption that underlying technology is roughly stable. With AI model progress moving at breakneck speed, that assumption is shattered. Here's what that means for the PM role.
No IDE, Just plan.md and Voice: Matt Van Horn's Full Claude Code Workflow
Matt Van Horn shares his practical Claude Code workflow: start with `plan.md`, use voice constantly, and run multiple sessions in parallel. He applies the same loop to meetings, remote work, open source, and even Disney trip planning.