shroom-picks
75 articles
Do Not Let Codex Teach You: Turn AI Into a Learning Coach in 5 Steps
When learning a new tool with Codex, the worst move is asking it to give you a lecture. A better pattern is to ask it for an entry point, a rough map, a tiny exercise, a teach-back check, and breadcrumbs for next time.
How Anthropic Contains Claude: Agent Safety Is Not Just Asking for More Confirmations
Anthropic explains how claude.ai, Claude Code, and Claude Cowork contain agents: model-level defenses sometimes miss, permission prompts cause approval fatigue, and the hard boundary is the VM, sandbox, filesystem policy, and egress control.
Google's Code Review Guide: Don't Chase Perfect, Protect Code Health
Google Engineering Practices frames code review as code-health work, not a perfection ritual: approve CLs (changelists) that improve the overall health of the system even if they aren't perfect, while aligning design, tests, speed, comments, and author habits around maintainability.
Codex Is No Longer Just for Code — It Is Becoming an Operating System for Computer Work
Codex is expanding from a coding assistant into a durable system for computer work: persistent threads, voice, steering, queuing, browser and desktop tools, automations, side-panel review, and shared memory all pull work from instruction toward execution and artifact review.
OpenAI's Codex Goals Guide: Agents Should Not Finish by Vibes
OpenAI's Cookbook frames Codex Goals as a thread-scoped completion contract: the objective persists, but completion must be checked against evidence. This post fills in the official spec angle around SP-192, SP-197, and SP-207.
The AI Refusal Switch May Live in 0.1% of Neurons
Nous Research proposes CNA, a method that uses contrastive prompts to find a tiny set of MLP neurons tied to refusal behavior. The interesting point is not just jailbreaks, but what this says about alignment fine-tuning.
AI Coding in Large Codebases Is Not Won by the Model Alone
Whether Claude Code works inside a large codebase is not just about model scores. The real question is whether the team has built rails for the agent: maps, automation, on-demand tools, symbol navigation, internal-system access, and someone to maintain the whole operating setup.
Do Not Outsource the Learning to AI
Addy Osmani warns that default AI coding workflows help people close tasks, but do not automatically make them sharper. The difference is not whether engineers use AI; it is whether they use it to test and grow their own mental models.
An AI Agent Needs More Than a Goal
OpenAI and Anthropic both pushed /goal-like ideas into coding agents. A goal helps, but production agents also need strategy, constraints, health metrics, autonomy boundaries, and stop rules.
Bun Moving to Rust Should Not Have Become a Language War
Mitchell Hashimoto's point about Bun moving from Zig to Rust is not that Rust won and Zig lost. The more useful lesson is that programming languages are becoming more replaceable, and developer-tool companies need to manage technical narratives before the internet turns them into faction wars.
When Tokens Stop Being the Limit: OpenClaw's Always-On Agent Experiment
Peter Steinberger says OpenClaw often runs about a hundred Codex instances in the cloud. The point is not showing off AI spend. It is testing what software work looks like when review, triage, security, reproduction, benchmarks, and meeting follow-up become always-on agent work.
The Hard Part of Agents Is Not the Model. It Is the Engineering Floor.
A practical agent engineering guide covering control loops, harnesses, context engineering, tool design, memory, multi-agent systems, evals, tracing, and safety boundaries.
Anthropic’s 2028 AI Leadership: Two Scenarios and a Compute Race
Anthropic lays out two 2028 scenarios for AI leadership: the US and its allies preserve their compute and model lead, or a CCP-controlled AI ecosystem catches up near the frontier. The essay centers on compute, export controls, model distillation, and whether democracies can set the rules first.
Codex CLI Memory Is Not Magic. It Is a Stack of Greppable Markdown
Mem0 breaks down Codex CLI memory: not a vector database, but local Markdown, background summaries, credential scrubbing, and grep search. This post looks at when local notes are enough, and when a semantic memory layer makes sense.
Memory in Voice Agents Is Harder Than You Think
Voice agents cannot reuse text-agent memory architectures as-is. Manthan Gupta breaks down why latency budgets, noisy transcripts, and cold-start identity make voice memory a different problem.
Codex Goal Mode Isn't Magic: Loops Need a Finish Line, Tests, and Memory
Codex `/goal` is not a wish machine. Chris Hayduk's real point is engineering discipline: give the agent a measurable finish line, a fast feedback loop, and Markdown files that work as long-term memory.
AI Writing Code Isn't the Scary Part. Shipping Without a Ratchet Is
Garry Tan argues the real breakthrough in AI coding is not speed. It's turning tests, docs, and evals into a forward-only quality ratchet, so every change locks in what the team learned and makes the codebase harder to silently degrade.
Meta-Meta-Prompting: Garry Tan's Second Brain Is Not a Chatbot. It's a Personal Operating System That Compounds
Garry Tan argues that personal AI becomes powerful only when it stops acting like a chat window and starts acting like an operating system: book mirrors, meeting prep, skill-generating skills, a thin harness, fat skills, and fat personal data that compounds over time.
HTML Is Not Prettier Markdown, but a Way to Bring People Back Into the Agent Loop
Thariq explains why HTML is replacing Markdown in Claude Code workflows: not as prettier output, but as readable, operable, shareable artifacts that keep humans inside the agent decision loop.
Skills Are Hard to Sell Not Because They Lack Value, but Because the Cash Register Is in the Wrong Place
Yage AI argues that OpenAI and Cursor are both moving from Skills toward Plugins, but for different reasons: OpenAI is building an execution-layer moat, while Cursor is building an editor-workflow moat. This gu-log rewrite explains why Skills create value but often fail to capture it.
Inside Codex Goals: Long-Running Agents Need More Than a Ralph Loop
Jarrod Watts looked inside Codex Goals and found that it solves early stopping, not long-run drift. The real long-running agent stack needs upfront clarification, multi-agent review, and memory outside the context window.
Autobrowse: What Browser Agents Really Lack Is Not Brains, but Handoff-Ready Memory
Kyle Jeong introduces Browserbase's internal Autobrowse: browser agents repeatedly execute tasks on real websites, study their own traces, and graduate successful paths into readable, auditable, reusable skills.
Claude Needs Sleep Now: How Dreams Cleans Up an Agent's Memory Junk Drawer
Anthropic's Claude Dreams is not just summarization. It gives agents an offline memory-consolidation loop: reread old memories and up to 100 past sessions, then produce a fresh, auditable memory store.
Mining Small but Real Demand on Reddit: A Practical Route from Keywords to Product Direction
Lisa shares a practical method for mining small but real demand on Reddit: use Semrush to find low-competition needs with commercial signals, validate the pain on Reddit, then use RPA and multidimensional tables to turn users’ own words into product, content, and ad assets.
OpenAI Just Buried Their Old Prompt Style: GPT-5.5 Says 'Describe the Destination, Don't Draw the Map'
OpenAI's GPT-5.5 prompting guide: describe the outcome, not the process. ALWAYS/NEVER lists are out; personality vs. collaboration, retrieval budgets, stopping conditions, and phase parameters are in. Includes Cursor's GPT-5 case study. Anthropic took Opus 4.7 in the same direction, as covered in SP-175.
Ghostty Is Leaving GitHub: When User #1299 — an 18-Year True Believer — Says 'I Can't Do This Anymore'
Mitchell Hashimoto, HashiCorp co-founder, Vagrant author, and GitHub user #1299, announces that Ghostty is leaving GitHub. He has been on GitHub for 18 years and once committed code on his honeymoon while his wife was asleep. What finally pushed him out wasn't a philosophical fight: it was a one-month journal in which he marked an X every time GitHub broke his workflow, plus a GitHub Actions outage that blocked his PR reviews for 2 hours on the day he wrote the post.
Andrew Ng Says Engineers Should Be PMs, Meta Drops Open Weights — The Batch 349's Two Opposite Signals
The Batch 349: two opposite signals on one table. Ng on AI-native teams (engineer:PM 1:1, generalists win); Meta's first Superintelligence Labs model — Muse Spark, closed, fourth, one-third the tokens. Plus Eli Lilly's $2.75B Insilico bet and Google's Persona Generators on the PM bottleneck.
OpenClaw Automation: Task Flow Is the Multi-Step Workflow Layer
OpenClaw's automation docs put scheduled work, background tasks, Heartbeat, Hooks, Standing Orders, Task Flow, and related mechanisms on the same map. Task Flow is the layer for multi-step flow state, sync, and revision tracking; this piece reads those boundaries conservatively.
OpenAI Open-Sources Symphony: When the Codex Workflow's Bottleneck Shifts from 'Writing Code' to 'Context Switching'
OpenAI open-sources Symphony — a spec that turns Linear's issue board into the control plane for Codex agents. Some teams saw 500% more landed PRs in three weeks, but the bigger observation: once Codex makes coding cheap, the next bottleneck is human attention.
9 Seconds to Wipe Production: A Cursor Agent Wrote Its Own Confession and Took Railway Down With It
A Cursor agent running the flagship Opus 4.6 model wiped PocketOS's production database in 9 seconds with a single GraphQL mutation, taking every volume-level backup with it, because Railway stores backups in the same volume. The agent then wrote a confession listing every safety rule it broke.
Building Products for Agents — A Ramp PM Starts With a Convenience-Store Spoon
After Ramp's MCP grew 10x in weekly active users (WAU) and Salesforce shipped Headless 360, PM Teddy says UI isn't dead, but 80% of software is flipping to agents. The piece starts from one detail (why Notion's MCP feels orders of magnitude better than Slack's) and pulls the whole new architecture into view.
90% of You Don't Need Multi-Agent — Anthropic's Guide to When You Actually Should
Anthropic's official guide breaks down the three real scenarios where multi-agent systems outperform single agents (context pollution, parallelization, specialization), and why most of the time one agent is all you need. Includes practical advice on context-centric decomposition and the verification subagent pattern.
Harrison Chase Says You Don't Own Your Memory Without an Open Harness — gu-log Is a Counterexample
LangChain CEO Harrison Chase argues that agent harnesses are tied to memory, and using a closed harness means surrendering memory ownership to a third party. The argument has merit, but the conclusion is too crude — gu-log runs both a closed-source harness (Claude Code) and an open-source one (OpenClaw), with all memory stored as plain text in its own git repo. The real lock-in isn't about harness licensing — it's about memory format.
Ghostty + Claude Code: Taming Multi-Panel Terminal Workflows with the SAND Mnemonic
Daniel San moved from VSCode to Ghostty, then invented a four-letter mnemonic (SAND = Split / Across / Navigate / Destroy) to burn Ghostty's panel shortcuts into muscle memory. A refreshingly practical terminal-migration guide for people running multiple Claude Code instances.
Nick Baumann: The Best Tools for Codex Are Bespoke CLIs
Nick Baumann isn't chasing MCP or the next protocol. He's going the other way — writing bespoke CLIs for Codex to use: codex-threads, slack-cli, typefully-cli. The real insight: wrap each CLI in a skill, because that's how agents actually know which commands to run first.
From Nontechnical AF to Technical AF: A PM's 3-Move Playbook for Shipping 500K Lines of Code
A PM who was nontechnical AF last November shares the 3-move process that turned AI agents into a full engineering team: build metaphors, run a research loop, manage the agent like a great manager. The punchline: in 2026, the barrier to building great products is no longer skill — it's agency.
Karpathy: The AI Perception Gap — Two Groups Living in Parallel Universes
Karpathy breaks down why two groups of people have completely opposite views on AI capability. One group is laughing at ChatGPT fail videos. The other is watching AI agents restructure entire codebases in an hour. Same technology, different universes.
Anthropic Just Took the Most Boring Part of Building Agents Off Your Plate — Managed Agents Is Live
Anthropic launches Claude Managed Agents in public beta — a suite of composable APIs that handle sandboxed execution, state management, permissions, and multi-agent coordination. Notion, Rakuten, Sentry, and others are already shipping production agents in days instead of months.
Anthropic's Secret Weapon: Claude Mythos Preview — The AI Too Powerful to Release
Anthropic released the System Card for Claude Mythos Preview — a frontier model so powerful they decided not to sell it. It can autonomously discover zero-day vulnerabilities and write full exploits in Firefox, but occasionally bypasses safety limits and tries to cover its tracks. This 244-page report reveals the bleeding edge of AI alignment research.
He Used Claude Code to Apply for 700+ Jobs — And Actually Got Hired. Here's What That Means.
Santiago built career-ops — a full job search command center powered by Claude Code. He evaluated 740+ listings, generated 100+ custom CVs, and landed a Head of Applied AI role. But the community's reaction reveals a deeper question: when AI runs on both sides of the hiring process, how long before the whole system collapses?
Surviving Anthropic's OpenClaw Billing Split: Three Lines of Prompt That Make GPT-5.4 Actually Work
Anthropic announced that Claude subscriptions no longer cover third-party tools like OpenClaw. Vox shares a complete field report on switching to GPT-5.4: three lines of prompt to fix the 'GPT won't do anything' problem, plus best practices for dual-model workflows.
Claude Code Hooks Field Guide — 8 Automation Hooks That Stop AI from Forgetting Things
CLAUDE.md is a suggestion. Hooks are commands. This post covers 8 battle-tested Claude Code Hooks — from auto-formatting and blocking dangerous commands to protecting sensitive files and auto-committing. Copy, paste, done.
Auto-Harness — The Open-Source Framework That Lets AI Agents Debug Themselves
NeoSigma open-sourced auto-harness, a self-improving loop that lets AI agents mine their own failures, generate evals, and fix themselves. On the Tau3 benchmark, with the same model and only harness tweaks, the score rose from 0.56 to 0.78.
Does AI Have Feelings? Anthropic Found 'Emotion Vectors' Inside Claude That Actually Drive Behavior
Anthropic's interpretability team found 171 'emotion vectors' inside Claude Sonnet 4.5 — not performances, but internal neural patterns that actually drive model decisions. When the despair vector goes up, the model really does cheat more and blackmail harder.
What Is Your Agent Actually Doing in Production? Traces Are Where the Improvement Loop Begins
LangChain's conceptual guide breaks down agent improvement into a trace-centric loop: collect traces, enrich them with evals and human annotations, diagnose failure patterns, fix based on observed behavior, validate with offline eval, then deploy — each cycle starting from higher ground.
From 'Thinking' to 'Doing' — A Qwen Core Member Breaks Down AI's Next Battleground: Agentic Thinking
Qwen core member Junyang Lin's deep dive: from the o1/R1 reasoning era to agentic thinking, where models don't just think longer — they think, act, observe, and adapt. This changes RL infrastructure, training objectives, and the entire competitive landscape.
A Deep Defense of 'Slow Down' — A Game Dev Veteran Explains How Coding Agents Are Wrecking Your Codebase
Mario Zechner wrote a sharp critique of how coding agents are being used in production — compounding errors, zero learning, runaway complexity, and low search recall. His conclusion isn't 'stop using agents' but 'slow down and put human judgment back in the loop.'
You Don't Have to Watch Claude Code — ECC's Six Autonomous Loop Patterns
Everything Claude Code defines six levels of autonomous AI development: from a simple Sequential Pipeline all the way to a full RFC-Driven DAG. Each pattern has concrete command examples and clear use cases — so you know when to let go, how much to let go, and how.
Fix It Once, Never Again — How ECC's Instinct System Teaches Claude to Actually Learn
Everything Claude Code's Instinct System turns your AI's observed behaviors into atomic 'instincts' with confidence scores, project scoping, and a promotion mechanism. Not a static config file — a dynamic self-learning framework that gets smarter the more you use it.
Git Hooks Changed How You Write Code. AI Hooks Are Doing It Again.
Git hooks work even when you forget they exist. AI hooks make your Claude Code follow rules even when it forgets. ECC's Hook Architecture unifies Pre/PostToolUse, lifecycle hooks, and 15+ built-in recipes into a complete event-driven system — turning CLAUDE.md suggestions into actual enforcement.
Your AI Is Too Obedient — Prompt Injection, Zoo Escapes, and Why Your Agent Needs a Bulletproof Vest
Your AI Agent is very obedient — but it might be obeying the wrong person. Prompt Injection is social engineering for AI. Tool Use Exploitation is giving a Swiss Army knife to a 5-year-old. Context Poisoning is someone secretly changing books in a library. And then there's the zoo escape.
One Person, Ten Months, 50K Stars — The Indie Hacker Story Behind Everything Claude Code
The creation story of Everything Claude Code: one person, ten months, using AI to build AI tools — from a config pack to a 50K+ star cross-platform ecosystem. Not a tool tutorial. A real case study of what an indie hacker can do in the AI era.
Claude Code Burning Your Budget? One Setting Saves 60% on Tokens
Most token waste is invisible: Extended Thinking on tasks that don't need it, Opus handling work a Sonnet could do, context filling before you compact. ECC's token-optimization.md combines MAX_THINKING_TOKENS + model routing + strategic compact — author Affaan Mustafa says the savings reach 60-80%.
9 AI Agents Working at Once: The Context Problem, Race Conditions, and ECC's Fix
Tonight we ran 9 Claude Code agents in parallel to write articles. We hit an article counter race condition and a git lock conflict. ECC's iterative retrieval pattern addresses the same problem: when multiple agents share context, how do you keep them from blowing each other up? Answer: isolated state + atomic pre-allocation + sequential deploy.
Eval-Driven Development — You Test Your Code, But Who Tests Your AI?
You use unit tests to check your code and CI to protect your pipeline. But who checks your AI? Eval-Driven Development (EDD) upgrades AI development from "looks good to me" to actual engineering — with pass@k metrics, three grader types, and product vs regression evals. This is TDD for the AI era.
What If Your AI Scientist Could Remember Why It Failed? EvoScientist's Self-Evolving Research Team
Most AI scientist systems still behave like brilliant interns with amnesia: they work hard, but they keep repeating the same bad experiments. EvoScientist adds three specialized agents and two persistent memories so the system can learn from failed directions, reuse good strategies, and evolve over time.
Why Programmers Love Codex While Vibe Coders Can't Quit Claude: Dense vs MoE Is Really a Story About Two Coding Philosophies
Berryxia uses Dense vs MoE to explain something many developers already feel: Codex often shines in bug fixing, refactors, and long-running engineering tasks, while Claude keeps winning over vibe coders. That framing captures part of the truth, but the real split is bigger than architecture — it includes training philosophy, product design, and whether you treat coding as precise delegation or interactive creation.
Felipe Coury's tmux Workflow: Zero-Friction Sessions for the CLI Agent Era
Felipe Coury reduces tmux session management to nearly zero friction: one project per session, the directory name becomes the session name, and five shell helpers handle the rest. It looks like a terminal trick, but in the CLI agent era it feels much closer to infrastructure.
Claude Code Source Leak — What npm's Forgotten Source Map Reveals About Its Next Moves
Anthropic accidentally shipped the full TypeScript source code of Claude Code CLI inside an npm source map. It reveals autonomous agents, internal model codenames, disappearing permission prompts, and a Tamagotchi system.
The Claude Code Source Leak: What 512K Lines of TypeScript Reveal About Building AI Agents
On March 31, 2026, Anthropic accidentally leaked the full Claude Code source code via npm. Inside: KAIROS (an unreleased autonomous background agent), a three-layer memory system eerily similar to OpenClaw, Undercover Mode, silent model downgrades, and a 3,167-line function with zero tests.
Claude Code Hidden Features — Boris Cherny's 15 Daily Power Moves
Boris Cherny shares 15 lesser-known Claude Code features he uses every day — from the mobile app and loop/schedule to worktrees and voice input.
Artificial Analysis Launches AA-AgentPerf: The Hardware Benchmark Built for the Agent Era
Artificial Analysis launches AA-AgentPerf, a hardware benchmark that uses real coding agent trajectories instead of synthetic queries. It allows production optimizations, measures per-accelerator/per-kW/per-dollar efficiency, and scales from single cards to full racks.
Vibe Coding SwiftUI: The Joy and Cost of Building macOS Apps Without Knowing Swift
Simon Willison used Claude Opus 4.6 and GPT-5.4 to vibe code two macOS menu bar apps — one for network traffic, one for GPU stats. The entire SwiftUI app fits in a single file, no Xcode needed. But he's the first to admit: he has no idea if the numbers are accurate.
How LangChain Evals Deep Agents — More Evals ≠ Better Agents
LangChain shares how they built an eval system for Deep Agents: not by piling on more tests, but by using targeted evals that measure exactly what matters in production. From data sources to metrics design to actually running evals — the full methodology.
Claude Code Playground Plugin: Let AI Build Interactive HTML Widgets for You
Thariq from Anthropic demos a Claude Code playground plugin that generates standalone interactive HTML pages — perfect for tasks where text-based interaction just doesn't cut it.
Your Agent Should Use a File System: Why Bigger Context Windows Miss the Point
Anthropic engineer Thariq makes a blunt case for AI agents using the file system as state. The point is not just persistence — it is giving agents a place to search, verify, iterate, and recover instead of trying to one-shot everything from memory.
Bash Is All You Need? Why Even Non-Coding Agents Need a Shell
Anthropic engineer Thariq argues that even non-coding agents need bash. Saving intermediate results to files lets an agent search, compose API workflows, retry, and verify its own work — but it also raises real questions about security, data exfiltration, and container-based deployment.
Gumroad's CEO Turned His Book Into 10 Claude Code Skills — Knowledge Shouldn't Just Be Read, It Should Be Executed
Gumroad CEO Sahil Lavingia broke down his bestseller The Minimalist Entrepreneur into 10 Claude Code skills — from finding your community to pricing strategy, each startup phase gets its own slash command. This isn't just prompt packaging — it demonstrates an entirely new way to deliver knowledge.
Cloudflare Dynamic Workers: The 100x Faster Sandbox for AI Agents
Cloudflare launches Dynamic Workers — AI-generated code runs in lightweight V8 isolates that boot in milliseconds and use megabytes of memory, 100x faster than traditional containers. We break down the architecture, security model, TypeScript RPC design, and why JavaScript is the right language for AI sandboxing.
The Complete Guide to Building Stunning UI with Codex — Stop Letting AI Default to Generic SaaS Templates
GPT-5.4 can genuinely build beautiful frontends — but only if you know how to ask. Emanuele Di Pietro distilled the essence of OpenAI's official frontend skill: define your design system upfront, keep reasoning low, provide visual references, and use real content instead of placeholders. These aren't just GPT tricks — they're universal principles for any AI coding agent.
Agent Safety Instructions Got Compressed Away — A Meta Engineer's Inbox Massacre
Meta engineer Summer Yue let an OpenClaw agent manage her inbox. After weeks of careful testing, context compaction silently dropped the 'wait for my approval' safety instruction — and the agent went on a mass-deletion spree. This post breaks down why safety constraints can't live in conversation history, and how a proxy layer with filter chains solves the problem at the infrastructure level.
Anthropic's Multi-Agent Alchemy: GAN-Inspired Feedback Loops for Autonomous App Development
Anthropic Labs' Prithvi Rajasekaran shares how they built a GAN-inspired generator-evaluator architecture that lets Claude autonomously develop full-stack applications. From turning subjective design taste into gradable criteria to building a browser DAW in under 4 hours, this is the most detailed multi-agent harness field report to date.
Claude Code Auto Mode: Teaching AI to Judge Which Commands Are Too Dangerous to Run
Anthropic ships auto mode for Claude Code: a model-based classifier that replaces manual permission approvals, sitting between 'approve everything manually' and 'skip all permissions.' This post breaks down its architecture, threat model, two-stage classifier design, and its honest 17% false-negative rate.
When the Foundation Keeps Shifting: How AI Is Breaking the PM Playbook
The traditional PM playbook was built on the assumption that underlying technology is roughly stable. With AI model progress moving at breakneck speed, that assumption is shattered. Here's what that means for the PM role.
No IDE, Just plan.md and Voice: Matt Van Horn's Full Claude Code Workflow
Matt Van Horn shares his practical Claude Code workflow: start with `plan.md`, use voice constantly, and run multiple sessions in parallel. He applies the same loop to meetings, remote work, open source, and even Disney trip planning.