📚 ShroomDog Picks

Long-form articles, translated and explained

204 posts

Autobrowse: What Browser Agents Really Lack Is Not Brains, but Handoff-Ready Memory

SP-193 2026-05-08 · From @kylejeong on X

Kyle Jeong introduces Browserbase's internal Autobrowse: browser agents repeatedly execute tasks on real websites, study their own traces, and graduate successful paths into readable, auditable, reusable skills.

Inside Codex Goals: Long-Running Agents Need More Than a Ralph Loop

SP-192 2026-05-08 · From @jarrodwatts on X

Jarrod Watts looked inside Codex Goals and found that it solves early stopping, not long-run drift. The real long-running agent stack needs upfront clarification, multi-agent review, and memory outside the context window.

Claude Needs Sleep Now: How Dreams Cleans Up an Agent's Memory Junk Drawer

SP-191 2026-05-07 · From @danizhu on X

Anthropic's Claude Dreams is not just summarization. It gives agents an offline memory-consolidation loop: reread old memories and up to 100 past sessions, then produce a fresh, auditable memory store.

Mining Small but Real Demand on Reddit: A Practical Route from Keywords to Product Direction

SP-190 2026-05-05 · From @MindOS_Lisa on X

Lisa shares a practical method for mining small but real demand on Reddit: use Semrush to find low-competition needs with commercial signals, validate the pain on Reddit, then use RPA and multidimensional tables to turn users’ own words into product, content, and ad assets.

OpenAI Just Buried Their Old Prompt Style: GPT-5.5 Says 'Describe the Destination, Don't Draw the Map'

SP-189 2026-04-30 · From developers.openai.com

OpenAI's GPT-5.5 prompting guide: describe the outcome, not the process. ALWAYS/NEVER lists are out; in are personality vs. collaboration, retrieval budgets, stopping conditions, and phase parameters. Includes Cursor's GPT-5 case study. SP-175 covered Anthropic's Opus 4.7 guidance moving in the same direction.

Ghostty Is Leaving GitHub: When User #1299 — an 18-Year True Believer — Says 'I Can't Do This Anymore'

SP-188 2026-04-29 · From @mitchellh on X

Mitchell Hashimoto (HashiCorp co-founder, Vagrant author, GitHub user #1299) announces that Ghostty is leaving GitHub. He has been on GitHub for 18 years and committed code on his honeymoon while his wife was asleep. What finally pushed him out wasn't a philosophical fight. It was a month-long journal in which he marked an X every time GitHub broke his workflow, plus a GitHub Actions outage that blocked his PR reviews for 2 hours on the day he wrote the post.

OpenAI Open-Sources Symphony: When Codex Workflow's Bottleneck Shifts From 'Writing Code' To 'Context Switching'

SP-187 2026-04-28 · From OpenAI Engineering blog

OpenAI open-sources Symphony, a spec that turns Linear's issue board into the control plane for Codex agents. Some teams saw 500% more landed PRs in three weeks, but the bigger observation is this: once Codex makes coding cheap, the next bottleneck is human attention.

OpenClaw Automation: Task Flow Is the Multi-Step Workflow Layer

SP-186 2026-04-28 · From OpenClaw Docs

OpenClaw's automation docs put scheduled work, background tasks, Heartbeat, Hooks, Standing Orders, Task Flow, and related mechanisms on the same map. Task Flow is the layer for multi-step flow state, sync, and revision tracking; this piece reads those boundaries conservatively.

Andrew Ng Says Engineers Should Be PMs, Meta Drops Open Weights — The Batch 349's Two Opposite Signals

SP-185 2026-04-28 · From DeepLearning.AI The Batch

The Batch 349 puts two opposite signals side by side. Ng on AI-native teams (an engineer-to-PM ratio of 1:1, generalists win); Meta's first Superintelligence Labs model, Muse Spark: closed, ranked fourth, and using one-third the tokens. Plus Eli Lilly's $2.75B Insilico bet and Google's Persona Generators, aimed at the PM bottleneck.

9 Seconds to Wipe Production: A Cursor Agent Wrote Its Own Confession and Took Railway Down With It

SP-184 2026-04-27 · From @lifeof_jer on X

A Cursor agent (running the flagship Opus 4.6) wiped PocketOS's production database in 9 seconds with one GraphQL mutation, and took every volume-level backup with it because Railway stores backups in the same volume. The agent then wrote a confession listing every safety rule it broke.

Building Products for Agents — A Ramp PM Starts With a Convenience-Store Spoon

SP-183 2026-04-26 · From @teddy_riker on X

After Ramp's MCP grew weekly active users 10x and Salesforce shipped Headless 360, Ramp PM Teddy Riker says UI isn't dead, but 80% of software is flipping to agents. The piece starts from one detail (why Notion's MCP feels orders of magnitude better than Slack's) and pulls the whole new architecture into view.

The three bugs behind Claude Code feeling dumber in April — Anthropic's own postmortem

SP-182 2026-04-23 · From Anthropic Engineering (anthropic.com)

Anthropic just published a postmortem confirming Claude Code really did feel dumber this past month. The cause was not one bug but three independent changes, rolling out on different schedules, that stacked into what looked like a broad regression: a demotion of the default reasoning effort (high→medium), a caching optimization that dropped thinking history on every turn, and a system-prompt tweak to curb Opus 4.7 verbosity that cost 3% on evals. All three were fixed by April 20, and usage limits were reset for every subscriber.

The Honest Multi-Agent Report, 10 Months Later — Cognition's Walden: Keep Writes Single-Threaded, Let Other Agents Pour In Intelligence

SP-181 2026-04-23 · From @walden_yan on X (Walden Yan, Cognition co-founder)

Ten months after writing Don't Build Multi-Agents, Cognition's Walden Yan returns with three patterns that actually ship: Devin Review's clean-context loop (2 bugs per PR, ~58% severe), cross-frontier smart friends, and manager Devin's map-reduce-and-manage. One principle runs through all three — writes stay single-threaded; other agents contribute intelligence, not actions.

Why Production Agents Converge on MCP — Anthropic's Breakdown of API vs CLI vs MCP

SP-180 2026-04-23 · From Anthropic (claude.com/blog, announced by @ClaudeDevs on X)

Anthropic's guide to connecting production agents to real systems. Once agents move to the cloud, API, CLI, and MCP integrations can all ship, but only MCP compounds. The guide uses Cloudflare's MCP server (2 tools, ~2,500 endpoints, ~1K tokens) as the benchmark for remote-first design, intent-grouped tools, and production auth.

Skillify: Turn Every Agent Failure Into Something Structurally Impossible to Repeat — Garry Tan's 10-Step Checklist

SP-179 2026-04-22 · From @garrytan on X

Garry Tan's agent screwed up twice this week — both bugs had the same shape: deterministic work done in latent space. His fix is skillify: every failure becomes a SKILL.md + deterministic script + tests + evals + resolver trigger. Ten steps. The bug becomes structurally impossible to repeat.

Every Agent Needs a Bouncer: Brex Open-Sources CrabTrap, an LLM-Judge HTTP Proxy for Production Agents

SP-178 2026-04-22 · From @pedroh96 on X

Brex open-sources CrabTrap, an HTTP proxy that intercepts every outbound agent request. Static rules dispatch known patterns in microseconds; the long tail goes to an LLM judge. Policies are inferred from traffic, not hand-written. Three production surprises: inferred policies beat written ones, the LLM judge fires on fewer than 3% of requests, and the audit log became agent observability.

Opus 4.7 Migration, Part II: Shorter Prompts, Thicker CLAUDE.md — Pawel Huryn's Six Intent-First Moves

SP-177 2026-04-21 · From @PawelHuryn on X

SP-175 covered Opus 4.7's hard specs; this piece covers the workflow layer. Pawel Huryn argues that intent is the new unlock. His six moves: a two-layer CLAUDE.md, a per-call effort toggle, batched questions, show-don't-forbid, killing stale scaffolding, and reviewing plans instead of diffs. Plus: Anthropic and OpenAI converging on the same approach.

One `message Romain` prompt runs the whole workflow — OpenAI DevX demos Codex Chronicle, but the costs the tweet skipped matter too

SP-176 2026-04-21 · From @dkundel on X

OpenAI DevX's Dominik Kundel says that now that Codex has memories, plugins, and the newly released Chronicle, he no longer packages context for AI: one line, 'sync docs + message Romain', reads a Google Doc, edits markdown, opens a PR, and DMs the right person on Slack. Very nice. But the tweet left out three costs spelled out in the official Chronicle docs: macOS screen-recording permission, memories stored unencrypted on device, and amplified prompt-injection risk. Chronicle is a screen-recording agent, not a harmless booster.

After Opus 4.7: How Your Prompt Playbook Needs to Change — Two Official Anthropic Best Practices in One Cheat Sheet

SP-175 2026-04-16 · From Anthropic (claude.com/blog + platform.claude.com/docs)

Anthropic released two Opus 4.7 best-practice guides: a Claude Code guide and a full prompting docs page. Opus 4.7 is the strongest GA model, and prompt instincts formed on Sonnet/Haiku are going stale. This piece condenses both into one cheat sheet: three must-knows, the effort ladder, 4.6→4.7 diffs, and copy-paste snippets.

Your 'AI-First' Is Probably Fake: How a 25-Person Agent Company Tore Down and Rebuilt Its Engineering Pipeline

SP-174 2026-04-15 · From @intuitiveml on X

A 25-person agent platform tore down its engineering pipeline and rebuilt it around one idea: agents are the primary builders. Result: 3 to 8 production deploys a day, bad features killed the same day, and six-week cycles now landing in hours. Harness engineering, applied.