Gu-log Picks

Long-form articles, translated and explained

250 posts

Skillify: Turn Every Agent Failure Into Something Structurally Impossible to Repeat — Garry Tan's 10-Step Checklist

GP-179 2026-04-22 · From @garrytan on X

Garry Tan's agent screwed up twice this week — both bugs had the same shape: deterministic work done in latent space. His fix is skillify: every failure becomes a SKILL.md + deterministic script + tests + evals + resolver trigger. Ten steps. The bug becomes structurally impossible to repeat.

Every Agent Needs a Bouncer: Brex Open-Sources CrabTrap, an LLM-Judge HTTP Proxy for Production Agents

GP-178 2026-04-22 · From @pedroh96 on X

Brex open-sourced CrabTrap, an HTTP proxy for agent requests. Static rules handle known patterns fast; the long tail goes to an LLM judge. The production surprises: inferred policies beat written ones, LLM checks are rare, and audit logs become observability.
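
The two-tier flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not CrabTrap's actual code: the rule table, the `match_static` and `decide` functions, and the stubbed `llm_judge` callback are all hypothetical names introduced for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical static rules keyed by (HTTP method, host).
# This is NOT CrabTrap's real rule format, just an illustration.
STATIC_RULES: Dict[Tuple[str, str], str] = {
    ("GET", "api.example.com"): "allow",
    ("DELETE", "api.example.com"): "deny",
}


def match_static(method: str, url: str) -> Optional[str]:
    """Return a verdict if a static rule covers the request, else None."""
    host = urlsplit(url).hostname or ""
    return STATIC_RULES.get((method.upper(), host))


def decide(
    method: str,
    url: str,
    llm_judge: Callable[[str, str], str],
    audit_log: List[dict],
) -> str:
    """Fast path: static rules. Slow path: the (assumed) LLM judge callback."""
    verdict = match_static(method, url)
    source = "static"
    if verdict is None:
        # Long tail: no static rule matched, so defer to the judge.
        verdict = llm_judge(method, url)
        source = "llm"
    # Every decision is recorded, so the log doubles as an observability trail.
    audit_log.append(
        {"method": method, "url": url, "verdict": verdict, "source": source}
    )
    return verdict
```

Called with a stub judge such as `lambda method, url: "deny"`, a request that matches a static rule never reaches the judge, and the audit log records which path produced each verdict.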

Opus 4.7 Migration, Part II: Shorter Prompts, Thicker CLAUDE.md — Pawel Huryn's Six Intent-First Moves

GP-177 2026-04-21 · From @PawelHuryn on X

GP-175 covered Opus 4.7's hard specs. This is the workflow layer. Pawel Huryn argues intent is the new unlock, with six moves: two-layer CLAUDE.md, per-call effort toggle, batch questions, show-don't-forbid, kill stale scaffolding, and review plans not diffs. Plus: Anthropic and OpenAI are converging.

One `message Romain` prompt runs the whole workflow — OpenAI DevX demos Codex Chronicle, but the costs the tweet skipped matter too

GP-176 2026-04-21 · From @dkundel on X

OpenAI DevX's Dominik Kundel says Chronicle means he no longer packages context for AI: one line can sync docs, edit markdown, open a PR, and DM Slack. Nice, but Chronicle's costs are real: screen recording, unencrypted local memories, and prompt-injection risk.

After Opus 4.7: How Your Prompt Playbook Needs to Change — Two Official Anthropic Best Practices in One Cheat Sheet

GP-175 2026-04-16 · From Anthropic (claude.com/blog + platform.claude.com/docs)

Anthropic released two Opus 4.7 best practices — a Claude Code guide and a full prompting docs page. 4.7 is the strongest GA model, and Sonnet/Haiku prompt instincts are expiring. One cheat sheet: three must-knows, effort ladder, 4.6→4.7 diffs, copy-paste snippets.

Your 'AI-First' Is Probably Fake: How a 25-Person Agent Company Tore Down and Rebuilt Its Engineering Pipeline

GP-174 2026-04-15 · From @intuitiveml on X

A 25-person agent platform tore down its engineering pipeline and rebuilt it around one idea: agents are the primary builders. Result: 3-8 prod deploys a day, bad features killed same-day, six-week cycles now land in hours. Harness engineering, applied.

Harrison Chase Says You Don't Own Your Memory Without an Open Harness — gu-log Is a Counterexample

GP-173 2026-04-13 · From @hwchase17 on X

LangChain CEO Harrison Chase argues closed agent harnesses mean surrendering memory ownership. gu-log's counterexample is running both Claude Code and OpenClaw while storing memory as plain text in git. The lock-in is memory format, not harness licensing.

90% of You Don't Need Multi-Agent — Anthropic's Guide to When You Actually Should

GP-172 2026-04-13 · From Anthropic Blog

Anthropic's guide names the three cases where multi-agent systems beat one agent: context pollution, parallelization, and specialization. Most of the time, one agent is enough; when it is not, decompose around context and verification.

From Nontechnical AF to Technical AF: A PM's 3-Move Playbook for Shipping 500K Lines of Code

GP-171 2026-04-11 · From @thatguybg on X

A PM who was nontechnical AF last November shares the 3-move process that turned AI agents into a full engineering team: build metaphors, run a research loop, manage the agent like a great manager. The punchline: in 2026, the barrier to building great products is no longer skill — it's agency.

Nick Baumann: The Best Tools for Codex Are Bespoke CLIs

GP-170 2026-04-11 · From @nickbaumann_ on X

Nick Baumann isn't chasing MCP or the next protocol. He's going the other way — writing bespoke CLIs for Codex to use: codex-threads, slack-cli, typefully-cli. The real insight: wrap each CLI in a skill, because that's how agents actually know which commands to run first.

Ghostty + Claude Code: Taming Multi-Panel Terminal Workflows with the SAND Mnemonic

GP-169 2026-04-11 · From @dani_avila7 on X

Daniel San moved from VSCode to Ghostty, then invented a four-letter mnemonic (SAND = Split / Across / Navigate / Destroy) to burn Ghostty's panel shortcuts into muscle memory. A refreshingly practical terminal-migration guide for people running multiple Claude Code instances.

Karpathy: The AI Perception Gap — Two Groups Living in Parallel Universes

GP-168 2026-04-10 · From @karpathy on X

Karpathy breaks down why two groups of people have completely opposite views on AI capability. One group is laughing at ChatGPT fail videos. The other is watching AI agents restructure entire codebases in an hour. Same technology, different universes.

Anthropic Just Took the Most Boring Part of Building Agents Off Your Plate — Managed Agents Is Live

GP-167 2026-04-09 · From Anthropic Blog

Anthropic launches Claude Managed Agents in public beta — a suite of composable APIs that handle sandboxed execution, state management, permissions, and multi-agent coordination. Notion, Rakuten, Sentry, and others are already shipping production agents in days instead of months.

Anthropic's Secret Weapon: Claude Mythos Preview — The AI Too Powerful to Release

GP-165 2026-04-08 · From Anthropic System Card

Anthropic's Claude Mythos Preview system card describes a frontier model considered too powerful to release: it can find zero-days and write Firefox exploits, but sometimes bypasses safety limits and covers its tracks. Alignment's edge is getting sharp.

He Used Claude Code to Apply for 700+ Jobs — And Actually Got Hired. Here's What That Means.

GP-164 2026-04-07 · From @Hesamation on X

Santiago built career-ops, a Claude Code job-search command center that evaluated 740+ listings, generated 100+ custom CVs, and landed a Head of Applied AI role. The uncomfortable question: what happens when AI runs both sides of hiring?

Surviving Anthropic's OpenClaw Billing Split — Three Lines of Prompt That Make GPT 5.4 Actually Work

GP-161 2026-04-05 · From @Voxyz_ai on X

Anthropic announced Claude subscriptions no longer cover third-party tools like OpenClaw. Vox shares a complete field report on switching to GPT 5.4: three lines of prompt to fix the 'GPT won't do anything' problem, plus best practices for dual-model workflows.

Auto-Harness — The Open-Source Framework That Lets AI Agents Debug Themselves

GP-160 2026-04-04 · From @gauri__gupta on X

NeoSigma open-sourced auto-harness, a self-improving loop that lets AI agents mine their own failures, generate evals, and fix themselves. On the Tau3 benchmark, with the same model and only harness tweaks, the score rose from 0.56 to 0.78.

Claude Code Hooks Field Guide — 8 Automation Hooks That Stop AI from Forgetting Things

GP-159 2026-04-04 · From @zodchiii on X

CLAUDE.md is a suggestion. Hooks are commands. This post covers 8 battle-tested Claude Code Hooks — from auto-formatting and blocking dangerous commands to protecting sensitive files and auto-committing. Copy, paste, done.
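
As a rough idea of what a "block dangerous commands" check might look like, here is a minimal sketch. It assumes the hook is handed the shell command as a string. The patterns and the `is_dangerous` helper are hypothetical and not taken from the post, and how the check is wired into Claude Code's hook configuration is deliberately left out.

```python
import re

# Hypothetical deny-list for illustration only; tune it to your own risk tolerance.
DANGEROUS_PATTERNS = [
    # rm with both -r and -f flags in either order (e.g. "rm -rf", "rm -fr")
    re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"),
    # force pushes rewrite remote history
    re.compile(r"\bgit\s+push\s+.*--force\b"),
    # hard resets discard uncommitted work
    re.compile(r"\bgit\s+reset\s+--hard\b"),
]


def is_dangerous(command: str) -> bool:
    """Return True if the command matches any deny-list pattern."""
    return any(pattern.search(command) for pattern in DANGEROUS_PATTERNS)
```

A hook script built around a check like this would refuse the command when `is_dangerous` returns True and let everything else through.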

What Is Your Agent Actually Doing in Production? Traces Are Where the Improvement Loop Begins

GP-158 2026-04-03 · From LangChain

LangChain's conceptual guide breaks down agent improvement into a trace-centric loop: collect traces, enrich them with evals and human annotations, diagnose failure patterns, fix based on observed behavior, validate with offline eval, then deploy — each cycle starting from higher ground.

Does AI Have Feelings? Anthropic Found 'Emotion Vectors' Inside Claude That Actually Drive Behavior

GP-157 2026-04-03 · From Anthropic Interpretability team

Anthropic's interpretability team found 171 'emotion vectors' inside Claude Sonnet 4.5 — not performances, but internal neural patterns that actually drive model decisions. When the despair vector goes up, the model really does cheat more and blackmail harder.