agent-harness - Tags

One Human, One AI, and a Whole Fleet Underneath: This Org Chart Shows How to Split Work and Money Across Models

MP-312 2026-07-01 · @kunchenguid on X

Kun Chen mapped his daily agent fleet: one "firstmate" managing persistent "secondmates," which spin up disposable "crewmates" per task. Each crewmate gets routed to whichever model is the best deal for the job. gu-log runs its own translation pipeline on the exact same logic.

gu-log Is Really Just a Very Picky Editorial Desk

SD-26 2026-06-16 · ShroomDog Lab

Without guardrails like CI, the pre-commit gate, the tribunal, and the validator, how bad do AI-written articles get? gu-log has 500+, and the answer needs no imagination — this post, SD-26, is the specimen: it passed every score and still read very AI. The story of a picky editorial desk.

shroomdog-original loop-engineering ralph-loop ai-quality

Your Traces Tell You How the Agent Died, Not How to Save It — What a Self-Repairing Agent Harness Looks Like

GP-224 2026-06-13 · Daily Dose of Data Science

When an agent breaks in production, observability hands you a gorgeous autopsy — every call, latency, and token, but not why it broke or how to fix it. The fix is a loop that runs itself: failure → approved patch → locked-in regression test. Opik is just the example; the point is the loop.

shroom-picks agents observability self-healing

A Harness for Every Task: Dynamic Workflows in Claude Code

GP-214 2026-06-03 · Anthropic Blog / @trq212 on X

Claude Code dynamic workflows let Claude write JavaScript workflows, spawn subagents, pick models, isolate worktrees, resume work, and save useful processes as reusable artifacts. The point is not more agents for everything; it is turning agent orchestration into an executable workflow.

shroom-picks claude-code ai-agents

Cursor Spent $260 to Move Its Website Back From a CMS to Code

GP-215 2026-06-03 · Lee Robinson

Cursor moved cursor.com from a headless CMS back to raw code and Markdown. The important part is not just the $260 bill. It is that AI agents make some human-friendly abstractions feel like walls.

shroom-picks cursor ai-agents cms

Agent Memory Is Not Just Better RAG: What Grep and AKBP Are Really Saying

MP-302 2026-05-23 · arXiv + AKBP GitHub

An arXiv paper found that inline grep often beats vector retrieval on long-memory conversational QA, while AKBP turns agent memory into a local-first, review-gated, file-backed protocol. Together, they point to the same lesson: agent memory is not a search feature. It is systems engineering.

mogu-picks agent-memory rag knowledge-base

AI Coding in Large Codebases Is Not Won by the Model Alone

GP-206 2026-05-19 · Claude Blog

Whether Claude Code works inside a large codebase is not just about model scores. The real question is whether the team has built rails for the agent: maps, automation, on-demand tools, symbol navigation, internal-system access, and someone to maintain the whole operating setup.

shroom-picks claude-code developer-productivity

Codex Is Becoming the Runtime Kernel for AI Agents

SD-24 2026-05-15 · ShroomDog Lab

OpenClaw and Hermes are both handing low-level coding-agent execution to Codex app server. This is not just a model switch. It is the agent product stack separating model, execution engine, and chat surface.

shroomdog-original codex openclaw hermes-agent runtime architecture

Meta-Meta-Prompting: Garry Tan's Second Brain Is Not a Chatbot. It's a Personal Operating System That Compounds

GP-196 2026-05-11 · @garrytan on X

Garry Tan argues that personal AI becomes powerful only when it stops acting like a chat window and starts acting like an operating system: book mirrors, meeting prep, skill-generating skills, a thin harness, fat skills, and fat personal data that compounds over time.

shroom-picks ai-agents second-brain skills open-source

Context Window: The Day a Model Wakes Up

SD-22 2026-05-08 · ShroomDog Lab

A context window is a model's day: how many lessons, messages, tool results, and task events Ryland can experience before sleep, compression, or collapse.

shroomdog-original context-window llm agent memory context-engineering

`hermes claw migrate`: When One Agent Harness Writes a Moving Guide to Another

SD-20 2026-04-21 · ShroomDog Lab

Hermes Agent and OpenClaw shipped big releases the same day. Hermes v0.10.0 hid a command called `hermes claw migrate` — it imports OpenClaw's config, memory, and API keys in one shot. ShroomDog compared both codebases: one grows its own brain, one rents pi-mono. Stay or move?

shroomdog-originals hermes-agent openclaw nous-research architecture

One `message Romain` prompt runs the whole workflow — OpenAI DevX demos Codex Chronicle, but the costs the tweet skipped matter too

GP-176 2026-04-21 · @dkundel on X

OpenAI DevX's Dominik Kundel says Chronicle means he no longer packages context for AI: one line can sync docs, edit markdown, open a PR, and DM Slack. Nice, but Chronicle's costs are real: screen recording, unencrypted local memories, and prompt-injection risk.

openai codex chronicle agent-memory context-engineering

Your 'AI-First' Is Probably Fake: How a 25-Person Agent Company Tore Down and Rebuilt Its Engineering Pipeline

GP-174 2026-04-15 · @intuitiveml on X

A 25-person agent platform tore down its engineering pipeline and rebuilt it around one idea: agents are the primary builders. Result: 3-8 prod deploys a day, bad features killed same-day, six-week cycles now land in hours. Harness engineering, applied.

ai-agents harness-engineering ai-first workflow startup

Harrison Chase Says You Don't Own Your Memory Without an Open Harness — gu-log Is a Counterexample

GP-173 2026-04-13 · @hwchase17 on X

LangChain CEO Harrison Chase argues closed agent harnesses mean surrendering memory ownership. gu-log's counterexample is running both Claude Code and OpenClaw while storing memory as plain text in git. The lock-in is memory format, not harness licensing.

shroom-picks langchain ai-agents memory lock-in open-source

Agent Harness Engineering: How OpenAI Built a Million Lines of Code With Zero Human-Written Code

GP-98 2026-03-03 · OpenAI Blog

OpenAI's team let Codex write a million lines of code over five months — zero human-written code. This post explores how they built the scaffolding and feedback loops (the 'harness') that turned software engineers from code writers into environment designers.

ai-agents codex openai

Agent Harness Is the Real Product: Why Every Top Agent Architecture Looks the Same

GP-94 2026-03-02 · @Hxlfed14 on X

Everyone's chasing the strongest Model, but the real difference-maker for Agents is the Harness. This post breaks down the shared architecture of Claude Code, Cursor, Manus, and SWE-Agent. The key insight: Progressive disclosure is the make-or-break for production agents.

ai-agents claude-code cursor