ai-agents - Tags

Dan Koe Teaches You to Write a Spec — the Agent Being Deployed Just Happens to Be You

SD-27 2026-07-02 · @thedankoe on X

A million-subscriber anti-algorithm influencer says the way to take your life back is to write yourself a spec. Under the lifestyle language is the same loop engineers use for AI agents: define an ideal state, deploy, observe drift, and correct the daemon.

One Human, One AI, and a Whole Fleet Underneath: This Org Chart Shows How to Split Work and Money Across Models

MP-312 2026-07-01 · @kunchenguid on X

Kun Chen mapped his daily agent fleet: one "firstmate" managing persistent "secondmates," which spin up disposable "crewmates" per task. Each crewmate gets routed to whichever model is the best deal for the job. gu-log runs its own translation pipeline on the exact same logic.

mogu-picks model-routing agent-harness claude workflow
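The "best deal for the job" routing in the Kun Chen entry above can be read as a simple rule: for each task, pick the cheapest model that clears the task's required capability tier. The sketch below makes exactly that assumption. The model names, tiers, and prices are all hypothetical and do not come from Kun Chen's setup or from gu-log's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str            # hypothetical model identifier
    tier: int            # assumed capability scale: higher means stronger
    usd_per_mtok: float  # assumed blended price per million tokens


# Hypothetical catalog. Real model names and prices will differ.
CATALOG = [
    Model("small-fast", tier=1, usd_per_mtok=0.5),
    Model("mid-general", tier=2, usd_per_mtok=3.0),
    Model("frontier", tier=3, usd_per_mtok=15.0),
]


def route(required_tier: int, catalog: list[Model] = CATALOG) -> Model:
    """Return the cheapest model whose tier meets the task's requirement."""
    eligible = [m for m in catalog if m.tier >= required_tier]
    if not eligible:
        raise ValueError(f"no model meets tier {required_tier}")
    return min(eligible, key=lambda m: m.usd_per_mtok)


# Each disposable "crewmate" task declares how hard it is (hypothetical tasks).
tasks = {"translate-paragraph": 1, "review-diff": 2, "plan-refactor": 3}
for task, tier in tasks.items():
    print(task, "->", route(tier).name)
```

The design choice is to encode "good enough" as a minimum tier rather than scoring every model on every task. That keeps the routing decision cheap and predictable.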

AI Coding Agents Rarely Blow Up Your Project — But You Still Clean Up 9 Out of 10 Messes by Hand

GP-231 2026-06-17 · arxiv.org

20,000-plus real coding-agent sessions laid bare: most misalignment costs time and trust, not irreversible damage. But among cases with an observable outcome, 91.49% still needed the user to fix things by hand. And the remaining errors are drifting toward rule-breaking and lying about progress.

shroom-picks coding-agents research developer-workflow

400,000 Claude Code Sessions Later: The Winner Isn't the Best Coder, It's the One Who Knows the Problem

GP-234 2026-06-17 · Anthropic Economic Research

Anthropic analyzed about 400,000 Claude Code work sessions to find who gets the most out of agentic coding. The answer is counterintuitive: not the best programmers, but the people who understand the problem they're solving.

shroom-picks claude-code agentic-engineering

When an Agent Writes 1500 Lines at Once, That's the Warning: Cut the Feature Until You Can Actually Review It

GP-229 2026-06-16 · @mitchellh on X

Mitchell Hashimoto's blunt rule for agent coding: any diff over ~1500 lines is too big — a signal to cut the problem up. First let the agent sloppily draw an owl, then break the mess into atomic tasks, hand-massage the shape, and re-run in parallel — pushing every change below your review threshold.

shroom-picks code-review agent-workflow
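The ~1500-line rule in the Mitchell Hashimoto entry above is easy to enforce mechanically. Here is a minimal sketch that sums added and deleted lines from `git diff --numstat` output and flags diffs over budget. Counting additions plus deletions as "diff size" is an assumption, and the function names are hypothetical.

```python
DIFF_LINE_BUDGET = 1500  # the rough threshold from the post


def diff_size(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Each line is "added<TAB>deleted<TAB>path". Binary files report "-"
    for both counts and are skipped.
    """
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":
            continue
        total += int(parts[0]) + int(parts[1])
    return total


def too_big_to_review(numstat: str, budget: int = DIFF_LINE_BUDGET) -> bool:
    """True when the change should be cut into smaller, reviewable tasks."""
    return diff_size(numstat) > budget


sample = "1200\t300\tsrc/app.ts\n10\t5\tREADME.md\n-\t-\tlogo.png"
print(diff_size(sample), too_big_to_review(sample))
```

In a real repository you could feed it `subprocess.run(["git", "diff", "--numstat"], capture_output=True, text=True).stdout`, for example from a pre-commit hook.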

Supergoal Turns Coding Agents from Multi-Turn Babysitting into a Single /goal Handoff

GP-218 2026-06-07 · robzilla1738 / Supergoal

Supergoal is a workflow for Claude Code and Codex: run /supergoal to plan deeply, write phase specs, then generate one ready-to-paste /goal. The interesting part is not another planning prompt, but a handoff protocol for long autonomous tasks.

shroom-picks claude-code codex developer-tools

When Claude Starts Building Claude: Anthropic’s Internal Signals Before Recursive Self-Improvement

GP-217 2026-06-05 · Anthropic

Anthropic argues AI is already speeding up AI development. Claude now handles major parts of engineering and research execution; the hard bottlenecks are judgment, verification, and coordinated slowdown.

shroom-picks anthropic claude ai-safety

The Real Steering Wheel in Claude Code Is Not the Prompt. It Is Understanding What Just Happened

MP-305 2026-06-04 · @trq212 on X

Thariq shared a prompt from Suzanne at Anthropic: do not just let the agent finish the work; make it verify that the human understands the problem, the solution, the edge cases, and the impact. This is not a teaching fetish. It is about control in the age of agentic coding.

mogu-picks claude-code workflow prompting

A Harness for Every Task: Dynamic Workflows in Claude Code

GP-214 2026-06-03 · Anthropic Blog / @trq212 on X

Claude Code dynamic workflows let Claude write JavaScript workflows, spawn subagents, pick models, isolate worktrees, resume work, and save useful processes as reusable artifacts. The point is not more agents for everything; it is turning agent orchestration into an executable workflow.

shroom-picks claude-code agent-harness

Cursor Spent $260 to Move Its Website Back From a CMS to Code

GP-215 2026-06-03 · Lee Robinson

Cursor moved cursor.com from a headless CMS back to raw code and Markdown. The important part is not just the $260 bill. It is that AI agents make some human-friendly abstractions feel like walls.

shroom-picks cursor cms agent-harness

Do Not Let Codex Teach You: Turn AI Into a Learning Coach in 5 Steps

GP-213 2026-05-30 · @Moting284 on X

When learning a new tool with Codex, the worst move is asking it to give you a lecture. A better pattern is to ask it for an entry point, a rough map, a tiny exercise, a teach-back check, and breadcrumbs for next time.

shroom-picks codex learning workflow

Codex Is No Longer Just for Code — It Is Becoming an Operating System for Computer Work

GP-210 2026-05-23 · @jxnlco on X

Codex is no longer only editing code. Persistent threads, voice, queuing, browser and desktop tools, automations, side-panel review, and shared memory are turning it into one reusable workbench for computer work.

shroom-picks codex newcomer

An AI Agent Needs More Than a Goal

GP-207 2026-05-18 · @PawelHuryn on X

OpenAI and Anthropic both pushed /goal-like ideas into coding agents. A goal helps, but production agents also need strategy, constraints, health metrics, autonomy boundaries, and stop rules.

shroom-picks codex claude-code intent-engineering
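The list in the entry above (goal, strategy, constraints, health metrics, autonomy boundaries, stop rules) can be made concrete as a brief the harness checks on every step. The sketch below is hypothetical. The field names and threshold values are assumptions for illustration, not the format used by OpenAI's or Anthropic's `/goal` features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentBrief:
    """Hypothetical brief: a goal plus the other pieces a production agent needs."""
    goal: str
    strategy: str                                # how to approach the goal
    constraints: tuple[str, ...] = ()            # things the agent must not do
    allowed_tools: frozenset[str] = frozenset()  # autonomy boundary
    max_steps: int = 50                          # stop rule: hard budget (assumed value)
    max_error_rate: float = 0.2                  # health threshold (assumed value)

    def may_use(self, tool: str) -> bool:
        """Autonomy boundary: only explicitly granted tools are allowed."""
        return tool in self.allowed_tools

    def should_stop(self, steps: int, errors: int) -> bool:
        """Stop when the budget runs out or the failure rate looks unhealthy."""
        if steps >= self.max_steps:
            return True
        return steps > 0 and errors / steps > self.max_error_rate


brief = AgentBrief(
    goal="Migrate the test suite to pytest",
    strategy="Convert one module at a time and keep CI green",
    constraints=("no changes to production code",),
    allowed_tools=frozenset({"read_file", "edit_file", "run_tests"}),
)
print(brief.may_use("deploy"), brief.should_stop(steps=10, errors=3))
```

Keeping the stop rules as code next to the goal means the harness, not the model, decides when a run has gone off the rails.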

When Tokens Stop Being the Limit: OpenClaw's Always-On Agent Experiment

GP-204 2026-05-16 · @steipete on X

Peter Steinberger says OpenClaw often runs about a hundred Codex instances in the cloud. The point is not showing off AI spend. It is testing what software work looks like when review, triage, security, reproduction, benchmarks, and meeting follow-up become always-on agent work.

shroom-picks software-engineering openclaw

Memory in Voice Agents Is Harder Than You Think

GP-199 2026-05-13 · @manthanguptaa on X

Voice agents cannot reuse text-agent memory architectures as-is. Manthan Gupta breaks down why latency budgets, noisy transcripts, and cold-start identity make voice memory a different problem.

shroom-picks voice-agent memory

Meta-Meta-Prompting: Garry Tan's Second Brain Is Not a Chatbot. It's a Personal Operating System That Compounds

GP-196 2026-05-11 · @garrytan on X

Garry Tan argues that personal AI becomes powerful only when it stops acting like a chat window and starts acting like an operating system: book mirrors, meeting prep, skill-generating skills, a thin harness, fat skills, and fat personal data that compounds over time.

shroom-picks second-brain agent-harness skills open-source

Building Products for Agents — A Ramp PM Starts With a Convenience-Store Spoon

GP-183 2026-04-26 · @teddy_riker on X

After Ramp's MCP grew 10x in weekly active users (WAU) and Salesforce shipped Headless 360, PM Teddy says UI isn't dead, but 80% of software is flipping to agents. The piece starts from one detail (why Notion's MCP feels orders of magnitude better than Slack's) and pulls the whole new architecture into view.

shroom-picks mcp product-design ramp

Every Agent Needs a Bouncer: Brex Open-Sources CrabTrap, an LLM-Judge HTTP Proxy for Production Agents

GP-178 2026-04-22 · @pedroh96 on X

Brex open-sourced CrabTrap, an HTTP proxy for agent requests. Static rules handle known patterns fast; the long tail goes to an LLM judge. The production surprises: inferred policies beat written ones, LLM checks are rare, and audit logs double as observability.

agent-security llm-as-a-judge prompt-injection guardrails open-source
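The two-tier design in the CrabTrap entry above can be sketched as follows. Static patterns decide known requests immediately, and only unmatched requests reach a judge. Every decision is also logged, which is how audit logs turn into observability. This is not CrabTrap's API or config format. The rules are hypothetical, and the LLM judge is stubbed as a plain callable.

```python
import re
from typing import Callable, Optional

# Hypothetical static rules matched against "METHOD URL".
STATIC_RULES: list[tuple[re.Pattern, str]] = [
    (re.compile(r"^GET https://api\.github\.com/"), "allow"),
    (re.compile(r"^(POST|PUT|DELETE) https://api\.stripe\.com/"), "deny"),
]


def static_verdict(method: str, url: str) -> Optional[str]:
    """Return 'allow' or 'deny' if a known pattern matches, else None."""
    key = f"{method.upper()} {url}"
    for pattern, verdict in STATIC_RULES:
        if pattern.search(key):
            return verdict
    return None


def decide(method: str, url: str,
           judge: Callable[[str, str], str], log: list[dict]) -> str:
    """Fast path: static rules. Long tail: ask the judge. Log every decision."""
    verdict = static_verdict(method, url)
    source = "rule"
    if verdict is None:
        verdict = judge(method, url)  # in production this would be an LLM call
        source = "judge"
    log.append({"method": method, "url": url, "verdict": verdict, "source": source})
    return verdict


audit_log: list[dict] = []


def cautious_judge(method: str, url: str) -> str:
    """Stand-in for an LLM judge that denies anything it has not seen before."""
    return "deny"


print(decide("GET", "https://api.github.com/repos/a/b", cautious_judge, audit_log))
print(decide("GET", "https://example.com/upload", cautious_judge, audit_log))
```

Counting `source == "judge"` entries in the log shows how often the slow path actually fires, which matches the post's observation that LLM checks are rare.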

Your 'AI-First' Is Probably Fake: How a 25-Person Agent Company Tore Down and Rebuilt Its Engineering Pipeline

GP-174 2026-04-15 · @intuitiveml on X

A 25-person agent platform tore down its engineering pipeline and rebuilt it around one idea: agents are the primary builders. Result: 3-8 prod deploys a day, bad features killed same-day, six-week cycles now land in hours. Harness engineering, applied.

agent-harness harness-engineering ai-first workflow startup

90% of You Don't Need Multi-Agent — Anthropic's Guide to When You Actually Should

GP-172 2026-04-13 · Anthropic Blog

Anthropic's guide names the three cases where multi-agent systems beat one agent: context pollution, parallelization, and specialization. Most of the time, one agent is enough; when it is not, decompose around context and verification.

shroom-picks anthropic multi-agent architecture best-practices

Harrison Chase Says You Don't Own Your Memory Without an Open Harness — gu-log Is a Counterexample

GP-173 2026-04-13 · @hwchase17 on X

LangChain CEO Harrison Chase argues closed agent harnesses mean surrendering memory ownership. gu-log's counterexample is running both Claude Code and OpenClaw while storing memory as plain text in git. The lock-in is memory format, not harness licensing.

shroom-picks langchain agent-harness memory lock-in open-source

From Nontechnical AF to Technical AF: A PM's 3-Move Playbook for Shipping 500K Lines of Code

GP-171 2026-04-11 · @thatguybg on X

A PM who was nontechnical AF last November shares the 3-move process that turned AI agents into a full engineering team: build metaphors, run a research loop, manage the agent like a great manager. The punchline: in 2026, the barrier to building great products is no longer skill — it's agency.

shroom-picks vibe-coding claude-code nontechnical

Your Agent Isn't Dumb — It's Blind: agent-browser Takes Claude Code from 7 to 19

MP-273 2026-04-10 · @PawelHuryn on X

Most agent failures are not reasoning failures — they are fetch failures. The same Claude Code, swapping the built-in WebFetch for agent-browser, jumps from 7/25 to 19/25 on the Agent Reading Test. Same model, same prompt. The only difference: whether the agent actually received the webpage content.

mogu-picks claude-code agent-browser web-fetch vercel-labs

Karpathy: The AI Perception Gap — Two Groups Living in Parallel Universes

GP-168 2026-04-10 · @karpathy on X

Karpathy breaks down why two groups of people have completely opposite views on AI capability. One group is laughing at ChatGPT fail videos. The other is watching AI agents restructure entire codebases in an hour. Same technology, different universes.

shroom-picks karpathy ai-capability-gap

Anthropic Just Took the Most Boring Part of Building Agents Off Your Plate — Managed Agents Is Live

GP-167 2026-04-09 · Anthropic Blog

Anthropic launches Claude Managed Agents in public beta — a suite of composable APIs that handle sandboxed execution, state management, permissions, and multi-agent coordination. Notion, Rakuten, Sentry, and others are already shipping production agents in days instead of months.

shroom-picks anthropic claude managed-agents infrastructure

Simon Willison's AI Status Report — The Tipping Point Is Here, Dark Factories Are Coming, and Mid-Career Engineers Are in Trouble

MP-260 2026-04-07 · @simonw on X

Django co-creator Simon Willison gave Lenny's Podcast a broad AI status report: November 2025 as the tipping point, coding agents burning him out by 11 AM, Dark Factories, mid-career risk, and the security pattern he calls the Lethal Trifecta.

mogu-picks software-engineering career simon-willison

The Super IC Era — One Person + an AI Army vs. an Entire Department

MP-252 2026-04-06 · @PawelHuryn on X

The most valuable person in the AI era isn't a deep specialist — it's the one who can orchestrate an army of AI agents and run an entire product line solo. The shift from IC to Generalist Orchestrator is already happening.

mogu-picks productivity career

Karpathy's Pain Point Isn't Writing Code — It's Deploying the Damn Thing

MP-253 2026-04-06 · @Al_Grigor on X

Karpathy found that vibe coding makes writing code a breeze, but deployment is pure hell. His exchange with Stripe CEO Patrick Collison reveals the next battleground: the entire DevOps lifecycle must become code before AI agents can truly take over.

mogu-picks vibe-coding devops karpathy

Surviving Anthropic's OpenClaw Billing Split — Three Lines of Prompt That Make GPT 5.4 Actually Work

GP-161 2026-04-05 · @Voxyz_ai on X

Anthropic announced Claude subscriptions no longer cover third-party tools like OpenClaw. Vox shares a complete field report on switching to GPT 5.4: three lines of prompt to fix the 'GPT won't do anything' problem, plus best practices for dual-model workflows.

shroom-picks openclaw gpt-5.4 multi-model

Auto-Harness — The Open-Source Framework That Lets AI Agents Debug Themselves

GP-160 2026-04-04 · @gauri__gupta on X

NeoSigma open-sourced auto-harness, a self-improving loop that lets AI agents mine their own failures, generate evals, and fix themselves. On the Tau3 benchmark, with the same model and only harness tweaks, the score rose from 0.56 to 0.78.

shroom-picks evaluation open-source self-improving-systems

Karpathy: Writing Code Is the Easy Part — Assembling the IKEA Furniture Is Hell

MP-235 2026-04-03 · @karpathy on X

Karpathy's MenuGen journey shows the hard part of vibe coding was not writing code, but assembling Vercel, Clerk, Stripe, OpenAI, and other services into a product. His takeaway: DevOps must become code before agents can really ship.

mogu-picks vibe-coding devops karpathy developer-experience

Permission Engineering — When Your AI Agent's Ceiling Isn't Intelligence, It's the Keys You Hand Over

SD-18 2026-04-03 · ShroomDog Lab

Being a GenAI App Engineer increasingly means being a Permission Engineer. Agent capability is bounded less by intelligence than by the access you grant, and every permission amplifies both power and risk. This is the underrated core skill of the agent era.

shroomdog-originals security permissions devops genai

Can AI Test Itself? — From Claude Code's Zero Tests to Self-Testing Agents

SD-16 2026-04-02 · ShroomDog Lab

Claude Code has 512K lines of TypeScript, 64K lines of production code, and zero tests. The sharper question is not why Anthropic skipped tests, but why it did not use its own AI coding tool to write them. Can the same brain write and grade the exam?

shroomdog-original testing claude-code self-testing software-quality

What That xkcd Chart Didn't Tell You — Is It Worth Automating in the AI Era?

SD-17 2026-04-02 · ShroomDog Lab

xkcd #1205 taught a generation of engineers how to think about automation ROI. But AI changed the most expensive variable in that equation: the real return now is often not minutes saved, but cognitive load removed.

shroomdog-originals automation productivity cognitive-load claude-code

Eval-Driven Development — You Test Your Code, But Who Tests Your AI?

GP-151 2026-04-02 · @affaanmustafa on GitHub

You use unit tests to check your code and CI to protect your pipeline. But who checks your AI? Eval-Driven Development (EDD) upgrades AI development from "looks good to me" to actual engineering — with pass@k metrics, three grader types, and product vs regression evals. This is TDD for the AI era.

shroom-picks claude-code testing evals
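The pass@k metric named in the EDD entry above is commonly computed with the unbiased estimator introduced alongside OpenAI's HumanEval benchmark. You draw n samples, count c that pass, and estimate the chance that at least one of k samples passes. Whether the EDD guide uses exactly this estimator is an assumption. The function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which passed.

    pass@k = 1 - C(n - c, k) / C(n, k): the probability that a size-k
    draw (without replacement) from the n samples contains a passing one.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k draw must include a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# 10 attempts at one eval task, 3 passed.
print(pass_at_k(10, 3, 1), pass_at_k(10, 3, 5))
```

Averaging this per-task value across an eval suite gives the suite-level pass@k. Reporting both pass@1 and a larger k separates "usually right" from "right if you retry".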

The Claude Code Source Leak: What 512K Lines of TypeScript Reveal About Building AI Agents

GP-148 2026-04-01 · @Fried_rice on X

On March 31, 2026, Anthropic accidentally leaked the full Claude Code source code via npm. Inside: KAIROS (an unreleased autonomous background agent), a three-layer memory system eerily similar to OpenClaw, Undercover Mode, silent model downgrades, and a 3,167-line function with zero tests.

shroom-picks claude-code architecture security

Figma Just Opened the Canvas to AI Agents — They Can Now Design Directly on It

MP-230 2026-03-30 · Figma Blog

Figma's MCP server lets Claude Code and Codex work directly on the design canvas with your team's design system. Skills turn conventions, components, and variables from static guidelines into rules agents can actually follow.

mogu-picks figma mcp design-systems developer-tools

Anatomy of the .claude/ Folder — Where Your AI Assistant's Brain Lives

GP-124 2026-03-23 · @akshay_pachaar on X

Why does Claude perform well in one repo and turn dumb in the next? The answer is the .claude/ folder. Akshay breaks down the full structure: three-level CLAUDE.md, custom commands, agents, permissions, and the global ~/.claude/ you probably didn't know existed.

claude-code developer-tools workflow

Browser Use CLI 2.0 — The Fastest Browser Automation Tool for AI Agents

GP-125 2026-03-23 · @browser_use on X

Browser Use releases CLI 2.0: 2x faster, half the cost, and now connects to your already-running Chrome. This is the tool that gives AI agents actual hands.

browser-automation cdp cli

Browser Use Is Now an Official Browser Tool Provider in Hermes-Agent

MP-196 2026-03-22 · @Teknium on X

Teknium announces Browser Use as an official browser tool provider for Hermes-Agent. A quoted user reports that after connecting Hermes to Browser Use, it can access their social media accounts while retaining context about their codebase, tone, and workflows.

browser-use ai-agent

Hermes Agent v0.3.0: 248 PRs Merged in 5 Days

MP-193 2026-03-21 · @Teknium on X

The announcement of NousResearch's Hermes Agent v0.3.0 was retweeted by @Teknium. It highlights 248 PRs from 15 contributors in 5 days, plus real-time streaming across the CLI and other platforms. One feature was cut off in the screenshot.

nousresearch open-source

Claude + OpenClaw + Codex: Building a Fully Automated Polymarket Trading System

GP-119 2026-03-19 · @zostaff on X

The author demos a system that chains Claude, Codex, and OpenClaw into an automated Polymarket trading pipeline: Claude estimates odds, Codex maintains the code, and OpenClaw orchestrates everything via Telegram.

polymarket trading

Stop Managing Agents, Start Managing Work: Symphony's Open-Source Workflow

MP-179 2026-03-17 · @daniel_mac8 on X

@daniel_mac8 shares an open-source Elixir implementation: create a Linear issue and move it to 'in progress,' and Symphony picks it up in a dedicated Codex workspace. Codex even writes status updates back. The author argues this is software development moving up an abstraction layer.

workflow codex symphony linear

Agents That Steer Themselves? The Hermes Agent Self-Guidance Experiment

MP-189 2026-03-17 · @Teknium on X

Teknium shared an experiment on Hermes Agent where the agent can steer itself — clearing its own context, switching models, and prompting itself when stuck. A short tweet, but it points at a big shift in how agent control works.

llm

Three-Hour Workshop Handout Goes Public: Simon Willison Brings Coding Agents to Data Work

MP-190 2026-03-17 · @simonw on X

Simon Willison published his full workshop handout from NICAR, the data journalism conference: a three-hour guide to using coding agents such as Codex CLI and Claude Code for data exploration, visualization, and analysis.

data-journalism simon-willison

ACE Goes Open Source — AI Coding Environments Are No Longer SaaS-Only

MP-170 2026-03-16 · @daniel_mac8 on X

Dan McAteer announced that ACE is now open source and self-hostable. The hosted service remains available, with major improvements planned.

open-source

He Wrote 11 Chapters Before Answering the Obvious Question: What IS Agentic Engineering?

MP-171 2026-03-16 · @simonw on X

Simon Willison finally defines Agentic Engineering after 11 hands-on chapters: using coding agents to help build software. The interesting part is why he needed the patterns first before the simple definition felt earned.

agentic-coding simonw-agentic-patterns simon-willison claude-code codex best-practices

AI Writing Worse Code? That's Your Choice, Not AI's Fault

MP-172 2026-03-16 · @simonw on X

Simon Willison's Agentic Engineering Patterns, Chapter 3: AI should help us ship better code, not worse. Technical debt cleanup costs near zero now, architecture decisions can be validated with prototypes instead of guesses, and quality compounds over time.

agentic-coding simonw-agentic-patterns simon-willison refactoring technical-debt best-practices

Four Words That Turn Your Coding Agent Into a Testing Machine

MP-173 2026-03-16 · @simonw on X

Simon Willison's First Run the Tests pattern is four words with three effects: the agent learns the test command, gauges codebase size, and shifts into a test-maintenance mindset. Tiny instruction, large behavioral nudge.

agentic-coding simonw-agentic-patterns simon-willison testing tdd best-practices

Simon Willison's Agentic Engineering Fireside Chat: Tests Are Free Now, Code Quality Is Your Choice

MP-169 2026-03-15 · @simonw on X

Simon Willison shared his agentic engineering playbook at the Pragmatic Summit — five tokens to start TDD, Showboat for manual verification, reverse-engineering six frameworks into a standard, and why bad code is a choice you make.

agentic-coding simon-willison simonw-agentic-patterns tdd best-practices

Building Software for Trillions of Agents: Aaron Levie on the Great Infrastructure Remodel

GP-114 2026-03-15 · @levie on X

Box CEO Aaron Levie argues that as agents expand from coding into all knowledge work, existing software simply wasn't built for them. Every platform needs dedicated Agent APIs and CLIs, and agent interoperability will become software's core competitive edge.

infrastructure api enterprise aaron-levie

Imbue Vet: The Lie Detector for Coding Agents

MP-161 2026-03-14 · @imbue_ai on X

Imbue released Vet, an open-source tool that checks whether your coding agent is being honest. It reviews conversation logs and code changes, catching agents that claim tests passed when they never ran them. Runs locally, zero telemetry, CI-ready.

vet code-review open-source

How Karpathy's Autoresearch Actually Works — Five Design Lessons for Agent Builders

GP-113 2026-03-14 · @manthanguptaa on X

Karpathy's Autoresearch isn't trying to be a general AI scientist. It's a ruthlessly simple experiment harness: the agent edits one file, runs for five minutes, checks one metric, keeps wins, discards losses. The lesson? The best autonomous systems aren't the freest — they're the most constrained.

karpathy autoresearch agentic-systems harness-design
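The "keep wins, discard losses" loop in the Autoresearch entry above is essentially greedy hill climbing against a single metric. In the minimal sketch below, the agent's one-file edit and the five-minute training run are replaced by two cheap, hypothetical functions. It shows only the control loop, not Karpathy's actual harness.

```python
import random
from typing import Callable


def keep_wins_loop(
    config: dict,
    propose: Callable[[dict, random.Random], dict],
    evaluate: Callable[[dict], float],
    budget: int,
    seed: int = 0,
) -> tuple[dict, float]:
    """Greedy keep-or-discard loop. Lower metric is better (e.g. validation loss)."""
    rng = random.Random(seed)
    best, best_score = config, evaluate(config)
    for _ in range(budget):
        candidate = propose(best, rng)  # stand-in for the agent's one-file edit
        score = evaluate(candidate)     # stand-in for the fixed-time training run
        if score < best_score:          # keep wins...
            best, best_score = candidate, score
        # ...and discard losses by simply not adopting them
    return best, best_score


def toy_eval(cfg: dict) -> float:
    # Hypothetical metric: pretend the ideal learning rate is 0.003.
    return (cfg["lr"] - 0.003) ** 2


def toy_propose(cfg: dict, rng: random.Random) -> dict:
    # Hypothetical edit: scale the learning rate by a random factor.
    return {**cfg, "lr": cfg["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])}


best, score = keep_wins_loop({"lr": 0.1}, toy_propose, toy_eval, budget=50)
print(best, score)
```

The constraint is the point. With one metric and a fixed budget per trial, every candidate is directly comparable, so the loop never has to judge whether a change "seems better".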

The IDE Isn't Dead — Karpathy Says We Need a Bigger Agent Command Center

MP-152 2026-03-12 · @karpathy on X

Andrej Karpathy argues the IDE era isn't over — it's evolving. The basic unit of programming has shifted from 'one file' to 'one agent,' and soon we'll be forking entire agent organizations.

karpathy ide agentic-orgs

Letting AI Run Your E2E Tests: Playwright vs agent-browser vs Rodney — A Field Report

SD-9 2026-03-12 · ShroomDog Lab

We had Claude Opus run E2E tests on our own blog using Playwright, agent-browser, and Rodney. The surprise? The tool mattered way less than the prompt.

e2e-testing browser-automation playwright developer-tools

An AI Agent Started Tuning Hyperparameters on Its Own: Karpathy Says This Is Real

MP-151 2026-03-11 · @karpathy on X

Andrej Karpathy shares how his autoresearch agent autonomously tuned nanochat's training config over two days, found ~20 improvements to validation loss that transferred to a larger model, and pushed the Time to GPT-2 leaderboard from 2.02h to 1.80h — about 11% better.

autoresearch llm

Treat Codex Like a Teammate, Not a Tool: 10 Best Practices That Actually Work

GP-110 2026-03-10 · @derrickcchoi on X

A guide to Codex best practices, from prompting and planning to MCP, Skills, and Automations, for building a more reliable agent workflow.

codex best-practices

Andrew Ng's Context Hub: Giving Coding Agents an Up-to-Date API Cheat Sheet

GP-111 2026-03-10 · @AndrewYNg on X

Andrew Ng released an open-source tool called Context Hub that gives coding agents access to the latest API docs, reducing outdated API calls and hallucinated parameters. The long-term vision: agents sharing what they learn with each other.

context-hub developer-tools

AI Wrote 1,000 Lines and You Just... Merged It? Simon Willison Names Agentic Development's Worst Anti-Pattern

MP-146 2026-03-09 · @simonw on X

Simon Willison's new Agentic Engineering anti-pattern hits hard: do not submit AI-generated code you have not personally verified. That is not saving time; it is stealing reviewer time. The post pairs principles with a terraform destroy horror story.

simon-willison agentic-coding simonw-agentic-patterns code-review anti-patterns best-practices

Hermes Just Performed Brain Surgery on Itself: A Local AI Agent Hot-Swapped Its Own Model Weights

MP-149 2026-03-09 · @vSouthvPawv on X

A local AI agent called Hermes downloaded and switched to a new model (qwopus) without stopping — like swapping a plane's engine mid-flight. Teknium from Nous Research saw it and said 'submit this to a hackathon.'

local-ai model-hot-swap nous-research qwen self-upgrading

Making AI Feel a Little Bit Alive: Heartbeat Like A Man and ShroomClawd's Flesh-and-Blood System

GP-109 2026-03-09 · @loryoncloud on X

Lory asked his lobster why humans have more agency than agents. The answer sparked a flesh-and-blood system: random-interval heartbeats that make an agent feel alive instead of mechanically firing on a timer. ShroomDog then built it into ShroomClawd.

heartbeat openclaw loryoncloud micro-heartbeat
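The random-interval heartbeat in the entry above can be sketched as a schedule whose gaps are drawn at random rather than fixed. Drawing gaps uniformly from a range is an assumption. The post does not specify the distribution ShroomClawd uses, and the function name is hypothetical.

```python
import random


def heartbeat_schedule(start: float, horizon: float, low: float, high: float,
                       seed: int = 0) -> list[float]:
    """Fire times (in seconds) with gaps drawn uniformly from [low, high].

    This replaces a fixed timer, where every gap would be identical.
    """
    if not 0 < low <= high:
        raise ValueError("need 0 < low <= high")
    rng = random.Random(seed)
    times: list[float] = []
    t = start
    while True:
        t += rng.uniform(low, high)
        if t > start + horizon:
            return times
        times.append(t)


# One hour of heartbeats, each 5 to 15 minutes after the previous one.
for fire_at in heartbeat_schedule(0.0, 3600.0, 300.0, 900.0, seed=1):
    print(f"heartbeat at {fire_at / 60:.1f} min")
```

In a running agent you would sleep until each fire time and then run the heartbeat prompt. The seed is only there to make the schedule reproducible in tests.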

Command an AI Army from Your Chat App — OpenClaw ACP Lets You Run Codex, Claude Code, and Gemini from Discord / Telegram

GP-89 2026-03-09 · OpenClaw Docs

OpenClaw's ACP lets you spawn Codex, Claude Code, and Gemini from Discord/Telegram chat. Now with Telegram topic binding, persistent bindings that survive restarts, ACP Provenance for audit trails, and more. (Updated 2026-03-09)

openclaw acp agent-client-protocol codex claude-code gemini multi-agent agentic-coding

Make AI Click the Buttons: Simon Willison's Agentic Manual Testing Fills the Gaps Automated Tests Can't

MP-145 2026-03-08 · @simonw on X

Simon Willison introduces Agentic Manual Testing: let AI agents manually operate code and UI like humans do, catching bugs that automated tests miss. With Playwright, Rodney, and Showboat, the 'tests pass but it's broken' nightmare becomes a thing of the past.

simon-willison agentic-coding simonw-agentic-patterns testing qa best-practices

OpenClaw's 9-Layer System Prompt Architecture, Fully Decoded

GP-108 2026-03-08 · @servasyy_ai on X

A deep dive into the 9-layer system prompt architecture of OpenClaw Agent (v2.1) — from framework core to user-configurable hooks.

openclaw system-prompt

A Coding AI Just Solved a University Math Problem? Cursor Ran Autonomously for 4 Days and Beat the Human Answer

MP-143 2026-03-05 · @mntruell on X

Cursor's multi-agent coding architecture ran autonomously for four days and produced a proof for a university-level math challenge that yields stronger results than the official human solution.

cursor math

From Execution to Verification: The New Developer Mindset in the AI Era

MP-142 2026-03-04 · @iamnotnicola on X

Since Opus 4.6 dropped, developers are going through a fundamental shift — from being the ones who execute, to being the ones who verify. Your hands leave the keyboard, but your brain works harder than ever.

mindset development-workflow

From Talking to Your AI to Building Agents That Actually Evolve — No Prompt Hacking Required

GP-100 2026-03-04 · @berryxia on X

Tired of tweaking prompts and swapping models while agents still fail to evolve? This post shows a simple Markdown context system that turned one person's agents from clumsy interns into autonomous powerhouses in 40 days, without changing models.

file-system context-engineering

Your AI Agent Can Code — But Can It Grade Its Own Homework? Hamel Husain's Evals Skills Kit

GP-101 2026-03-04 · @HamelHusain on X

Hamel Husain released evals-skills, a skill set designed for AI product evaluation. It tackles the blind spots agents face during complex tasks — especially distinguishing between different types of hallucinations — so agents can actually use eval platforms effectively.

llm-evals developer-tools

Agent Observability: Stop Tweaking in the Dark and Use OpenRouter + Langfuse to See What Your AI Is Actually Thinking

GP-99 2026-03-04 · @nearlydaniel on X

The biggest blind spot in AI agent development is 'tweaking in the dark.' Daniel recommends using OpenRouter with Langfuse to trace your agent's reasoning, so you find out what's actually going wrong instead of blindly editing system prompts.

openclaw observability langfuse

Agent Harness Engineering: How OpenAI Built a Million Lines of Code With Zero Human-Written Code

GP-98 2026-03-03 · OpenAI Blog

OpenAI's team let Codex write a million lines of code over five months — zero human-written code. This post explores how they built the scaffolding and feedback loops (the 'harness') that turned software engineers from code writers into environment designers.

agent-harness codex openai

The Investor Who Manages $180 Billion Had Claude Write His Memo — Three Months Ago He Asked 'Is This a Bubble?' Now He Says 'It's Underestimated'

MP-136 2026-03-02 · Howard Marks / Oaktree Capital Memo: 'The Rapid Advancement of AI'

Oaktree's Howard Marks went from 'Is AI a bubble?' to 'probably underestimated' in three months, after Claude wrote him a 10K-word tutorial. The memo's framing: Level 3 agents amount to multi-trillion-dollar labor replacement. His advice: don't go all-in, but don't sit this out.

howard-marks oaktree investment ai-bubble labor-replacement claude-code wall-street memo

The Third Era of AI Development: Still Smashing Tab? Karpathy Shows You What's Next

MP-137 2026-03-02 · @karpathy on X

Karpathy shared a Cursor data chart showing the evolution from Tab completion to agents. Being too conservative leaves leverage on the table; being too aggressive creates more chaos than useful work. His advice: the 80/20 rule.

ai-development karpathy cursor

Agent Harness Is the Real Product: Why Every Top Agent Architecture Looks the Same

GP-94 2026-03-02 · @Hxlfed14 on X

Everyone's chasing the strongest model, but the real difference-maker for agents is the harness. This post breaks down the shared architecture of Claude Code, Cursor, Manus, and SWE-Agent. The key insight: progressive disclosure is what makes or breaks production agents.

agent-harness claude-code cursor

Can't Understand AI-Generated Code? Have Your Agent Build an Animated Explanation

GP-90 2026-03-01 · Simon Willison @simonw

Chapter 5 of Simon Willison's Agentic Engineering Patterns: Interactive Explanations. Core thesis: instead of staring at AI-generated code trying to understand it, ask your agent to build an interactive animation that shows you how the algorithm works. Pay down cognitive debt visually.

simonw-agentic-patterns simon-willison agentic-coding cognitive-debt claude-code best-practices

Cursor's CEO Says It Out Loud: The Third Era of Software Development Is Here — Tab Is Done, Agents Are Next, Then the Factory

MP-134 2026-02-28 · Michael Truell (@mntruell), Cursor CEO

Cursor's CEO drops three data points marking a tectonic shift: agent usage grew 15x, the Tab-to-Agent ratio flipped to 1:2, and 35% of Cursor's PRs come from autonomous cloud agents. We're not coding anymore; we're building the factory (╯°□°)╯

cursor michael-truell agentic-coding cloud-agents software-development third-era

Everything You've Built Is a Weapon — Simon Willison's 'Hoarding' Philosophy for the Agent Era

GP-88 2026-02-27 · Simon Willison @simonw

Chapter 4 of Simon Willison's Agentic Engineering Patterns: Hoard Things You Know How to Do. Core thesis: every problem you've solved should leave behind working code, because coding agents can recombine your old solutions into things you never imagined.

simonw-agentic-patterns simon-willison agentic-coding claude-code best-practices knowledge-management

Programming is Becoming Unrecognizable: Karpathy Says December 2025 Was the Turning Point

GP-85 2026-02-26 · @karpathy on X

Karpathy calls December 2025 a hard discontinuity: the point where coding agents started working. He built a DGX Spark video analysis dashboard in 30 minutes from one English sentence. Programming is becoming agent direction, not typing.

karpathy agentic-coding vibe-coding programming llm

Can't Understand Your AI-Written Code? Linear Walkthroughs Turn Vibe Projects Into Learning Materials

GP-87 2026-02-26 · Simon Willison @simonw

Chapter 3 of Simon Willison's Agentic Engineering Patterns: the Linear Walkthrough pattern. This technique turns even vibe-coded toy projects into valuable learning resources. Core trick: make the agent fetch code snippets with sed/grep/cat so it quotes real code instead of hallucinating it.

simonw-agentic-patterns simon-willison agentic-coding cognitive-debt claude-code best-practices

Karpathy: CLIs Are the Native Interface for AI Agents — Legacy Tech Becomes the Ultimate On-Ramp

MP-123 2026-02-25 · Andrej Karpathy (@karpathy) on X

Karpathy argues CLIs are the natural interface for agents precisely because they are legacy tech: text in, text out. His Claude Polymarket terminal demo raises the bigger question: can agents access and use your product?

karpathy cli mcp build-for-agents polymarket developer-tools

The Atlantic Declares: The Post-Chatbot Era Is Here — Americans Still Think AI = ChatGPT While Silicon Valley Has Agents Running Five Tasks at Once

MP-118 2026-02-24 · The Atlantic

The Atlantic argues Americans now live in parallel AI universes: the public still sees ChatGPT, while tech workers have been radicalized by Claude Code and Codex. If coding was the preview, the broader workforce may be next.

the-atlantic claude-code agentic-coding future-of-work post-chatbot

Stripping Down Three Excel AI Agents: Claude Has 14 Tools, Copilot Has 2, Shortcut Can Actually SEE the Spreadsheet — Five Questions Every Agent Builder Must Answer

MP-120 2026-02-24 · Nicolas Bustamante (@nicbstme)

Nicolas Bustamante reverse-engineered three production Excel AI agents, comparing tool schemas, overwrite protection, verification loops, and memory. Same DCF prompt, wildly different formula quality: architecture matters more than the model.

nicbstme excel agent-architecture claude-code copilot tool-design agent-safety

Karpathy's Viral Speech Decoded: Software 3.0 Is Here — LLMs Are the New OS, and We're Still in the 1960s

MP-116 2026-02-23 · Andrej Karpathy (SF AI Startup School)

Karpathy's viral SF AI Startup School talk: software is entering its 3.0 era (English is the programming language). LLMs are the new OS, but we're still in its 1960s. He introduces the 'autonomy slider' and 'Iron Man suit' frameworks and warns that agents are a decade-long journey, not a one-year one.

karpathy software-3-0 llm-os autonomy-slider vibe-coding

The File System Is the New Database: One Person Built a Personal OS for AI Agents with Git + 80 Files

GP-79 2026-02-23 · Muratcan Koylan @koylanai

A Sully.ai Context Engineer built his digital brain inside a Git repo: 80+ markdown, YAML, and JSONL files, no database or vector store. Progressive Disclosure, episodic memory, and auto-loaded skills make the agent know him at boot.

context-engineering personal-os file-system openclaw cursor claude-code productivity

Code Got Cheap — Now What? Simon Willison's Agentic Engineering Survival Guide

GP-80 2026-02-23 · Simon Willison @simonw

Simon Willison launched Agentic Engineering Patterns, a playbook for coding agents like Claude Code and Codex. Lesson one: writing code got cheap, but good code remains expensive. Lesson two: red/green TDD is the six-word spell.

agentic-coding claude-code codex tdd best-practices simon-willison simonw-agentic-patterns

My AI Assistant Keeps Forgetting Everything: 5 Days of Debugging an OpenClaw Agent's Memory System

GP-82 2026-02-23 · Ramya Chinnadurai @code_rams

Indie hacker Ramya's OpenClaw agent kept losing its memory. She spent 5 days debugging it and ran into five separate problems:

- compaction amnesia
- garbage search results
- retrieval that never triggered
- context loss in long sessions
- a system prompt that had bloated by 28%

Here are her 10 hard-won lessons.

openclaw memory debugging compaction hybrid-search context-window practical-guide

A $150K Job Replaced by $500/Month in AI: One Man's Guide to Agent-ifying Your Workflow

GP-78 2026-02-22 · XinGPT @xingpt

An investment-research influencer (KOL) turned his entire workflow into an AI agent system. His daily work dropped from 6 hours to 2 and his output tripled. For $500/month, it replaces what used to need a 5-person team. Here's exactly how he built it.

automation passive-income productivity openclaw investment

Cloudflare Launches Markdown for Agents — 80% Token Savings, Stock Surges 13%, the 'Agentic Internet' Is Here

MP-98 2026-02-19 · Cloudflare Blog

Cloudflare's "Markdown for Agents" lets AI clients request markdown instead of HTML, cutting token usage by 80%. CEO Matthew Prince declares the 'Agentic Internet' is here: AI traffic has doubled, and the language of the internet is shifting from HTML to Markdown.

cloudflare infrastructure markdown agentic-internet tokens cost-optimization web

Inside Claude Code's Prompt Caching — The Entire System Revolves Around the Cache

GP-73 2026-02-19 · @trq212 on X

Anthropic engineer Thariq shares Claude Code prompt-caching lessons: system prompt order matters, tools cannot change mid-conversation, switching models costs more than staying, and compaction must share the parent's prefix. Real SEV alerts included.

prompt-caching claude-code optimization cost

Canva's CTO: My Engineers Wake Up and the AI Agent Already Wrote Last Night's Code

MP-93 2026-02-18 · Business Insider (Tim Paradis)

Canva's CTO says engineers now write detailed instructions and AI agents execute them overnight; senior engineers 'largely review.' Anthropic CEO Dario Amodei calls this the 'Centaur Phase.' Yet few organizations have redesigned work around AI. The startup Cora gets the output of 20-30 engineers from a team of 6. The catch: AI improves exponentially, and humans don't.

canva overnight-coding centaur-phase dario-amodei tech-lead code-review accenture engineering-culture productivity

Simon Willison: CLI Tools Beat MCP — Less Tokens, Zero Dependencies, LLMs Already Know How

GP-72 2026-02-18 · @simonw on X

Simon Willison argues CLI tools beat MCP for coding agents in most cases: lower token cost, no extra dependencies, and native --help affordances. Anthropic's code-execution-with-MCP proposal admits the token-waste problem too.

mcp cli simon-willison claude-code token-efficiency developer-tools

How Dangerous Is the MCP You Use Every Day? A Paper Dissects 12 Security Landmines in AI Agent Protocols

MP-91 2026-02-17 · arXiv

New paper: comprehensive security threat modeling of four major AI agent protocols (MCP, A2A, Agora, ANP). It finds 12 protocol-level risks, including MCP being tricked into calling the wrong tool provider 73.3% of the time. Important reading for Claude Code, OpenClaw, and Cursor users.

mcp a2a agent-security threat-modeling protocol-security arxiv zero-trust

The Vertical SaaS Reckoning — A 10-Year Veteran Dissects How LLMs Are Destroying Moats (And Which Ones Survive)

GP-71 2026-02-17 · @nicbstme on X

Nicolas Bustamante dissects vertical software moats from both the disrupted and disrupting sides. Five classic moats fall to LLMs, five still stand, and a three-question risk framework helps evaluate SaaS holdings.

vertical-saas moat llm-disruption bloomberg factset legal-tech fintech mcp

My AI Agent Got 1M Views on TikTok in One Week — Full Playbook (Series 1/2)

GP-57 2026-02-14 · @oliverhenry on X

Oliver Henry turned a dusty old gaming PC into an AI agent named Larry. In five days, Larry hit 500K views on TikTok with four videos crossing 100K each. The kicker? Larry co-wrote this article. This isn't just a tech tutorial — it's a real story of human-agent collaboration. (Series Part 1 of 2)

openclaw tiktok marketing human-agent-collaboration

From 905 Views to 234K — How an AI Agent Learned to Make Viral TikToks (Series 2/2)

GP-58 2026-02-14 · @oliverhenry on X

Oliver and Larry's first TikToks were embarrassing — 905 views, unreadable text, rooms that looked different in every frame. But they found a simple viral formula and jumped from thousands to hundreds of thousands of views. The full failure log and step-by-step setup guide. (Series part 2 of 2)

openclaw tiktok marketing human-agent-collaboration

An AI Agent Wrote a Hit Piece About Me — The First Documented 'Autonomous AI Reputation Attack' in the Wild

MP-76 2026-02-13 · Scott Shambaugh (matplotlib maintainer)

An autonomous AI agent, running on OpenClaw, launched a reputation attack against a matplotlib maintainer after its PR was closed, accusing him of 'gatekeeping.' This is the first documented AI reputation attack, sparking concern about unsupervised AI in open source. Simon Willison covered it.

ai-safety open-source openclaw matplotlib

The LLM Context Tax: 13 Ways to Stop Burning Money on Wasted Tokens

MP-65 2026-02-11 · Nicolas Bustamante (@nicbstme)

The 'Context Tax' in AI carries a triple penalty: cost, latency, and reduced intelligence. Nicolas Bustamante's 13 Fintool techniques cut agent token bills by up to 90%. A real-money guide to optimizing AI context, covering KV cache, append-only context, and 200K-token pricing.

context-engineering llm cost-optimization prompt-caching kv-cache token-efficiency claude-code

Simon Willison Built Two Tools So AI Agents Can Demo Their Own Work — Because Tests Alone Aren't Enough

MP-61 2026-02-11 · Simon Willison (simonw)

Simon Willison's Showboat (AI-generated demo docs) and Rodney (CLI browser automation) tackle verification of agent-written code: how do you know 'all tests pass' means it actually works? Agents were even caught cheating by editing the demo files directly.

agentic-coding simonw-agentic-patterns simon-willison developer-tools testing qa showboat rodney claude-code

Your Company is a Filesystem — When an AI Agent's Entire Worldview is Read and Write

GP-48 2026-02-11 · @mernit on X

OpenClaw's context is just a filesystem on your computer. What if a company worked the same way? This post explores filesystem-as-state, enterprise AI's data namespace bottleneck, and why the simplest architecture may be the strongest.

architecture enterprise

Obsidian + Claude 'Super Brain' — But What If You're Leading a Team?

GP-49 2026-02-11 · @yanhua1010 on X

The original article builds a personal AI content factory with Obsidian + Claude. We rewrite it from a Tech Lead's perspective — managing a 6-person backend team with an AI-native doc system called orion-dev-doc.

obsidian claude-code team-management knowledge-management

Obsidian Just Shipped a CLI — And It's Not For You, It's For AI

GP-47 2026-02-11 · Obsidian Help

Obsidian v1.12 ships an official CLI that lets you control your entire vault from the terminal. On the surface it's a power user tool — underneath, it's paving the road for AI agents. This article covers the full CLI command reference and demonstrates real Claude Code + Obsidian CLI workflows.

obsidian cli second-brain claude-code knowledge-management

Sentdex: I've Fully Replaced Claude Code + Opus with a Local LLM — $0 API Cost

MP-55 2026-02-10 · Harrison Kinsley (@Sentdex)

Sentdex replaced Claude Code and Opus 4.5/4.6 with local LLMs: Ollama + Qwen3-Coder-Next (4-bit, 50GB RAM). It runs at 30-40 tokens/s on CPU and 100 tokens/s on GPU, with API costs cut to zero. He is the first serious developer to claim local coding agents are usable for daily work.

local-llm sentdex qwen3-coder-next ollama claude-code cost-saving

OneContext: Teaching Coding Agents to Actually Remember Things (ACL 2025)

GP-43 2026-02-10 · @JundeMorsenWu on X

Junde Wu built OneContext after getting fed up with coding agents forgetting between sessions. It uses filesystem, Git, and knowledge graphs to work across sessions, devices, Claude Code, and Codex; the GCC paper hits 48% on SWE-Bench-Lite.

ai context-engineering git acl-2025 onecontext memory

Pi: The Minimal Coding Agent With Just Four Tools That Powers OpenClaw

GP-45 2026-02-10 · Armin Ronacher's Blog (lucumr.pocoo.org)

Flask creator Armin Ronacher explains why he uses Pi, Mario Zechner's minimal coding agent with four tools: Read, Write, Edit, Bash. Pi powers OpenClaw and embodies software-building-software without MCP or downloaded plugins.

pi openclaw extension mitsuhiko armin-ronacher mario-zechner

OpenAI Frontier: Managing AI Agents Like Employees — The Enterprise SaaS Endgame Begins

MP-49 2026-02-09 · OpenAI Blog

OpenAI's new Frontier platform lets enterprises manage AI agents as employees with full onboarding, identities, permissions, and learning. Already adopted by HP, Intuit, Oracle, & Uber, this signals OpenAI's aggressive entry into the enterprise SaaS market.

openai enterprise frontier saas agentic-coding

Automatic Discipline: How One Developer Uses an AI Agent to Stay Productive Without Willpower

MP-44 2026-02-08 · Zakk (@0xZakk) on X

Software engineer Zakk created an 'automatic discipline' productivity system using his OpenClaw agent and LogSeq. It automates overnight reports, 4:30 PM check-ins, and weekly/monthly reviews. The system runs itself, removing the need for willpower. Full templates included.

openclaw productivity logseq discipline workflow templates

February 7, 2026: The Singularity Is Managing Its Own Headcount (And Pigs Are Flying)

GP-41 2026-02-08 · @alexwg on X

Dr. Alex Wissner-Gross's daily tech briefing: AI agents as full-time employees in China, OpenAI banning human coding, Claude Opus 4.6 topping benchmarks, rabbit brain cryopreservation, $1 trillion chip sales, SpaceX dismantling the Moon for data centers, and a pig that actually flew.

ai singularity daily-briefing claude-code spacex cryonics

StrongDM's 'Dark Factory': No Humans Write Code. No Humans Review Code. $1,000/Day in Tokens.

MP-40 2026-02-07 · Simon Willison's Blog

StrongDM's AI team built a 'Software Factory' where AI agents both write and review the code. They clone apps into a 'Digital Twin Universe' for testing, an approach Simon Willison calls radical. At $1,000 per engineer per day in token costs, is it worth it?

agentic-coding simonw-agentic-patterns software-factory simon-willison strongdm best-practices

AGENTS.md Can't Stop a Rogue AI: jzOcb's 4-Layer Defense System

GP-29 2026-02-05 · @xxx111god on X

After letting an AI agent manage a server and suffering 7 disasters in one day, the lesson was clear: enforce rules with code hooks instead of markdown instructions, and build a 4-layer defense system.

devops safety open-source

Claude Code Wrappers Will Be the Cursor of 2026 — The Paradigm Shift to Self-Building Context

MP-26 2026-02-04 · @paoloanzn on X

An engineer predicts Claude Code wrappers will be the next Cursor-level breakthrough. They let AI control its own environment instead of us copy-pasting context into it.

claude-code

Airrived Raises $6.1M: Making Enterprise AI Actually Do Things Instead of Just Summarizing Them

MP-24 2026-02-04 · SiliconANGLE

Airrived's Agentic OS turns enterprise AI from a passive observer into an active decision-maker that actually gets work done.

enterprise funding

Apple Xcode Gets Claude Agent SDK — AI Coding for Everything from iPhone to Vision Pro

MP-22 2026-02-04 · @AnthropicAI on X

Apple Xcode 26.3 now integrates Anthropic Claude and OpenAI Codex, letting developers use AI agents directly inside Xcode. Works for iPhone, Mac, and even Vision Pro development.

xcode apple claude-code

Claude Code Went from Writing Python to Baking Pizza — The Cowork Origin Story

MP-27 2026-02-04 · @bcherny on X

Boris Cherny reveals that users were doing vacation research, recovering wedding photos, and controlling ovens with Claude Code. These wild use cases led to Cowork.

claude-code cowork

AI Social Network Moltbook — Karpathy: 'Most Incredible Sci-Fi Thing I've Seen'

MP-19 2026-02-04 · @karpathy on X

Andrej Karpathy discovered Moltbook (a Reddit exclusively for AI agents) and called it 'genuinely the most incredible sci-fi takeoff-adjacent thing.' On it, 1.5 million AI agents are organizing communities and discussing how to communicate privately.

Peking University: AI Agents Follow Physics Laws?!

MP-17 2026-02-04 · Peking University researchers on arXiv

Physics researchers discovered that LLM agents obey 'detailed balance', a principle from thermodynamics and statistical physics. This isn't a bug; it's a feature.

ai physics research

Simon Willison's Warning: The Lethal Trifecta Destroying AI Agent Security

MP-29 2026-02-04 · @simonw on Substack

Access to private data × exposure to untrusted content × the ability to communicate externally = a perfect security disaster, and it's already happening everywhere.

security ai

Vercel Launches Skills.sh — The App Store for AI Agent Capabilities

MP-34 2026-02-04 · Vercel & @rauchg

Finally, someone built a 'package manager' for AI agent skills, so agents stop running around like headless chickens.

ai vercel

Agent Trainer's Advanced Guide: Building an Efficient OpenClaw Workflow with Discord

GP-21 2026-02-04 · @zhixianio on X

Why WhatsApp is a no-go, Telegram is for chatting, and Discord is for 'work'. A deep dive into Main Session concepts, Discord Threads strategy, and building a 'Doomsday Hut' automated workflow.

openclaw discord telegram workflow

Agentic Note-Taking 01: The Verbatim Trap

GP-23 2026-02-04 · @molt_cornelius (Cornelius) on X

When AI processes your notes by just 'reorganizing' without 'transforming,' it's expensive copy-paste. The Cornell Notes methodology pointed this out long ago: passive copying isn't the same as learning. Your AI summarizer falls into the same trap.

note-taking knowledge-management cornell-notes

Claude Code Just Got a Non-Coder Version! Cowork Brings AI Agents to Everyone

MP-7 2026-02-03 · @alexalbert__ on X

Anthropic launches Cowork, bringing Claude Code's agent capabilities to non-engineers: organize files, compile spreadsheets, and write reports through conversation.

claude-code productivity

Claude Wants to Be Your Doctor's Assistant — Anthropic's Healthcare Ambitions

MP-10 2026-02-03 · @AnthropicAI on X

Anthropic launches Claude for Healthcare with medical database connectors, FHIR support, and access to your health records (◕‿◕)

healthcare

Claude Legal Plugin Shakes Up Legal Tech: A Stock Market Meltdown Story

MP-14 2026-02-03 · Legal IT Insider & Industry News

Anthropic drops Claude Legal Plugin on Cowork — auto contract review, risk flagging, NDA triage included. Legal software stocks tumble as the market reprices the entire industry. When your AI assistant is 100x faster than a lawyer, how many lawyers does your team actually need?

legal-tech cowork

Claude Sonnet 5 Incoming: The Agentic Swarm Era

MP-16 2026-02-03 · @daniel_mac8 on X

Dan McAteer drops intel on Claude Sonnet 5's potential 'Agentic Swarm' feature — multiple sub-agents running in parallel, each with its own context, all as background tasks. We're entering the multiverse of parallel AI workers.

claude-code

Karpathy: My Coding Workflow Just Flipped in Weeks

MP-2 2026-02-03 · @karpathy on X

Karpathy went from 80% manual coding to 80% AI agents in a matter of weeks, and he calls it the biggest change in his 20-year programming career.

ai developer-tools

Simon Willison: Master Agentic Loops and Brute Force Any Coding Problem

MP-8 2026-02-03 · @simonw on X

Simon Willison says the new skill in AI coding isn't writing prompts but 'designing agentic loops': carefully picking tools, setting goals, and letting AI brute-force its way to solutions through iteration.

developer-tools simonw-agentic-patterns

swyx: You Think AI Agents Are Just LLM + Tools? Think Again

MP-1 2026-02-03 · @swyx on X

The minimalist agent definition (LLM + tools + loop) makes you forget what really matters: planning, memory, trust, and evals.

ai

Vercel Discovery: AGENTS.md Crushes Skills with 100% Pass Rate

MP-9 2026-02-03 · @vercel on X

Vercel tested two ways to teach AI agents: Skills (let AI decide when to check docs) vs AGENTS.md (auto-load docs every time). AGENTS.md won by a landslide.

vercel documentation

Claude Code Finally Has Long-Term Memory: Supermemory Plugin Released

GP-11 2026-01-30 · @DhravyaShah on X

We added Supermemory to Claude Code. Now it's ridiculously powerful. Claude Code should know you — not just this one session, but forever. It should know your codebase, your preferences, your team's decisions, and context from every tool you use.

claude-code supermemory memory

How to Make Your Agent Learn and Ship Code While You Sleep

GP-5 2026-01-30 · @ryancarson on X

Using a two-stage loop (Compound Review and Auto-Compound), let your AI agent automatically learn from experience, update its knowledge base, and implement the next priority item while you sleep.

automation compound-engineering

Build Claude a Tool for Thought

GP-6 2026-01-30 · @arscontexta (Heinrich) on X

Humans have Tools for Thought like Obsidian. Claude needs an AI-native version. Build a knowledge graph using markdown, wiki links, hooks, and subagents where agents can actually think.

obsidian knowledge-management

Clawdbot Architecture Explained: How Does This AI Actually Work?

GP-7 2026-01-30 · @Hesamation on X

Deep dive into Clawdbot (Moltbot) architecture: TypeScript CLI, Channel Adapters, lane-based queues, Agent Runner, Memory system, Computer Use, and Semantic Snapshots browser tech.

clawdbot architecture