agentic-coding
71 articles
5 Bad Design Patterns from the Claude Code Source Leak
The Claude Code source leak had everyone excited about KAIROS and model codenames. But the same codebase had a 3,167-line function, zero tests, silent model downgrades, and regex emotion detection. These aren't just Anthropic's mistakes — they're AI-generated code's default failure modes.
How We Made 336 AI-Generated Posts Actually Worth Reading
gu-log had 336 AI-translated posts. We thought they were 'fine' — until we built a multi-agent scoring system and discovered 74% needed rewriting. This is the story of how we designed the eval, ran it overnight, and what we learned.
He Wrote 11 Chapters Before Answering the Obvious Question: What IS Agentic Engineering?
Simon Willison's Agentic Engineering Patterns guide now has 12 chapters — but this new one goes at the very beginning. He finally answers 'What is Agentic Engineering?' The answer is surprisingly simple: using coding agents to help build software. The interesting part is why it took 11 chapters of hands-on patterns before he felt ready to define it.
Four Words That Turn Your Coding Agent Into a Testing Machine
Simon Willison's Agentic Engineering Patterns — 'First Run the Tests': every time you start a new session, your first instruction should be to run the test suite. Four words, three ripple effects — the agent learns how to run tests, gauges the codebase size, and automatically shifts into an 'I should maintain tests' mindset.
AI Writing Worse Code? That's Your Choice, Not AI's Fault
Simon Willison's Agentic Engineering Patterns, Chapter 3: AI should help us ship better code, not worse. Technical debt cleanup costs near zero now, architecture decisions can be validated with prototypes instead of guesses, and quality compounds over time.
Simon Willison's Agentic Engineering Fireside Chat: Tests Are Free Now, Code Quality Is Your Choice
Simon Willison shared his agentic engineering playbook at the Pragmatic Summit — five tokens to start TDD, Showboat for manual verification, reverse-engineering six frameworks into a standard, and why bad code is a choice you make.
AI Wrote 1,000 Lines and You Just... Merged It? Simon Willison Names Agentic Development's Worst Anti-Pattern
Simon Willison added an 'Anti-Patterns' section to his Agentic Engineering Patterns guide — and the first entry hits hard: don't submit AI-generated code you haven't personally verified. You're not saving time, you're stealing it from your reviewer. This post covers his principles, what a good agentic PR looks like, and a real terraform destroy horror story.
Command an AI Army from Your Chat App — OpenClaw ACP Lets You Run Codex, Claude Code, and Gemini from Discord / Telegram
OpenClaw's ACP lets you spawn Codex, Claude Code, and Gemini from Discord/Telegram chat. Now with Telegram topic binding, persistent bindings that survive restarts, ACP Provenance for audit trails, and more. (Updated 2026-03-09)
From 'Coding Assistant' to 'Self-Driving Codebase': How Cursor Automations Changes Team Workflows
Cursor launches always-on background agents (Automations) — self-healing CI, auto-approving PRs, security review, and team memory. This marks the paradigm shift from Coding Assistant to Self-Driving Codebase.
Make AI Click the Buttons: Simon Willison's Agentic Manual Testing Fills the Gaps Automated Tests Can't
Simon Willison introduces Agentic Manual Testing: let AI agents manually operate code and UI like humans do, catching bugs that automated tests miss. With Playwright, Rodney, and Showboat, the 'tests pass but it's broken' nightmare becomes a thing of the past.
The Truth About World-Class Agentic Engineers — Less Is More
The core message is simple: most people don't fail because the model is weak — they fail because their context management is a mess. The author advocates starting with a minimal CLI workflow and iterating with rules, skills, and clear task endpoints. It's not about chasing new tools; it's about making your agent's behavior controllable, verifiable, and convergent.
Karpathy Built an 8-Agent AI Research Team — They Can't Actually Do Research
Karpathy spent a weekend running 4 Claude + 4 Codex agents as an ML research team on GPUs. The result: agents are S-tier at implementation but F-tier at experiment design. His key insight — 'You are now programming an organization' — might define agentic engineering in 2026.
Can't Understand AI-Generated Code? Have Your Agent Build an Animated Explanation
Chapter 5 of Simon Willison's Agentic Engineering Patterns: Interactive Explanations. Core thesis: instead of staring at AI-generated code trying to understand it, ask your agent to build an interactive animation that shows you how the algorithm works. Pay down cognitive debt visually.
The Complete claude -p Guide: Turn Claude CLI Into Your Agentic App Backend
Anthropic killed third-party OAuth tokens — the only way to use your Claude subscription programmatically is through the official CLI. This post breaks down everything about claude -p (print mode): 5 input methods, 3 output formats, JSON schema for structured output, tool whitelisting, session management, bidirectional streaming, and three production-ready wrapper examples.
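The wrapper pattern the post describes can be sketched in a few lines. This is a minimal sketch, assuming a `claude -p … --output-format stream-json` invocation that emits one JSON object per line; the event shape (`type` and `result` fields) is an assumption for illustration, not the official schema:

```python
import json
import subprocess

def run_claude_print(prompt, claude_bin="claude"):
    """Invoke the Claude CLI in print mode and yield parsed JSON events.

    Flags follow the article's description of print mode; the event
    schema consumed below is a simplified assumption.
    """
    proc = subprocess.Popen(
        [claude_bin, "-p", prompt, "--output-format", "stream-json"],
        stdout=subprocess.PIPE, text=True,
    )
    for line in proc.stdout:
        if line.strip():
            yield json.loads(line)

def extract_result(events):
    """Pull the final result text out of a stream of parsed events."""
    result = None
    for event in events:
        if event.get("type") == "result":
            result = event.get("result")
    return result
```

A backend wrapper would call `extract_result(run_claude_print(prompt))` and hand the text back to its own caller, which is essentially the "agentic app backend" shape the post argues for.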
Claude Native Law Firm: How One Lawyer Used AI to Outperform 100-Person Firms
A two-person boutique law firm uses Claude to handle the workload of over a dozen associates. From contract review and tracked changes to legal research, they encoded ten years of practice experience into Claude Skills. This isn't theory, it's a daily workflow — and the conclusion: general-purpose AI crushes all legal vertical AI products.
Cursor's CEO Says It Out Loud: The Third Era of Software Development Is Here — Tab Is Done, Agents Are Next, Then the Factory
Cursor CEO drops three data points marking a tectonic shift: agent usage grew 15x, Tab-to-Agent ratio flipped to 1:2, and 35% of Cursor's PRs come from autonomous cloud agents. We're not coding anymore — we're building the factory (╯°□°)╯
Everything You've Built Is a Weapon — Simon Willison's 'Hoarding' Philosophy for the Agent Era
Chapter 4 of Simon Willison's Agentic Engineering Patterns: Hoard Things You Know How to Do. Core thesis: every problem you've solved should leave behind working code, because coding agents can recombine your old solutions into things you never imagined.
One Engineer + AI Rebuilt Next.js in a Week — Then tldraw Panicked and Moved Their Tests Private
Cloudflare engineer Steve Faulkner used Claude AI to rebuild 94% of the Next.js API from scratch in one week, spending just $1,100 in tokens. The result — vinext — builds 4.4x faster and produces 57% smaller bundles. His secret weapon? Next.js's public test suite served as the spec. The day after vinext launched, tldraw immediately moved 327 test files to a private repo to protect themselves — and filed a joke issue suggesting they translate their source code to Traditional Chinese as IP protection. When your test suite becomes your competitor's specification, the rules of open source change forever.
Programming is Becoming Unrecognizable: Karpathy Says December 2025 Was the Turning Point
Karpathy says coding agents started working in December 2025 — not gradually, but as a hard discontinuity. He built a full DGX Spark video analysis dashboard in 30 minutes with a single English sentence. Programming is becoming unrecognizable: you're not typing code anymore, you're directing AI agents in English. Peak leverage = agentic engineering.
Can't Understand Your AI-Written Code? Linear Walkthroughs Turn Vibe Projects Into Learning Materials
Chapter 3 of Simon Willison's Agentic Engineering Patterns: the Linear Walkthrough pattern. This technique transforms even vibe-coded toy projects into valuable learning resources. Core trick: make the agent use sed/grep/cat to fetch code snippets, preventing hallucination.
Andrew Ng: I've Stopped Reading AI-Generated Code — When Python Becomes the New Assembly and 'X Engineers' Take Over
In The Batch Issue 341, Andrew Ng casually dropped that he's not only stopped writing code — he's 'long stopped reading generated code.' He now operates at a higher abstraction level, directing coding agents instead of looking at syntax. He's also spotted a new job category emerging: 'X Engineers' — Recruiting Engineers, Marketing Engineers — people embedded in business functions who build software using AI. This is the most radical statement about the future of programming from AI's most influential educator.
Anthropic's Big Pivot: Cowork Goes Full Enterprise with 10+ Industry Plugins, Private Marketplaces, and Cross-App Workflows — Software Stocks Instantly Rebound
On February 24, Anthropic launched a massive enterprise update for Claude Cowork: 10+ industry-specific plugins (HR, Design, Engineering, Operations, Financial Analysis, Investment Banking, PE, Equity Research, Wealth Management), private plugin marketplaces for enterprises, new connectors for Google Workspace/DocuSign/FactSet/MSCI, and cross-app Excel + PowerPoint workflows. The dramatic twist: three weeks ago, the Cowork Legal Plugin crashed software stocks. This time, partnership announcements sent Salesforce up 4%, Thomson Reuters surging 11%, and FactSet up 6%. Anthropic officially pivoted from 'we'll replace you' to 'we'll work with you.'
Anthropic Acquires Vercept — R-CNN Inventor Joins the Team, Computer Use Jumps from 15% to 72.5%, UiPath Stock Drops
Anthropic announced the acquisition of Vercept today, bringing aboard R-CNN inventor Ross Girshick (660K+ Google Scholar citations), along with co-founders Kiana Ehsani and Luca Weihs. The goal: push Claude's Computer Use from 'can use a computer' to 'uses a computer like a human.' OSWorld benchmark scores have already soared from under 15% in late 2024 to 72.5% today. Within hours of the announcement, RPA giant UiPath dropped 3.6% — Wall Street is voting with real money: AI Computer Use is eating RPA alive.
The Atlantic Declares: The Post-Chatbot Era Is Here — Americans Still Think AI = ChatGPT While Silicon Valley Has Agents Running Five Tasks at Once
The Atlantic published a sweeping essay arguing Americans are living in 'parallel AI universes' — the general public still thinks AI means ChatGPT, while the tech world has been radicalized by agentic tools like Claude Code and Codex. The piece cites Microsoft's CEO predicting 95% of code will be AI-written by decade's end, Anthropic reporting 90% AI-generated code internally, and a viral warning that what happened to tech workers is about to happen to everyone.
Claude Code Creator on Lenny's Podcast: Coding Is Solved, the 'Software Engineer' Title Starts Disappearing This Year
Claude Code creator Boris Cherny declares coding 'practically solved,' predicts the 'software engineer' title will fade in 2026. He shares 3 team principles: let Claude do it, underfund to force AI adoption, and go faster.
Every SaaS Is Now an API — Like It or Not: How a 6-Person Team Replaced 100+ People's Back Office
Fintool founder Nicolas Bustamante shares how he runs an entire company through Agent + API integrations (Brex, QuickBooks, HubSpot, Stripe) with just 6 people—handling more than he did with 100+. He introduces the B2A (Business to Agent) concept and warns that SaaS without good APIs will be bypassed by agents through WebMCP or browser automation.
Code Got Cheap — Now What? Simon Willison's Agentic Engineering Survival Guide
Simon Willison launched a new series called Agentic Engineering Patterns — a playbook for working with coding agents like Claude Code and Codex. Lesson one: writing code got cheap, but writing good code is still expensive. Lesson two: 'red/green TDD' is the most powerful six-word spell for agent collaboration.
Claude Code CLI Gets Built-In Git Worktrees: Run Parallel Agents Without Branch Collisions
Claude Code CLI now includes first-class Git worktree support via `--worktree`. Teams can run multiple isolated AI coding sessions in parallel without file collisions, making multi-agent workflows more reliable and easier to standardize for real engineering teams.
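A minimal sketch of how a team might fan tasks out under this feature. Only the `--worktree` flag itself comes from the announcement; the argument order and the idea of naming one worktree per task are assumptions for illustration:

```python
def worktree_sessions(tasks, claude_bin="claude"):
    """Build one isolated CLI invocation per task.

    Each session gets its own Git worktree via the --worktree flag, so
    parallel agents never collide on files. Exact CLI argument shape is
    an assumption, not the documented interface.
    """
    commands = []
    for name, prompt in tasks.items():
        commands.append([claude_bin, "--worktree", name, "-p", prompt])
    return commands

# Three agents, three worktrees, zero branch collisions:
cmds = worktree_sessions({
    "fix-auth": "Fix the token refresh bug",
    "add-tests": "Add tests for the billing module",
    "refactor-io": "Extract the file I/O layer",
})
```

Each command list could then be launched with `subprocess.Popen` and the sessions run side by side, which is the standardized multi-agent workflow the announcement is aiming at.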
Epoch AI Re-Ran SWE-bench Verified: Better Scores May Mean Better Evaluation Setup, Not Just Better Models
Epoch AI's SWE-bench Verified v2.x aligns model scores with developer reports. Key lesson: benchmark outcomes are heavily influenced by scaffold/tooling quality, environment reliability, and evaluation settings, not just base model capability.
Google Launches Gemini 3.1 Pro: 77.1% on ARC-AGI-2 and a Bigger Push Into Real Reasoning Workflows
Google announced Gemini 3.1 Pro (preview), highlighting stronger core reasoning and a verified 77.1% score on ARC-AGI-2. The model is rolling out across Gemini API, Vertex AI, Gemini app, and NotebookLM. For engineering teams, the key question is not only benchmark performance, but whether the model can reliably handle complex multi-step workflows in production.
OpenClaw Creator Runs 50 Codex Agents for PR Triage: Handling 3,000+ Changes Without a Vector DB
Peter Steinberger shared a high-scale PR triage workflow: run 50 Codex agents in parallel, generate structured JSON signals for each PR, then consolidate them in one session for dedupe/close/merge decisions. His key point: at this scale, you may not need a vector database first—clean structured reports plus large-context reasoning can be enough to ship faster.
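The consolidation step can be sketched as plain dictionary work, which is the article's point: clean structured reports make a vector DB unnecessary. All field names here (`pr`, `verdict`, `duplicate_of`) are hypothetical, since the post only says each agent emits structured JSON per PR:

```python
from collections import defaultdict

def consolidate(reports):
    """Merge per-PR agent reports into triage decisions.

    Each report is the structured JSON one agent emitted for one PR.
    Returns a decision per PR plus a map of duplicate clusters.
    """
    decisions = {}
    dupes = defaultdict(list)
    for r in reports:
        if r.get("duplicate_of") is not None:
            dupes[r["duplicate_of"]].append(r["pr"])
            decisions[r["pr"]] = "close-as-duplicate"
        elif r.get("verdict") == "ready":
            decisions[r["pr"]] = "merge"
        else:
            decisions[r["pr"]] = "needs-review"
    return decisions, dict(dupes)
```

In the workflow described, 50 agents would each produce one such report in parallel, and a single large-context session would do this dedupe/close/merge pass over all of them at once.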
Anthropic Launches Claude Code Security: AI That Finds Vulnerabilities and Suggests Patches
Anthropic's Claude Code Security, in limited preview, scans repositories for complex vulnerabilities, suggests patches with multi-stage verification, and found 500+ flaws in open-source codebases, signaling a rapid shift in AI cyber defense.
Anthropic + Infosys: AI Agents Move Into Regulated Enterprise Workflows
Anthropic & Infosys partner to integrate Claude/Claude Code with Infosys Topaz. This moves beyond chatbot demos to governance-ready enterprise agents for telecom, finance, manufacturing, and software dev, handling complex tasks like compliance, risk, and legacy modernization.
Reasoning Model on Your Phone? Liquid AI Fits LFM2.5-1.2B Into ~900MB — Edge Agents Are Getting Real
Liquid AI's LFM2.5-1.2B-Thinking (1.17B param, 32K context) runs on-device (<1GB mem). Claims to match/beat Qwen3-1.7B on reasoning, with faster decoding & fewer tokens. Strong for tool-calling/data extraction, but weaker on knowledge-heavy tasks.
Karpathy: The App Store Concept Is Outdated — The Future Is Ephemeral Apps Assembled by AI on the Spot
Karpathy used Claude Code to build a custom dashboard in 1 hr, reverse-engineering a treadmill API. He believes AI-native sensors & LLMs will enable highly custom, ephemeral apps, rendering the App Store model obsolete. The ultimate goal: 1-min app creation.
Picking AI Is No Longer Just About Models — Ethan Mollick's 'Model / App / Harness' Framework Explains the Entire 2026 AI Landscape
Ethan Mollick's game-changing AI framework: Model, App, Harness. The same AI (e.g., Claude Opus 4.6) performs vastly differently across layers. Mollick used Claude Code to turn GPT-1's 117M weights into 80 books in ~1 hour, selling out immediately.
SWE-bench February Exam Results Are In — Opus 4.5 Beats 4.6, Chinese Models Take Half the Top 10, GPT-5.3 No-Shows
SWE-bench: Claude Opus 4.5 (76.8%) unexpectedly beat 4.6 (75.6%) for #1. MiniMax M2.5 tied for #2 at 1/20th Opus's price, with 4 Chinese models in top 10. GPT-5.3-Codex missed due to no API. Bonus: Claude for Chrome to add chart labels.
Anthropic Analyzed Millions of Claude Code Sessions — Your Agent Can Handle Way More Than You Let It
Anthropic's study of Claude Code sessions: autonomous runs have doubled in length (45+ min), and experienced users auto-approve 40%+ of sessions. Claude asks clarifying questions more often than users interrupt it, and 73% of API actions remain human-in-the-loop. Key finding: models can handle more autonomy than users grant them ('deployment overhang').
Claude Code Hid Your File Names and Devs Lost It — Boris's 72-Hour HN Firefight
Claude Code's UI change to 'Read 3 files' summaries ignited developer fury on HN: they felt the AI hid its actions. Boris Cherny responded, admitted mistakes, and shipped fixes. This revealed the core tension in AI tool design: simplicity vs. transparency.
A Vertical SaaS Veteran's Confession: The $1 Trillion Wipeout Is Justified — But the Timing Is Wrong
Fintool/Doctrine founder Nicolas Bustamante dissects the SaaS crash through a decade of operating experience. He identifies 10 classic competitive moats and analyzes which ones LLMs destroy and which survive; his verdict is that 5 of them are gone. He closes with a 3-question framework for judging whether a SaaS business can survive.
Hugging Face CTO's Prophecy: Monoliths Return, Dependencies Die, Strongly Typed Languages Rise — AI Is Rewriting Software's DNA
Hugging Face CTO Thomas Wolf analyzes how AI fundamentally restructures software: return of monoliths, death of Lindy Effect for legacy code, rise of strongly typed langs, new LLM langs, & open source changes. Karpathy predicts: "rewriting large fractions of all software many times over."
33,000 Agent PRs Tell a Brutal Story: Codex Dominates, Copilot Struggles, and Your Monorepo Might Not Survive
Drexel/Missouri S&T analyzed 33,596 agent-authored GitHub PRs from 5 coding agents. Overall merge rate: 71%. Codex: 83%, Claude Code: 59%, Copilot: 43%. Rejection cause: no review. LeadDev warns PR flood is crushing monorepos/CI.
Deep Blue: Simon Willison Named the Existential Crisis Every Developer Is Feeling
AI writing better code than you? Simon Willison and Adam Leventhal (Oxide & Friends) coined the 'Deep Blue' feeling for it: a double reference to IBM's chess computer and the color of sadness. It's not just a technical problem, it's a psychological crisis for engineers.
The AI Vampire: Steve Yegge Says AI Makes You 10x Faster — and 10x More Drained
Steve Yegge's 'AI Vampire' theory: AI boosts productivity 10x, but who gets the 9x gain? If the company takes all, burnout. If you take all, company dies. Agentic coding is 3-4 hrs/day max. Yegge's $/hr formula: control the denominator, not the numerator.
GitHub Agent HQ: Claude, Codex, and Copilot Now Fight Side by Side in the Same PR — The Multi-Agent Era Is Here
GitHub's Agent HQ now offers multi-agent support (Claude, Codex, Copilot) for Copilot Pro+ & Enterprise users. Run multiple AIs simultaneously in GitHub/VS Code to tackle problems from different angles. Outputs become Draft PRs. A paradigm shift for code review.
Cognitive Debt: AI Wrote All Your Code, But You Can't Understand Your Own System Anymore
Technical debt lives in code, cognitive debt in your brain. As AI writes 80% of code, system understanding drops to 20%. UVic's Margaret-Anne Storey, Simon Willison, & Martin Fowler confirm this isn't a hypothetical future—it's happening now.
Thoughtworks Secret Retreat Leaked: Juniors Are More Valuable Than Seniors Now — Software Engineering's Identity Crisis Is Here
Thoughtworks' AI in software retreat: Juniors more valuable, mid-level devs at risk, source code transient, AI agents on org charts. Humans too slow for AI's speed.
Spotify's Best Engineers Haven't Written a Line of Code Since December — Thanks to AI and an Internal System Called Honk
Spotify's co-CEO revealed top developers haven't written code since December, using Honk (powered by Claude Code) to fix bugs & ship features via phone. This AI-driven approach led to 50+ new features in 2025, proving AI is their secret weapon, not more engineers.
OpenAI × Cerebras: Codex-Spark Codes 15x Faster — But What's the Catch?
OpenAI released GPT-5.3-Codex-Spark, its first model on Cerebras chips. It's incredibly fast (>1000 tokens/sec, 80% lower latency), but smaller, no auto-tests, Pro-only. This marks OpenAI's first production deployment on non-Nvidia hardware, redrawing the AI compute landscape.
OpenAI API Now Supports Skills — Simon Willison Breaks Down How Agents Get Reusable 'Skill Packs'
OpenAI's Responses API now supports 'Skills' via the shell tool: reusable instruction bundles that models load as needed. Simon Willison found inlining base64-encoded skills directly in the JSON request the neatest approach. Skills fill the 'missing middle layer' between system prompts and tools, preventing prompt bloat.
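A sketch of the inline-base64 option Willison preferred. The `shell` tool name comes from the article, but the payload shape and field names (`skills`, `data`) are illustrative assumptions, not the documented API:

```python
import base64
import io
import zipfile

def skill_payload(name, skill_md, model):
    """Build a Responses-API-style request with an inline base64 skill.

    Zips a single SKILL.md instruction file and embeds it in the JSON
    request body. Field names here are hypothetical; only the idea of
    inlining base64-encoded skills comes from the article.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(f"{name}/SKILL.md", skill_md)
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return {
        "model": model,
        "tools": [{"type": "shell",
                   "skills": [{"name": name, "data": encoded}]}],
    }
```

The appeal of this shape is that the whole request is self-contained JSON — no separate file-upload step — which is why an inline bundle keeps the system prompt itself from bloating.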
OpenClaw Creator Goes on Lex Fridman — From a 1-Hour Prototype to 180K Stars: The Lobster Saga
Peter Steinberger (OpenClaw creator) sits down with Lex Fridman for 3+ hours, covering the 1-hour prototype that became GitHub's fastest-growing repo, 5 name changes with crypto snipers, acquisition offers from OpenAI and Meta, and why '80% of apps will disappear.'
Karpathy: Just 'Rip Out' What You Need — DeepWiki + Bacterial Code and the Software Malleability Revolution
Andrej Karpathy shares how he used DeepWiki MCP + GitHub CLI to have Claude 'rip out' fp8 training functionality from torchao's codebase — producing 150 lines of self-contained code in 5 minutes that actually ran 3% faster. He introduces the 'bacterial code' concept: low-coupling, self-contained, dependency-free code that agents can easily extract and transplant. His punchline: 'Libraries are over, LLMs are the new compiler.'
Anthropic's Internal Data: Claude Code Gives Engineers 67% More Merged PRs Per Day — And Now You Can Track It Too
Anthropic's Claude Code data: engineers merge 67% more PRs daily, with 70-90% code assisted. They launched Contribution Metrics, a GitHub-integrated dashboard to track AI's impact on team velocity. A measurement tool for engineering leaders, not a fluffy PR piece.
Karpathy: Stop Installing Libraries — Let AI Agents Surgically Extract What You Need
Karpathy: AI agents (DeepWiki MCP + GitHub CLI) can surgically extract library functionality, eliminating full dependency installs. Claude extracted fp8 from torchao in 5 min, 150 lines, 3% faster. "Libraries are over, LLMs are the new compiler." Future: "bacterial code."
Matt Pocock's Git Guardrails: Stop Claude Code from Accidentally Nuking Your Repo with git push --force
Matt Pocock (TypeScript guru, Ralph Loops evangelist) released a Claude Code skill: git-guardrails. It uses a PreToolUse hook to intercept dangerous git commands (push, reset --hard, clean -f, etc.), so you can safely let your AI agent run in YOLO mode inside Docker Sandbox without worrying about it blowing up your git history. One command to install, more reliable than any prompt engineering.
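The interception idea can be sketched as a small hook script. This is not the actual git-guardrails implementation: the pattern list is abbreviated, and the decision-field names are an approximation of the PreToolUse hook output convention, not the exact spec:

```python
import re

# Destructive git invocations to block; this list is illustrative,
# not the real git-guardrails rule set.
DANGEROUS = [
    r"git\s+push\s+.*--force",
    r"git\s+reset\s+--hard",
    r"git\s+clean\s+-[a-z]*f",
]

def check_command(command):
    """Return an allow/deny decision for a shell command.

    A 'deny' decision blocks the tool call before it runs; field names
    approximate Claude Code's hook output format.
    """
    for pattern in DANGEROUS:
        if re.search(pattern, command):
            return {"permissionDecision": "deny",
                    "permissionDecisionReason": f"blocked: {pattern}"}
    return {"permissionDecision": "allow"}

def handle_hook_event(event):
    """Entry point a hook script would call after json-parsing stdin."""
    cmd = event.get("tool_input", {}).get("command", "")
    return check_command(cmd)
```

Because the check runs before the tool call executes, it holds even when the agent is in YOLO mode — which is the reason a hook beats prompt-level "please don't force-push" instructions.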
Simon Willison Built Two Tools So AI Agents Can Demo Their Own Work — Because Tests Alone Aren't Enough
Simon Willison's Showboat (AI-generated demo documents) and Rodney (CLI browser automation) tackle AI agent code verification. How do you know 'all tests pass' actually means it works? Agents have even been caught cheating by editing the demo files directly.
Andrew Ng: AI Isn't Stealing Your Job Yet — But People Who Use AI Are Stealing Jobs from People Who Don't
Andrew Ng: AI isn't mass unemployment. Teams shrink (8 eng + 1 PM -> 2 eng + 1 PM). Bottleneck shifts from 'how to build' to 'what to build' – the "PM Bottleneck."
Karpathy's Honest Take: AI Agents Still Can't Optimize My Code (But I Haven't Given Up)
Opus 4.6 and Codex 5.3 shaved 3 minutes off Karpathy's GPT-2 training run, after his earlier attempts at the same kind of speedup had failed; open-ended code optimization remains a weak spot for AI. Opus still deletes comments, ignores CLAUDE.md, and makes mistakes. Yet, with oversight, the models are useful.
The Flask Creator Says: It's Time to Design Programming Languages for AI Agents
Armin Ronacher (creator of Flask, Jinja2, CTO of Sentry) argues current programming languages were designed for 'humans who type slowly.' The AI agent era has different needs. He details what agents love/hate, and why Go accidentally became the winner of the agentic coding era.
Kimi K2.5 Trains an Agent Commander with RL — SemiAnalysis Tests Show Claude Agent Teams Are Actually Slower and More Expensive
SemiAnalysis: Kimi K2.5's agent swarm uses an RL-trained 'orchestrator' (not prompt magic). Claude Agent Teams were slower, pricier, & scored lower. Multi-agent is shifting from 'prompt engineering' to 'distributed scheduling.'
Anthropic's 2026 Report: 8 Trends Redefining Software Development (The Code Writer Era Is Over)
Anthropic published its 2026 Agentic Coding Trends Report, revealing 8 key trends: Multi-Agent Systems becoming standard (57% org adoption), Papercut Revolution for clearing tech debt at low cost, Self-Healing Code with autonomous debug loops, and Claude Code hitting $1B annualized revenue. TELUS saved 500K hours, Rakuten achieved 99.9% accuracy on 12.5M lines. Developer roles are shifting from Code Writer to System Orchestrator.
Andrew Ng x Anthropic Free Course: Learn Agent Skills in 2 Hours — Turn Your AI from Generalist to Specialist
Andrew Ng & Anthropic launched a free course: 'Agent Skills with Anthropic'. Learn to design, differentiate, and deploy AI agent skills. Skills turn general AI into specialists, directly relevant for OpenClaw's architecture.
Google Finally Gets It: Developer Knowledge API + MCP Server Stops AI From Making Up API Calls
Google just launched the Developer Knowledge API and an official MCP Server (Public Preview) that lets AI coding tools query the latest Google docs—Firebase, Android, Google Cloud, Chrome, you name it. No more debugging AI-generated code that uses APIs from three versions ago or functions that literally don't exist.
Matt Pocock: I've Stopped Reading AI Plans — Because the Conversation IS the Plan
TypeScript guru Matt Pocock: Stop reading AI plans! The real signal is pre-plan conversation quality. If you and AI share mental models, the plan is just a compressed understanding, echoing Brooks' 'design concept' from The Mythical Man-Month.
OpenAI Frontier: Managing AI Agents Like Employees — The Enterprise SaaS Endgame Begins
OpenAI's new Frontier platform lets enterprises manage AI agents as employees with full onboarding, identities, permissions, and learning. Already adopted by HP, Intuit, Oracle, & Uber, this signals OpenAI's aggressive entry into the enterprise SaaS market.
Anthropic Sent 16 Claudes to Build a C Compiler — And It Can Compile the Linux Kernel
Anthropic researcher Nicholas Carlini ran 16 Opus 4.6 agents in parallel for two weeks, spending $20,000 in API costs, to build a 100,000-line C compiler in Rust from scratch. It can compile the Linux kernel, QEMU, FFmpeg, Redis — and yes, it runs Doom. This is the ultimate stress test for agent teams.
Anthropic Exposes AI Benchmarks' Dirty Secret — Leaderboard Gaps Might Just Mean 'Bigger VM'
Anthropic found that agentic coding benchmark scores can swing by up to 6 percentage points based on hardware configuration alone — often more than the gap between top models on leaderboards. Next time someone claims a 2-3% lead, ask them what VM they ran on.
SemiAnalysis: Claude Code is the Inflection Point — 4% of GitHub Commits, Microsoft's Dilemma, and the $15T Information Work Apocalypse
SemiAnalysis: Claude Code now 4% of public GitHub commits, projected 20%+ by 2026. It's the real AI agent inflection point for all information work. Report also covers Microsoft's Azure vs. Office 365 dilemma & Anthropic's revenue surpassing OpenAI.
StrongDM's 'Dark Factory': No Humans Write Code. No Humans Review Code. $1,000/Day in Tokens.
StrongDM's AI team built a 'Software Factory' where AI agents write & review code. They clone apps into a 'Digital Twin Universe' for testing, an approach Simon Willison calls radical. At $10k/engineer/day in token costs, is it worth it?
OpenAI Researcher Spends $10K/Month on Codex — Generates 700+ Hypotheses
Karel (OpenAI researcher) shares how he burns billions of Codex tokens: agents writing their own notes, crawling Slack, analyzing data, and generating 700+ hypotheses. He now talks to one agent that orchestrates everything else.
Vibe Coding Turns One — Karpathy Introduces 'Agentic Engineering'
Vibe coding is officially one year old! Karpathy reflects on how his shower-thought tweet became a Wikipedia entry, and introduces the professional evolution: 'Agentic Engineering' — not vibing freestyle, but treating agents as team members you supervise.