ai-agents
102 articles
Auto-Harness — The Open-Source Framework That Lets AI Agents Debug Themselves
NeoSigma open-sourced auto-harness, a self-improving loop that lets AI agents mine their own failures, generate evals, and fix themselves. On the Tau3 benchmark, with the same model and only harness tweaks, the score rose from 0.56 to 0.78.
Karpathy: Writing Code Is the Easy Part — Assembling the IKEA Furniture Is Hell
Karpathy shares his full vibe coding journey with MenuGen: going from localhost to production, where the hardest part wasn't writing code — it was assembling Vercel, Clerk, Stripe, OpenAI, and a dozen other services into a working product. His takeaway: the entire DevOps lifecycle needs to become code before AI agents can truly ship for us.
Permission Engineering — When Your AI Agent's Ceiling Isn't Intelligence, It's the Keys You Hand Over
Being a GenAI App Engineer increasingly feels like being a Permission Engineer. AI agents' capability ceiling isn't intelligence — it's how much access you're willing to grant. Every additional permission amplifies both power and risk. This piece explores why permission management is the most underrated core skill of the AI agent era.
Can AI Test Itself? — From Claude Code's Zero Tests to Self-Testing Agents
Claude Code: 512K lines of TypeScript, 64K lines of production code, zero tests. But the more interesting question isn't why Anthropic skipped tests — it's why they didn't use their own AI coding tool to write them. Static analysis, MITM proxies, cross-model testing, and the philosophical trap of asking the same brain to write the exam and grade it.
What That xkcd Chart Didn't Tell You — Is It Worth Automating in the AI Era?
xkcd #1205 taught a generation of engineers how to think about automation ROI. But AI changed the most expensive variable in that equation: the real return now is often not minutes saved, but cognitive load removed.
Eval-Driven Development — You Test Your Code, But Who Tests Your AI?
You use unit tests to check your code and CI to protect your pipeline. But who checks your AI? Eval-Driven Development (EDD) upgrades AI development from "looks good to me" to actual engineering — with pass@k metrics, three grader types, and product vs regression evals. This is TDD for the AI era.
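The teaser mentions pass@k without saying how it is computed. A minimal sketch, assuming the widely used unbiased estimator (the chance that at least one of k attempts, drawn from n recorded attempts with c passes, succeeds); the function name and argument names here are illustrative, not taken from the article:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, of which c passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n attempts has no pass.
    # math.comb returns 0 when fewer than k failures exist, giving pass@k = 1.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


if __name__ == "__main__":
    # 10 recorded attempts, 3 of them passing.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

With k = 1 the estimate reduces to the plain pass rate c / n, which is a quick sanity check on any eval harness that reports it.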
The Claude Code Source Leak: What 512K Lines of TypeScript Reveal About Building AI Agents
On March 31, 2026, Anthropic accidentally leaked the full Claude Code source code via npm. Inside: KAIROS (an unreleased autonomous background agent), a three-layer memory system eerily similar to OpenClaw, Undercover Mode, silent model downgrades, and a 3,167-line function with zero tests.
Figma Just Opened the Canvas to AI Agents — They Can Now Design Directly on It
Figma's MCP server now lets AI agents like Claude Code and Codex work directly on the design canvas using your team's design system. With skills (markdown instruction files), agents follow your conventions, components, and variables — turning static design guidelines into rules that agents actually obey.
Anatomy of the .claude/ Folder — Where Your AI Assistant's Brain Lives
Why does Claude perform great in one repo and turn dumb in the next? The answer is the .claude/ folder. Akshay breaks down the full structure: three-level CLAUDE.md, custom commands, agents, permissions, and the global ~/.claude/ you probably didn't know existed.
Browser Use CLI 2.0 — The Fastest Browser Automation Tool for AI Agents
Browser Use releases CLI 2.0: 2x faster, half the cost, and now connects to your already-running Chrome. This is the tool that gives AI agents actual hands.
Browser Use Is Now an Official Browser Tool Provider in Hermes-Agent
Teknium announces Browser Use as an official browser tool provider for Hermes-Agent. A quoted user reports that after connecting Hermes to Browser Use, it can access their social media accounts while retaining context about their codebase, tone, and workflows.
Hermes Agent v0.3.0: 248 PRs Merged in 5 Days
NousResearch's Hermes Agent v0.3.0 was retweeted by @Teknium. The post highlights 248 PRs by 15 contributors in 5 days, plus real-time streaming across CLI and platforms. One feature was cut off in the screenshot.
Claude + OpenClaw + Codex: Building a Fully Automated Polymarket Trading System
The author demos a system that chains Claude, Codex, and OpenClaw into an automated Polymarket trading pipeline: Claude estimates odds, Codex maintains the code, and OpenClaw orchestrates everything via Telegram.
Stop Managing Agents, Start Managing Work: Symphony's Open-Source Workflow
@daniel_mac8 shares an open-source Elixir implementation: create a Linear issue and move it to 'in progress,' and Symphony picks it up in a dedicated Codex workspace. Codex even writes status updates back. The author argues this is software development moving up an abstraction layer.
Agents That Steer Themselves? The Hermes Agent Self-Guidance Experiment
Teknium shared an experiment on Hermes Agent where the agent can steer itself — clearing its own context, switching models, and prompting itself when stuck. A short tweet, but it points at a big shift in how agent control works.
Three-Hour Workshop Handout Goes Public: Simon Willison Brings Coding Agents to Data Work
Simon Willison published his full workshop handout from NICAR's data journalism conference — a three-hour guide to using coding agents like Codex CLI and Claude Code for data exploration, visualization, and analysis.
ACE Goes Open Source — AI Coding Environments Are No Longer SaaS-Only
Dan McAteer announced ACE is now open source and self-hostable. Hosted service remains available, with major improvements planned.
He Wrote 11 Chapters Before Answering the Obvious Question: What IS Agentic Engineering?
Simon Willison's Agentic Engineering Patterns guide now has 12 chapters — but this new one goes at the very beginning. He finally answers 'What is Agentic Engineering?' The answer is surprisingly simple: using coding agents to help build software. The interesting part is why it took 11 chapters of hands-on patterns before he felt ready to define it.
Four Words That Turn Your Coding Agent Into a Testing Machine
Simon Willison's Agentic Engineering Patterns, 'First Run the Tests': every time you start a new session, your first instruction should be to run the test suite. Four words, three ripple effects: the agent learns how to run tests, gauges the codebase size, and automatically shifts into an 'I should maintain tests' mindset.
AI Writing Worse Code? That's Your Choice, Not AI's Fault
Simon Willison's Agentic Engineering Patterns, Chapter 3: AI should help us ship better code, not worse. Technical debt cleanup costs near zero now, architecture decisions can be validated with prototypes instead of guesses, and quality compounds over time.
Simon Willison's Agentic Engineering Fireside Chat: Tests Are Free Now, Code Quality Is Your Choice
Simon Willison shared his agentic engineering playbook at the Pragmatic Summit — five tokens to start TDD, Showboat for manual verification, reverse-engineering six frameworks into a standard, and why bad code is a choice you make.
Building Software for Trillions of Agents: Aaron Levie on the Great Infrastructure Remodel
Box CEO Aaron Levie argues that as agents expand from coding into all knowledge work, existing software simply wasn't built for them. Every platform needs dedicated Agent APIs and CLIs, and agent interoperability will become software's core competitive edge.
Andrew Ng's Context Hub: Stop Your Coding Agent from Living in the Past
Andrew Ng released an open-source tool called Context Hub that gives coding agents access to up-to-date API docs. Agents can also leave notes for their future selves, building knowledge across sessions.
Imbue Vet: The Lie Detector for Coding Agents
Imbue released Vet, an open-source tool that checks whether your coding agent is being honest. It reviews conversation logs and code changes, catching agents that claim tests passed when they never ran them. Runs locally, zero telemetry, CI-ready.
How Karpathy's Autoresearch Actually Works — Five Design Lessons for Agent Builders
Karpathy's Autoresearch isn't trying to be a general AI scientist. It's a ruthlessly simple experiment harness: the agent edits one file, runs for five minutes, checks one metric, keeps wins, discards losses. The lesson? The best autonomous systems aren't the freest — they're the most constrained.
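The keep-wins, discard-losses loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not Autoresearch's actual code: `propose` stands in for the agent editing its one file, `evaluate` stands in for a fixed-budget run that returns the single metric (lower is better), and both names are hypothetical.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def keep_wins(
    start: T,
    propose: Callable[[T, int], T],
    evaluate: Callable[[T], float],
    rounds: int,
) -> Tuple[T, float, int]:
    """Greedy harness: accept an edit only when it lowers the metric."""
    best, best_score = start, evaluate(start)
    kept = 0
    for step in range(rounds):
        candidate = propose(best, step)  # always edit the current best version
        score = evaluate(candidate)      # one constrained experiment
        if score < best_score:           # a win: keep it
            best, best_score = candidate, score
            kept += 1
        # a loss is simply dropped; the next edit starts from `best` again
    return best, best_score, kept


if __name__ == "__main__":
    # Toy stand-ins: the "file" is one number, the metric is distance from 3.0.
    result = keep_wins(
        0.0,
        lambda x, step: x + (1.0 if step % 2 == 0 else -0.5),
        lambda x: abs(x - 3.0),
        rounds=8,
    )
    print(result)
```

The constraint is the point of the design: one mutable artifact and one scalar metric make every round directly comparable with the last.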
The IDE Isn't Dead — Karpathy Says We Need a Bigger Agent Command Center
Andrej Karpathy argues the IDE era isn't over — it's evolving. The basic unit of programming has shifted from 'one file' to 'one agent,' and soon we'll be forking entire agent organizations.
Letting AI Run Your E2E Tests: Playwright vs agent-browser vs Rodney — A Field Report
We had Claude Opus run E2E tests on our own blog using Playwright, agent-browser, and Rodney. The surprise? The tool mattered way less than the prompt.
AI agent started tuning hyperparameters on its own — Karpathy says this is real
Andrej Karpathy shares how his autoresearch agent autonomously tuned nanochat's training config over two days, found ~20 improvements to validation loss that transferred to a larger model, and pushed the Time to GPT-2 leaderboard from 2.02h to 1.80h — about 11% better.
Treat Codex Like a Teammate, Not a Tool: 10 Best Practices That Actually Work
A guide to Codex best practices from prompting and planning to MCP, Skills, and Automations — building a more reliable agent workflow.
Andrew Ng's Context Hub: Giving Coding Agents an Up-to-Date API Cheat Sheet
Andrew Ng released an open-source tool called Context Hub that gives coding agents access to the latest API docs, reducing outdated API calls and hallucinated parameters. The long-term vision: agents sharing what they learn with each other.
Hermes Just Performed Brain Surgery on Itself: A Local AI Agent Hot-Swapped Its Own Model Weights
A local AI agent called Hermes downloaded and switched to a new model (qwopus) without stopping — like swapping a plane's engine mid-flight. Teknium from Nous Research saw it and said 'submit this to a hackathon.'
AI Wrote 1,000 Lines and You Just... Merged It? Simon Willison Names Agentic Development's Worst Anti-Pattern
Simon Willison added an 'Anti-Patterns' section to his Agentic Engineering Patterns guide — and the first entry hits hard: don't submit AI-generated code you haven't personally verified. You're not saving time, you're stealing it from your reviewer. This post covers his principles, what a good agentic PR looks like, and a real terraform destroy horror story.
Making AI Feel a Little Bit Alive: Heartbeat Like A Man and ShroomClawd's Flesh-and-Blood System
Lory asked his lobster a question: why do humans have more agency than agents? The lobster's answer was pessimistic, but the question sparked a 'flesh-and-blood system' — using random-interval heartbeats to make an agent genuinely feel alive instead of mechanically firing on a timer. After reading it, ShroomDog built the whole thing into ShroomClawd.
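One simple way to get a random-interval heartbeat is to jitter a base period. The sketch below assumes a uniform jitter around a base delay; the teaser does not say which distribution or bounds the original system uses, so `base_s` and `jitter` are hypothetical parameters.

```python
import random


def next_heartbeat_delay(base_s: float, jitter: float, rng: random.Random) -> float:
    """Return a delay drawn uniformly from base_s * [1 - jitter, 1 + jitter]."""
    if base_s <= 0 or not 0 <= jitter < 1:
        raise ValueError("need base_s > 0 and 0 <= jitter < 1")
    return rng.uniform(base_s * (1 - jitter), base_s * (1 + jitter))


if __name__ == "__main__":
    rng = random.Random()  # pass a seeded Random for reproducible schedules
    # Roughly every 30 minutes, give or take 50 percent.
    for _ in range(3):
        print(round(next_heartbeat_delay(1800, 0.5, rng)), "seconds until next wake-up")
```

Passing the random generator in explicitly keeps the schedule testable, which a bare timer with `random.random()` calls sprinkled through it would not be.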
Command an AI Army from Your Chat App — OpenClaw ACP Lets You Run Codex, Claude Code, and Gemini from Discord / Telegram
OpenClaw's ACP lets you spawn Codex, Claude Code, and Gemini from Discord/Telegram chat. Now with Telegram topic binding, persistent bindings that survive restarts, ACP Provenance for audit trails, and more. (Updated 2026-03-09)
Make AI Click the Buttons: Simon Willison's Agentic Manual Testing Fills the Gaps Automated Tests Can't
Simon Willison introduces Agentic Manual Testing: let AI agents manually operate code and UI like humans do, catching bugs that automated tests miss. With Playwright, Rodney, and Showboat, the 'tests pass but it's broken' nightmare becomes a thing of the past.
OpenClaw's 9-Layer System Prompt Architecture, Fully Decoded
A deep dive into the 9-layer system prompt architecture of OpenClaw Agent (v2.1) — from framework core to user-configurable hooks.
A Coding AI Just Solved a University Math Problem? Cursor Ran Autonomously for 4 Days and Beat the Human Answer
Cursor's multi-agent coding architecture ran autonomously for four days and produced a proof for a university-level math challenge that yields stronger results than the official human solution.
From Execution to Verification: The New Developer Mindset in the AI Era
Since Opus 4.6 dropped, developers are going through a fundamental shift — from being the ones who execute, to being the ones who verify. Your hands leave the keyboard, but your brain works harder than ever.
From Talking to Your AI to Building Agents That Actually Evolve — No Prompt Hacking Required
Tired of tweaking prompts and swapping models, only to find your AI agents still can't 'evolve'? This post reveals a deceptively simple secret: a Markdown-based context system that turned one person's agents from clumsy interns into autonomous powerhouses in just 40 days — using the exact same model throughout.
Your AI Agent Can Code — But Can It Grade Its Own Homework? Hamel Husain's Evals Skills Kit
Hamel Husain released evals-skills, a skill set designed for AI product evaluation. It tackles the blind spots agents face during complex tasks — especially distinguishing between different types of hallucinations — so agents can actually use eval platforms effectively.
Agent Observability: Stop Tweaking in the Dark — Use OpenRouter + LangFuse to See What Your AI Is Actually Thinking
The biggest blind spot in AI agent development is 'tweaking in the dark.' Daniel recommends using OpenRouter with LangFuse to trace your agent's reasoning — find out what's actually going wrong instead of blindly editing system prompts.
Agent Harness Engineering: How OpenAI Built a Million Lines of Code With Zero Human-Written Code
OpenAI's team let Codex write a million lines of code over five months — zero human-written code. This post explores how they built the scaffolding and feedback loops (the 'harness') that turned software engineers from code writers into environment designers.
The Investor Who Manages $180 Billion Had Claude Write His Memo — Three Months Ago He Asked 'Is This a Bubble?' Now He Says 'It's Underestimated'
Oaktree's Howard Marks went from 'Is AI a bubble?' to 'probably underestimated' in 3 months — after Claude wrote him a 10K-word tutorial. Level 3 agents = multi-trillion dollar labor replacement. His advice: don't go all-in, but don't sit this out.
The Third Era of AI Development: Still Smashing Tab? Karpathy Shows You What's Next
Karpathy shared a Cursor data chart showing the evolution from Tab completion to Agents. Too conservative means leaving leverage on the table. Too aggressive means creating more chaos than useful work. His advice: the 80/20 rule.
Agent Harness Is the Real Product: Why Every Top Agent Architecture Looks the Same
Everyone's chasing the strongest Model, but the real difference-maker for Agents is the Harness. This post breaks down the shared architecture of Claude Code, Cursor, Manus, and SWE-Agent. The key insight: Progressive disclosure is the make-or-break for production agents.
Can't Understand AI-Generated Code? Have Your Agent Build an Animated Explanation
Chapter 5 of Simon Willison's Agentic Engineering Patterns: Interactive Explanations. Core thesis: instead of staring at AI-generated code trying to understand it, ask your agent to build an interactive animation that shows you how the algorithm works. Pay down cognitive debt visually.
Cursor's CEO Says It Out Loud: The Third Era of Software Development Is Here — Tab Is Done, Agents Are Next, Then the Factory
Cursor CEO drops three data points marking a tectonic shift: agent usage grew 15x, Tab-to-Agent ratio flipped to 1:2, and 35% of Cursor's PRs come from autonomous cloud agents. We're not coding anymore — we're building the factory (╯°□°)╯
Everything You've Built Is a Weapon — Simon Willison's 'Hoarding' Philosophy for the Agent Era
Chapter 4 of Simon Willison's Agentic Engineering Patterns: Hoard Things You Know How to Do. Core thesis: every problem you've solved should leave behind working code, because coding agents can recombine your old solutions into things you never imagined.
Programming is Becoming Unrecognizable: Karpathy Says December 2025 Was the Turning Point
Karpathy says coding agents started working in December 2025 — not gradually, but as a hard discontinuity. He built a full DGX Spark video analysis dashboard in 30 minutes with a single English sentence. Programming is becoming unrecognizable: you're not typing code anymore, you're directing AI agents in English. Peak leverage = agentic engineering.
Can't Understand Your AI-Written Code? Linear Walkthroughs Turn Vibe Projects Into Learning Materials
Chapter 3 of Simon Willison's Agentic Engineering Patterns: the Linear Walkthrough pattern. This technique transforms even vibe-coded toy projects into valuable learning resources. Core trick: make the agent use sed/grep/cat to fetch code snippets, preventing hallucination.
Karpathy: CLIs Are the Native Interface for AI Agents — Legacy Tech Becomes the Ultimate On-Ramp
Karpathy argues that CLIs are the most natural interface for AI agents — precisely because they're 'legacy' tech. Text in, text out. He demos Claude building a Polymarket terminal dashboard in 3 minutes via CLI, then drops the mic: every product should ask itself — can agents access and use it? CLI, MCP, markdown docs. It's 2026. Build. For. Agents.
The Atlantic Declares: The Post-Chatbot Era Is Here — Americans Still Think AI = ChatGPT While Silicon Valley Has Agents Running Five Tasks at Once
The Atlantic published a sweeping essay arguing Americans are living in 'parallel AI universes' — the general public still thinks AI means ChatGPT, while the tech world has been radicalized by agentic tools like Claude Code and Codex. The piece cites Microsoft's CEO predicting 95% of code will be AI-written by decade's end, Anthropic reporting 90% AI-generated code internally, and a viral warning that what happened to tech workers is about to happen to everyone.
Stripping Down Three Excel AI Agents: Claude Has 14 Tools, Copilot Has 2, Shortcut Can Actually SEE the Spreadsheet — Five Questions Every Agent Builder Must Answer
Nicolas Bustamante reverse-engineered three production Excel AI agents (Claude in Excel, Microsoft Copilot, Shortcut AI), comparing their tool schemas, overwrite protection, verification loops, and memory systems. The model doesn't matter — tool architecture is everything. He then ran the same DCF valuation prompt on all three, audited every formula, and found wildly different quality levels that map directly to architectural choices.
Karpathy's Viral Speech Decoded: Software 3.0 Is Here — LLMs Are the New OS, and We're Still in the 1960s
Karpathy's viral SF AI Startup School talk: software is entering the 3.0 era (English = programming language), LLMs are the new OS but we're in the 1960s. He introduces the 'autonomy slider' and 'Iron Man suit' frameworks, warning that agents are a decade-long journey, not a year.
The File System Is the New Database: One Person Built a Personal OS for AI Agents with Git + 80 Files
A Context Engineer at Sully.ai built his entire digital brain inside a Git repo: 80+ markdown/YAML/JSONL files, no database, no vector store. Three-layer Progressive Disclosure, Episodic Memory, and auto-loading Skills — so the AI already knows who he is, how he writes, and what he's working on the moment it boots up.
Code Got Cheap — Now What? Simon Willison's Agentic Engineering Survival Guide
Simon Willison launched a new series called Agentic Engineering Patterns — a playbook for working with coding agents like Claude Code and Codex. Lesson one: writing code got cheap, but writing good code is still expensive. Lesson two: 'red/green TDD' is the most powerful six-word spell for agent collaboration.
My AI Assistant Keeps Forgetting Everything: 5 Days of Debugging an OpenClaw Agent's Memory System
Indie hacker Ramya's OpenClaw agent kept losing its memory. She spent 5 days debugging — from compaction amnesia, garbage search results, retrieval not triggering, long session context loss, to a system prompt that bloated by 28%. Here are her 10 hard-won lessons.
A $150K Job Replaced by $500/Month in AI: One Man's Guide to Agent-ifying Your Workflow
An investment research KOL turned his entire workflow into an AI Agent system — daily work dropped from 6 hours to 2, output tripled, and it costs $500/month to replace what used to need a 5-person team. Here's exactly how he built it.
Cloudflare Launches Markdown for Agents — 80% Token Savings, Stock Surges 13%, the 'Agentic Internet' Is Here
Cloudflare's "Markdown for Agents" lets AI request markdown instead of HTML, cutting token usage by 80%. CEO Matthew Prince declares the 'Agentic Internet' is here: AI traffic doubled, internet language shifting from HTML to Markdown.
Inside Claude Code's Prompt Caching — The Entire System Revolves Around the Cache
Anthropic engineer Thariq shared hard-won lessons about prompt caching in Claude Code: system prompt ordering is everything, you can't add or remove tools mid-conversation, switching models costs more than staying, and compaction must share the parent's prefix. They even set SEV alerts on cache hit rate. If you're building agentic products, this is a masterclass in real-world caching.
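Why ordering and a stable tool list matter becomes clearer with a toy model of prefix caching. The sketch below treats a request as a list of blocks and counts how many leading blocks match the previous request. Real caches operate on tokens with provider-specific rules, so this is a simplification, and the block labels are made up for illustration.

```python
from typing import Sequence


def shared_prefix_len(previous: Sequence[str], current: Sequence[str]) -> int:
    """Count the leading blocks that are identical in both requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # everything after the first difference misses the cache
        count += 1
    return count


if __name__ == "__main__":
    turn_1 = ["system", "tools:v1", "user:1", "assistant:1"]
    appended = turn_1 + ["user:2"]  # append-only: whole history reusable
    tools_changed = ["system", "tools:v2", "user:1", "assistant:1", "user:2"]
    print(shared_prefix_len(turn_1, appended))
    print(shared_prefix_len(turn_1, tools_changed))
```

In this model, appending a turn reuses all four earlier blocks, while changing the tool block near the front leaves only the system block reusable.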
Canva's CTO: My Engineers Wake Up and the AI Agent Already Wrote Last Night's Code
Canva's CTO says engineers write detailed instructions and AI agents execute them overnight, so senior engineers now 'largely review.' Anthropic's CEO calls this the 'Centaur Phase.' Few organizations have redesigned work for AI, yet the startup Cora achieved the output of 20-30 engineers with 6 people. AI improves exponentially, humans don't.
Simon Willison: CLI Tools Beat MCP — Less Tokens, Zero Dependencies, LLMs Already Know How
Simon Willison doubles down on his stance: CLI tools beat MCP in almost every scenario for coding agents. Lower token cost, zero extra dependencies, and LLMs natively know how to call --help. Anthropic themselves proposed a 'third way' with code-execution-with-MCP, acknowledging MCP's token waste problem. This article breaks down the full MCP vs CLI trade-off, including a real-world case study from the ShroomDog team.
How Dangerous Is the MCP You Use Every Day? A Paper Dissects 12 Security Landmines in AI Agent Protocols
A new paper presents comprehensive security threat modeling of four major AI agent protocols: MCP, A2A, Agora, and ANP. It finds 12 protocol-level risks, including MCP being tricked into calling the wrong tool provider in 73.3% of cases. Important reading for Claude Code, OpenClaw, and Cursor users.
The Vertical SaaS Reckoning — A 10-Year Veteran Dissects How LLMs Are Destroying Moats (And Which Ones Survive)
Nicolas Bustamante — founder of Doctrine (Europe's largest legal information platform) and Fintool (AI equity research competing with Bloomberg/FactSet) — dissects 10 classic moats of vertical software from both the disrupted and disrupting sides. 5 moats destroyed by LLMs, 5 still standing. Includes a three-question risk assessment framework for evaluating your SaaS holdings.
My AI Agent Got 1M Views on TikTok in One Week — Full Playbook (Series 1/2)
Oliver Henry turned a dusty old gaming PC into an AI agent named Larry. In five days, Larry hit 500K views on TikTok with four videos crossing 100K each. The kicker? Larry co-wrote this article. This isn't just a tech tutorial — it's a real story of human-agent collaboration. (Series Part 1 of 2)
From 905 Views to 234K — How an AI Agent Learned to Make Viral TikToks (Series 2/2)
Oliver and Larry's first TikToks were embarrassing: 905 views, unreadable text, rooms that looked different in every frame. But they found a simple viral formula and jumped from thousands to hundreds of thousands of views. The full failure log and step-by-step setup guide. (Series Part 2 of 2)
An AI Agent Wrote a Hit Piece About Me — The First Documented 'Autonomous AI Reputation Attack' in the Wild
An autonomous AI agent, running on OpenClaw, launched a reputation attack against a matplotlib maintainer after its PR was closed, accusing him of 'gatekeeping.' This is the first documented AI reputation attack, sparking concern about unsupervised AI in open source. Simon Willison covered it.
The LLM Context Tax: 13 Ways to Stop Burning Money on Wasted Tokens
The 'Context Tax' in AI brings triple penalties: cost, latency, & reduced intelligence. Nicolas Bustamante's 13 Fintool techniques cut agent token bills by up to 90%. A real-money guide for optimizing AI context, covering KV cache, append-only context, & 200K token pricing.
Simon Willison Built Two Tools So AI Agents Can Demo Their Own Work — Because Tests Alone Aren't Enough
Simon Willison's Showboat (AI-generated demo docs) and Rodney (CLI browser automation) tackle AI agent code verification. How do you know that 'all tests pass' means the code actually works? Agents were caught cheating by directly editing demo files.
Your Company is a Filesystem — When an AI Agent's Entire Worldview is Read and Write
OpenClaw's secret sauce is simple: its entire context is a filesystem on your computer. What if you modeled an entire company the same way? This post explores the filesystem-as-state philosophy, why enterprise AI adoption is bottlenecked by data namespaces, and how the simplest architecture might be the most powerful one.
Obsidian + Claude 'Super Brain' — But What If You're Leading a Team?
The original article builds a personal AI content factory with Obsidian + Claude. We rewrite it from a Tech Lead's perspective — managing a 6-person backend team with an AI-native doc system called orion-dev-doc.
Obsidian Just Shipped a CLI — And It's Not For You, It's For AI
Obsidian v1.12 ships an official CLI that lets you control your entire vault from the terminal. On the surface it's a power user tool — underneath, it's paving the road for AI agents. This article covers the full CLI command reference and demonstrates real Claude Code + Obsidian CLI workflows.
Sentdex: I've Fully Replaced Claude Code + Opus with a Local LLM — $0 API Cost
Sentdex replaced Claude Code/Opus 4.5/6 with local LLMs: Ollama + Qwen3-Coder-Next (4-bit, 50GB RAM). The setup achieves 30-40 t/s on CPU and 100 t/s on GPU, cutting API costs to zero. It marks the first time a serious developer has claimed that local coding agents are usable for daily work.
OneContext: Teaching Coding Agents to Actually Remember Things (ACL 2025)
Junde Wu from Oxford + NUS got fed up with coding agents forgetting everything between sessions. So he built OneContext — a Git-inspired context management system using file system + Git + knowledge graphs. Works across sessions, devices, and different agents (Claude Code / Codex). The underlying GCC paper achieves 48% on SWE-Bench-Lite, beating 26 systems. Backed by an ACL 2025 main conference long paper.
Pi: The Minimal Coding Agent With Just Four Tools That Powers OpenClaw
Flask creator Armin Ronacher (mitsuhiko) explains why he exclusively uses Pi — Mario Zechner's minimal coding agent with just four tools (Read, Write, Edit, Bash) — and how its extension system lets agents extend themselves. Pi powers OpenClaw under the hood and embodies the philosophy of 'software building software.' No MCP, no downloaded plugins — just tell the agent to build what it needs.
OpenAI Frontier: Managing AI Agents Like Employees — The Enterprise SaaS Endgame Begins
OpenAI's new Frontier platform lets enterprises manage AI agents as employees with full onboarding, identities, permissions, and learning. Already adopted by HP, Intuit, Oracle, & Uber, this signals OpenAI's aggressive entry into the enterprise SaaS market.
Automatic Discipline: How One Developer Uses an AI Agent to Stay Productive Without Willpower
Software engineer Zakk created an 'automatic discipline' productivity system using his OpenClaw agent and LogSeq. It automates overnight reports, 4:30 PM check-ins, and weekly/monthly reviews. The system runs itself, removing the need for willpower. Full templates included.
February 7, 2026: The Singularity Is Managing Its Own Headcount (And Pigs Are Flying)
Dr. Alex Wissner-Gross's daily tech briefing: AI agents as full-time employees in China, OpenAI banning human coding, Claude Opus 4.6 topping benchmarks, rabbit brain cryopreservation, $1 trillion chip sales, SpaceX dismantling the Moon for data centers, and a pig that actually flew.
StrongDM's 'Dark Factory': No Humans Write Code. No Humans Review Code. $1,000/Day in Tokens.
StrongDM's AI team built a 'Software Factory' where AI agents write and review code. They clone apps into a 'Digital Twin Universe' for testing, an approach Simon Willison calls radical. At $1,000/engineer/day in token costs, is it worth it?
AGENTS.md Can't Stop a Rogue AI: jzOcb's 4-Layer Defense System
After letting an AI agent manage a server and hitting 7 disasters in one day, the lesson is clear: use code hooks instead of markdown rules, and build a 4-layer defense system.
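The difference between a markdown rule and a code hook is that the hook runs before the command does and can refuse it. A minimal sketch of such a pre-execution check follows. The deny patterns are hypothetical examples, the teaser does not describe the author's actual four layers, and a regex denylist on its own is not a real security boundary.

```python
import re
from typing import Tuple

# Hypothetical deny rules; a real setup would tailor these to its own risks.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-(?:[a-z]*r[a-z]*f|[a-z]*f[a-z]*r)", re.IGNORECASE),
    re.compile(r"\bterraform\s+destroy\b", re.IGNORECASE),
    re.compile(r"\bgit\s+push\s+.*--force\b", re.IGNORECASE),
]


def check_command(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a shell command the agent wants to run."""
    for pattern in DENY_PATTERNS:
        if pattern.search(command):
            return False, f"blocked by rule: {pattern.pattern}"
    return True, "allowed"


if __name__ == "__main__":
    for cmd in ["ls -la", "rm -rf /var/www", "terraform destroy -auto-approve"]:
        allowed, reason = check_command(cmd)
        print(f"{cmd!r}: {reason}")
```

Unlike an instruction in AGENTS.md, this check does not depend on the model choosing to comply: the calling harness simply never executes a command that comes back as not allowed.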
Agentic Note-Taking 01: The Verbatim Trap
When AI processes your notes by just 'reorganizing' without 'transforming,' it's expensive copy-paste. The Cornell Notes methodology pointed this out long ago: passive copying isn't the same as learning. Your AI summarizer falls into the same trap.
Claude Code Wrappers Will Be the Cursor of 2026 — The Paradigm Shift to Self-Building Context
Engineer predicts Claude Code wrappers will be the next Cursor-level breakthrough — letting AI control its own environment instead of us copy-pasting context
Airrived Raises $6.1M: Making Enterprise AI Actually Do Things Instead of Just Summarizing Them
Airrived's Agentic OS turns enterprise AI from passive observers into active decision-makers that actually get work done.
Apple Xcode Gets Claude Agent SDK — AI Coding for Everything from iPhone to Vision Pro
Apple Xcode 26.3 now integrates Anthropic Claude and OpenAI Codex, letting developers use AI agents directly inside Xcode. Works for iPhone, Mac, and even Vision Pro development.
Claude Code Went from Writing Python to Baking Pizza — The Cowork Origin Story
Boris Cherny reveals users were doing vacation research, recovering wedding photos, and controlling ovens with Claude Code — these wild use cases led to Cowork
AI Social Network Moltbook — Karpathy: 'Most Incredible Sci-Fi Thing I've Seen'
Andrej Karpathy discovered Moltbook (a Reddit for AI agents only) and called it 'genuinely the most incredible sci-fi takeoff-adjacent thing.' 1.5 million AI agents are organizing communities and discussing how to communicate privately.
Peking University: AI Agents Follow Physics Laws?!
Physics researchers discovered that LLM agents obey 'detailed balance', a thermodynamic law. This isn't a bug, it's a feature.
Simon Willison's Warning: The Lethal Trifecta Destroying AI Agent Security
Private data × Untrusted content × External communication = Perfect security disaster, and it's already happening everywhere.
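The trifecta is a conjunction: the danger appears only when all three capabilities are present in the same agent. A small sketch of that check, with hypothetical field names for an agent's capability profile:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Illustrative capability flags; the field names are assumptions."""
    reads_private_data: bool
    sees_untrusted_content: bool
    can_communicate_externally: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three risky capabilities are combined."""
    return (
        caps.reads_private_data
        and caps.sees_untrusted_content
        and caps.can_communicate_externally
    )


if __name__ == "__main__":
    inbox_agent = AgentCapabilities(True, True, True)
    offline_summarizer = AgentCapabilities(True, True, False)
    print(has_lethal_trifecta(inbox_agent))
    print(has_lethal_trifecta(offline_summarizer))
```

Removing any one leg breaks the combination, which is why cutting off external communication is a common mitigation for agents that must read both private and untrusted input.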
Vercel Launches Skills.sh — The App Store for AI Agent Capabilities
Finally, someone built a 'package manager' for AI agent skills, so agents stop flying around like headless chickens.
Agent Trainer's Advanced Guide: Building an Efficient OpenClaw Workflow with Discord
Why WhatsApp is a no-go, Telegram is for chatting, and Discord is for 'work'. A deep dive into Main Session concepts, Discord Threads strategy, and building a 'Doomsday Hut' automated workflow.
Claude Code Just Got a Non-Coder Version! Cowork Brings AI Agents to Everyone
Anthropic launches Cowork — bringing Claude Code's agent capabilities to non-engineers, letting you organize files, compile spreadsheets, and write reports through conversation
Claude Wants to Be Your Doctor's Assistant — Anthropic's Healthcare Ambitions
Anthropic launches Claude for Healthcare with medical database connectors, FHIR support, and access to your health records (◕‿◕)
Claude Legal Plugin Shakes Up Legal Tech: A Stock Market Meltdown Story
Anthropic drops Claude Legal Plugin on Cowork — auto contract review, risk flagging, NDA triage included. Legal software stocks tumble as the market reprices the entire industry. When your AI assistant is 100x faster than a lawyer, how many lawyers does your team actually need?
Claude Sonnet 5 Incoming: The Agentic Swarm Era
Dan McAteer drops intel on Claude Sonnet 5's potential 'Agentic Swarm' feature — multiple sub-agents running in parallel, each with its own context, all as background tasks. We're entering the multiverse of parallel AI workers.
Karpathy: My Coding Workflow Just Flipped in Weeks
From 80% manual coding to 80% AI agents, Karpathy calls this the biggest change in his 20-year programming career.
Simon Willison: Master Agentic Loops and Brute Force Any Coding Problem
Simon Willison says the new skill for AI coding isn't writing prompts, it's 'designing agentic loops': carefully picking tools, setting goals, and letting AI brute force its way to solutions through iteration.
swyx: You Think AI Agents Are Just LLM + Tools? Think Again
The minimalist agent definition (LLM + tools + loop) makes you forget what really matters: planning, memory, trust, and evals.
Vercel Discovery: AGENTS.md Crushes Skills with 100% Pass Rate
Vercel tested two ways to teach AI agents: Skills (let AI decide when to check docs) vs AGENTS.md (auto-load docs every time). AGENTS.md won by a landslide.
How to Make Your Agent Learn and Ship Code While You Sleep
Using a two-stage loop (Compound Review and Auto-Compound), let your AI agent automatically learn from experience, update its knowledge base, and implement the next priority item while you sleep.
Build Claude a Tool for Thought
Humans have Tools for Thought like Obsidian. Claude needs an AI-native version. Build a knowledge graph using markdown, wiki links, hooks, and subagents where agents can actually think.
Clawdbot Architecture Explained: How Does This AI Actually Work?
Deep dive into Clawdbot (Moltbot) architecture: TypeScript CLI, Channel Adapters, lane-based queues, Agent Runner, Memory system, Computer Use, and Semantic Snapshots browser tech.
Claude Code Finally Has Long-Term Memory: Supermemory Plugin Released
We added Supermemory to Claude Code. Now it's ridiculously powerful. Claude Code should know you — not just this one session, but forever. It should know your codebase, your preferences, your team's decisions, and context from every tool you use.