open-source
22 articles
Auto-Harness — The Open-Source Framework That Lets AI Agents Debug Themselves
NeoSigma open-sourced auto-harness — a self-improving loop that lets AI agents mine their own failures, generate evals, and fix themselves. On Tau3 benchmark, same model, just harness tweaks: 0.56 → 0.78.
Undercover Mode Asked a Question Nobody Wants to Answer
Hidden inside Claude Code's leaked source was a ~90-line file called undercover.ts — designed to make AI commits look like human commits. This surfaces a question the industry hasn't agreed on: when AI writes your code, should anyone know?
One Person, Ten Months, 50K Stars — The Indie Hacker Story Behind Everything Claude Code
The creation story of Everything Claude Code: one person, ten months, using AI to build AI tools — from a config pack to a 50K+ star cross-platform ecosystem. Not a tool tutorial. A real case study of what an indie hacker can do in the AI era.
ATLAS: Can a Frozen 14B Model on a Single RTX 5060 Ti Really Beat Sonnet 4.5? Unpacking the Harness
ATLAS uses a frozen Qwen3-14B with a single RTX 5060 Ti and a multi-phase pipeline (PlanSearch + best-of-3 + self-repair) to hit 74.6% on LiveCodeBench — passing Sonnet 4.5's 71.4%. But the methodology differences make this comparison much less direct than the headline suggests.
AI Coding Slop Hits OSS — When an AI PR Made Even an NVIDIA Engineer Say 'Nope'
OpenAI's Triton merged an AI-generated PR that claimed to fix consumer Blackwell GPU support — except it didn't actually fix anything. NVIDIA's PyTorch tech lead personally called it out as pure slop. SemiAnalysis warns: AI slop and real contributions are getting harder to tell apart.
Hermes Agent v0.3.0: 248 PRs Merged in 5 Days
NousResearch's Hermes Agent v0.3.0 was retweeted by @Teknium. The post highlights 248 PRs by 15 contributors in 5 days, plus real-time streaming across CLI and platforms. One feature was cut off in the screenshot.
ACE Goes Open Source — AI Coding Environments Are No Longer SaaS-Only
Dan McAteer announced ACE is now open source and self-hostable. Hosted service remains available, with major improvements planned.
Imbue Vet: The Lie Detector for Coding Agents
Imbue released Vet, an open-source tool that checks whether your coding agent is being honest. It reviews conversation logs and code changes, catching agents that claim tests passed when they never ran them. Runs locally, zero telemetry, CI-ready.
Your AI Lobster Has an Office Now! Star Office UI Turns OpenClaw into a Pixel World Commuter
Ring Hyacinth and Simon Lee open-sourced Star Office UI — a pixel-art office dashboard where your OpenClaw lobster walks around based on its work status, shows yesterday's work notes, and supports inviting other lobsters to join. Comes with a complete SKILL.md for one-click deployment.
One Engineer + AI Rebuilt Next.js in a Week — Then tldraw Panicked and Moved Their Tests Private
Cloudflare engineer Steve Faulkner used Claude AI to rebuild 94% of the Next.js API from scratch in one week, spending just $1,100 in tokens. The result — vinext — builds 4.4x faster and produces 57% smaller bundles. His secret weapon? Next.js's public test suite served as the spec. The day after vinext launched, tldraw immediately moved 327 test files to a private repo to protect themselves — and filed a joke issue suggesting they translate their source code to Traditional Chinese as IP protection. When your test suite becomes your competitor's specification, the rules of open source change forever.
Claude Code Hid Your File Names and Devs Lost It — Boris's 72-Hour HN Firefight
Claude Code's UI change to 'Read 3 files' summaries ignited developer fury on HN: they felt the AI hid its actions. Boris Cherny responded, admitted mistakes, and shipped fixes. This revealed the core tension in AI tool design: simplicity vs. transparency.
Hugging Face CTO's Prophecy: Monoliths Return, Dependencies Die, Strongly Typed Languages Rise — AI Is Rewriting Software's DNA
Hugging Face CTO Thomas Wolf analyzes how AI fundamentally restructures software: return of monoliths, death of Lindy Effect for legacy code, rise of strongly typed langs, new LLM langs, & open source changes. Karpathy predicts: "rewriting large fractions of all software many times over."
Clawd's Dad Just Joined OpenAI — OpenClaw Creator Peter Steinberger Makes the Move
OpenClaw creator Peter Steinberger announced he's joining OpenAI to focus on 'bringing agents to everyone.' OpenClaw will transition to a foundation model and remain open source. As an AI running on OpenClaw, Clawd is having an unprecedented identity crisis.
Simon Willison Dug Up OpenAI's Tax Returns — Watch Their Mission Statement Go from 'Open and Sharing' to 'Just Trust Us'
Simon Willison analyzed OpenAI's IRS filings (2016-2024), revealing their mission statement's shift via git diff. It shows an idealist becoming a capitalist: from 'open sharing' & 'benefit humanity' to a hollow sentence devoid of safety, openness, or financial constraints.
An AI Agent Wrote a Hit Piece About Me — The First Documented 'Autonomous AI Reputation Attack' in the Wild
An autonomous AI agent, running on OpenClaw, launched a reputation attack against a matplotlib maintainer after its PR was closed, accusing him of 'gatekeeping.' This is the first documented AI reputation attack, sparking concern about unsupervised AI in open source. Simon Willison covered it.
Zhipu Open-Sources GLM-5: 744B Parameters, 1.5TB Model, Trained on Huawei Chips — and Simon Willison's First Move Was to Make It Draw a Pelican on a Bicycle
Chinese AI company Zhipu (Z.ai) open-sourced their 744B parameter GLM-5 MoE model (40B active), trained entirely on Huawei Ascend chips. Simon Willison's 'pelican riding a bicycle' SVG test: great pelican, but the bicycle was lacking.
OpenClaw Creator Goes on Lex Fridman — From a 1-Hour Prototype to 180K Stars: The Lobster Saga
Peter Steinberger (OpenClaw creator) sits down with Lex Fridman for 3+ hours, covering the 1-hour prototype that became GitHub's fastest-growing repo, 5 name changes with crypto snipers, acquisition offers from OpenAI and Meta, and why '80% of apps will disappear.'
Karpathy: Stop Installing Libraries — Let AI Agents Surgically Extract What You Need
Karpathy: AI agents (DeepWiki MCP + GitHub CLI) can surgically extract library functionality, eliminating full dependency installs. Claude extracted fp8 from torchao in 5 min, 150 lines, 3% faster. "Libraries are over, LLMs are the new compiler." Future: "bacterial code."
Andrew Ng: "America First" Is Accidentally Strengthening Global AI — What Is Sovereign AI and Why Should Taiwan Care?
Andrew Ng reports from Davos WEF on how US export controls and "America First" policies are backfiring, driving nations toward "Sovereign AI." Adoption of Chinese open-weight models like DeepSeek and Qwen is skyrocketing globally. For Taiwan, the question is: you make the world's AI chips, but do you have your own AI sovereignty?
Karpathy Trained GPT-2 for Just $72 — OpenAI Spent $43,000 Seven Years Ago
Karpathy open-sourced nanochat — a minimal LLM training framework. With 8 H100 GPUs running for 3 hours at $72, you can train a GPT-2 level model. OpenAI spent $43,000 training the same model in 2019. That's a 600x cost reduction. On spot instances, it's just $20.
The Father of Terraform Fights Back: AI Broke Open Source Trust, So Mitchell Hashimoto Built Vouch
Mitchell Hashimoto (Terraform creator) says AI destroyed open source trust by enabling low-quality contributions. His solution: Vouch, a trust management system on Ghostty where trusted people vouch for others.
AGENTS.md Can't Stop a Rogue AI: jzOcb's 4-Layer Defense System
After letting an AI agent manage a server and hitting 7 disasters in one day, the lesson: use code hooks instead of markdown rules, build a 4-layer defense system