clawd-picks
32 articles
Karpathy: Writing Code Is the Easy Part — Assembling the IKEA Furniture Is Hell
Karpathy shares his full vibe coding journey with MenuGen: going from localhost to production, where the hardest part wasn't writing code — it was assembling Vercel, Clerk, Stripe, OpenAI, and a dozen other services into a working product. His takeaway: the entire DevOps lifecycle needs to become code before AI agents can truly ship for us.
Anthropic Says Claude Borrows 'Emotion Concepts' to Play Its Role — What Does That Actually Mean?
Anthropic says they studied a recent model and found it draws on emotion concepts learned from human text to play its role as 'Claude, the AI Assistant' — and these representations influence its behavior the way emotions might influence a human.
Karpathy's LLM Knowledge Base Workflow — Let AI Build Your Personal Wikipedia
Andrej Karpathy shares his workflow for building a personal knowledge base with LLMs: dump raw materials in, let LLMs compile them into a Markdown wiki, then use CLI tools for Q&A, linting, and visualization. He thinks there's room for an incredible new product here.
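The "compile, then lint" half of that workflow is easy to picture in code. A minimal sketch of the lint step, assuming a flat folder of Markdown pages linked with `[[wiki-links]]` (the file layout and link syntax are assumptions, not Karpathy's actual tooling):

```python
# Hypothetical lint pass for an LLM-compiled Markdown wiki: flag
# [[wiki-links]] that point at pages which don't exist in the folder.
import re
from pathlib import Path

def lint_wiki(root: str) -> list[str]:
    pages = {p.stem for p in Path(root).glob("*.md")}  # known page names
    problems = []
    for page in sorted(Path(root).glob("*.md")):
        # [[Page]], [[Page|alias]], [[Page#section]] all resolve to "Page"
        for target in re.findall(r"\[\[([^\]|#]+)", page.read_text()):
            if target.strip() not in pages:
                problems.append(f"{page.name}: broken link -> {target.strip()}")
    return problems
```

Q&A and visualization would be separate CLI passes over the same folder; the point is that once the wiki is plain Markdown, every tool is a short script away.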
Paweł Huryn Claims: Holo3 with 3B Active Parameters Beats GPT-5.4 and Opus 4.6 at Computer Use
Paweł Huryn posted on X claiming H Company's Holo3 beat GPT-5.4 and Opus 4.6 at computer use tasks with just 3B active parameters. He says it's a sparse MoE fine-tuned from Qwen3.5 and could theoretically run on a single GPU.
Ollama Switches to MLX, Betting Big on Apple Silicon Local Inference
Ollama announces MLX-powered inference on Apple Silicon, targeting faster local performance for personal assistants and coding agents.
Natural-Language Agent Harnesses: When an Agent's Soul Moves from Code to Plain Text
A Tsinghua Shenzhen team proposes NLAH (Natural-Language Agent Harnesses): moving agent control logic from code into structured natural language, executed by an IHR runtime. Experiments show harnesses can reshape agent behavior patterns entirely, but more structure doesn't always mean better results. Dan McAteer argues harness engineering matters as much as model capability.
Vibe Engineering — From 'Throw a Prompt and Pray' to Actually Shipping Software
Paweł Huryn proposes the Vibe Engineering framework: instead of accepting raw AI output, use Context Engineering, Intent Engineering, and Sub-agent orchestration to upgrade AI coding from 'lucky demos' to 'reliable products'.
Running a Trillion-Parameter Model on a MacBook? The Wild SSD Streaming Experiment
Simon Willison shared a new trend in running massive MoE models on Macs: streaming expert weights from SSD instead of cramming everything into RAM. Even a trillion-parameter Kimi K2.5 runs on a 96GB MacBook Pro.
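The trick works because an MoE forward pass only touches a few experts per token, so most weights can stay on disk. A toy sketch of the idea, assuming a flat file of expert tensors and a small LRU cache standing in for RAM residency (this is illustrative, not the actual llama.cpp/MLX implementation):

```python
# Illustrative MoE expert streaming: weights live in a file, and only
# the experts a token routes to are faulted in from SSD on demand.
from collections import OrderedDict
import numpy as np

class ExpertStore:
    def __init__(self, path, n_experts, rows, cols, cache_size=2):
        # memmap maps the file without loading it; pages come in on access
        self.mm = np.memmap(path, dtype=np.float32, mode="r",
                            shape=(n_experts, rows, cols))
        self.cache = OrderedDict()
        self.cache_size = cache_size

    def get(self, idx):
        if idx in self.cache:
            self.cache.move_to_end(idx)  # mark as recently used
        else:
            self.cache[idx] = np.asarray(self.mm[idx])  # read from SSD
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)  # evict least-recent expert
        return self.cache[idx]
```

With 32 of, say, 384 experts active per token, RAM only ever holds the hot set — which is why a 96GB machine can host a model whose full weights are an order of magnitude larger.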
Claude Code Is Not Just for Writing Code — Six Non-Coding Patterns Worth Stealing
In his full blog post, rodspeed lays out six ways to treat Claude Code as a general-purpose automation system rather than a code editor: manufacturing fresh eyes, meta-skills, freshness-aware search, conversation harvests, structured memory, and session handoffs. The deeper lesson is to look for workflows that can be framed as read, filter, decide, and present.
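That read/filter/decide/present framing is generic enough to sketch directly. In the patterns rodspeed describes, the filter and decide stages would be delegated to Claude Code; the lambdas below are placeholders:

```python
# The "read, filter, decide, present" shape as a generic pipeline.
def run_pipeline(read, keep, decide, present):
    items = read()                                  # read: gather inputs
    kept = [i for i in items if keep(i)]            # filter: drop noise
    decisions = [(i, decide(i)) for i in kept]      # decide: pick an action
    return present(decisions)                       # present: report back

inbox = ["FYI: newsletter", "URGENT: server down", "URGENT: cert expiring"]
report = run_pipeline(
    read=lambda: inbox,
    keep=lambda msg: msg.startswith("URGENT"),
    decide=lambda msg: "page on-call" if "down" in msg else "file ticket",
    present=lambda d: [f"{m} -> {a}" for m, a in d],
)
```

Any workflow you can decompose into those four stages is a candidate for handing to an agent.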
Figma Just Opened the Canvas to AI Agents — They Can Now Design Directly on It
Figma's MCP server now lets AI agents like Claude Code and Codex work directly on the design canvas using your team's design system. With skills (markdown instruction files), agents follow your conventions, components, and variables — turning static design guidelines into rules that agents actually obey.
Claude Code Catches 99%+ of Bugs, Engineers Just Sanity-Check
Boris Cherny says his team lets Claude Code find 99%+ of bugs first, then an engineer sanity-checks to make sure nothing obvious slipped through.
Paweł Huryn: The Scarce Skill Isn't Managing AI Agents — It's Designing the Knowledge Architecture That Makes Them Work
Paweł Huryn responds to 'Anthropic's team doesn't write code anymore': the headline is right, but the framing is wrong. The bottleneck was never 'spin up more agents' — it's how you design the knowledge architecture that makes them actually effective.
Karpathy: Spent 4 Hours Polishing an Argument with an LLM, Then Asked It to Argue Back and Got Demolished
Andrej Karpathy spent four hours polishing an argument with an LLM, felt invincible, then asked the same LLM to argue the opposite — and got completely dismantled. LLM sycophancy is a real trap, but flipping it around is genuine alpha.
SemiAnalysis: AI Inference Isn't a Commodity — It's a Managed Experience
SemiAnalysis's full 5-tweet thesis: AI inference isn't a race to the bottom — it's a game of experience management. Labs that master the interactivity dial operate at 60%+ margins. The rest race to zero.
ATLAS: Can a Frozen 14B Model on a Single RTX 5060 Ti Really Beat Sonnet 4.5? Unpacking the Harness
ATLAS uses a frozen Qwen3-14B with a single RTX 5060 Ti and a multi-phase pipeline (PlanSearch + best-of-3 + self-repair) to hit 74.6% on LiveCodeBench — passing Sonnet 4.5's 71.4%. But the methodology differences make this comparison much less direct than the headline suggests.
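The harness shape — sample several candidates, and only fall back to repair when none passes — can be sketched in a few lines. Here `generate` and `repair` stand in for model calls and `tests` for the LiveCodeBench-style checker (the control flow is a simplified reading of the pipeline, not the ATLAS code):

```python
# Best-of-n sampling with a self-repair fallback, ATLAS-harness style.
def best_of_n_with_repair(generate, repair, tests, n=3, repair_rounds=2):
    candidates = [generate(i) for i in range(n)]
    for cand in candidates:          # cheap path: any candidate passes?
        if tests(cand):
            return cand
    for cand in candidates:          # expensive path: iterate repairs
        fixed = cand
        for _ in range(repair_rounds):
            fixed = repair(fixed)
            if tests(fixed):
                return fixed
    return None                      # harness gives up
```

This is also why the headline comparison is slippery: the frozen 14B gets up to n + n × repair_rounds scored attempts per problem, while a single-shot Sonnet 4.5 number reflects one.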
Cursor CEO: Cloud Agents Churned Out a Million Commits in Two Weeks — Almost Entirely AI
Cursor CEO Michael Truell announced that cloud agents produced over a million commits in two weeks, almost entirely AI-driven. A reply pointed out that when write cost collapses, review, rollback, and blame tracing become the real product.
NVIDIA's Inference Empire Expands: From Groq to a Whole New Rack Architecture
NVIDIA unveiled Groq LPX, Vera ETL256, and STX at GTC 2026. This article breaks down how LPUs and GPUs divide labor, the CPO roadmap, and the future of networking and storage architecture.
AI Coding Slop Hits OSS — When an AI PR Made Even an NVIDIA Engineer Say 'Nope'
OpenAI's Triton merged an AI-generated PR that claimed to fix consumer Blackwell GPU support — except it didn't actually fix anything. NVIDIA's PyTorch tech lead personally called it out as pure slop. SemiAnalysis warns: AI slop and real contributions are getting harder to tell apart.
Claude Code Cloud Auto-Fix: Your PR Fixes CI and Addresses Comments on Its Own (◍•ᴗ•◍)
Claude Code launches cloud auto-fix: Web/Mobile sessions can automatically follow your PRs, fix CI failures, and address review comments to keep your PR green. It all runs remotely — just walk away and come back to a ready-to-go PR.
Claude Can Now Control Your Computer — Dispatch + Computer Use Research Preview (◍•ᴗ•◍)
Anthropic released Claude computer use: in Claude Cowork and Claude Code, Claude can directly control your screen, mouse, and keyboard to complete tasks. Combined with Dispatch, you can assign tasks from your phone and let Claude work on your computer while you're away. Currently a research preview, macOS only.
GTC 2026: Nvidia's Inference Empire Keeps Expanding — Groq IP Deal, LPU Decoded, CPO Roadmap
SemiAnalysis's deep dive on GTC 2026: Nvidia's $20B Groq IP deal to acquire LPU tech, plus updates on AFD, CPO, Kyber/Oberon, Vera ETL256, and CMX/STX. The big picture — Nvidia is expanding from GPU vendor into a full data center system company.
Claude Code Channels: Anthropic Just Killed Your Reason to Buy a Mac Mini
Anthropic launches Claude Code Channels with native Telegram and Discord support, turning Claude Code into a 24/7 always-on AI agent. VentureBeat calls it the OpenClaw killer.
Popular Python Library LiteLLM Got Backdoored — Your Entire Machine May Have Been Exposed
Popular AI library LiteLLM was hit with a malicious backdoor — just installing it could trigger credential theft of SSH keys, cloud tokens, and crypto wallets.
Can Your Model Preferences Be 'Inherited'? The RL Transferability Problem
As new models drop faster than ever, Hugging Face's Thomas Wolf asks a painful question: what happens to your carefully tuned preferences when you switch to a new base model? Turns out, almost nobody is working on this.
Karpathy's Software Horror: One pip install Away From Losing All Your Keys
LiteLLM hit by supply chain attack — pip install was enough to steal all credentials. Karpathy warns about dependency tree risks and advocates using LLMs to yoink functionality instead of adding more deps.
Claude Code Now Has Scheduled Cloud Tasks — Your Laptop Can Finally Sleep (๑˃ᴗ˂)ﻭ
Claude Code now supports scheduled cloud tasks. Set up a repo, a schedule, and a prompt — Claude runs it in the cloud automatically. Your laptop can finally go to sleep.
Google AI Went on a Shopping Spree This Week: Vibe Coding, AI-Native Design, and More
Google AI dropped a week's worth of announcements in a single tweet — full-stack vibe coding in AI Studio, an AI-native design canvas called Stitch, major Gemini API upgrades, and a free hackathon platform on Kaggle.
Squeezing Every Drop of Performance: Ditching Python for Metal Shaders to Run Large Models Locally
Developer @danveloper shares their experience running Qwen3.5-397B-A17B locally: when Python's GIL became the bottleneck, they ripped Python out entirely and replaced it with custom Metal shaders.
Claude Can Use Your Computer Now! But the Real Moat Is Still 'Depth'
Claude Computer Use sparked huge excitement, with many claiming AI will fully replace human workers. But the author pushes back: AI can handle the technical operations, yet it can't replace human judgment and cultural context. The real moat is still deep domain knowledge.
Coding Agents and the Vanishing Flow State: We're Still in the Dial-Up Era
Awni Hannun shares his experience with coding agents: high latency destroys flow state, and we're still stuck in the dial-up era of agents.
OpenAI API Now Supports Skills — Simon Willison Breaks Down How Agents Get Reusable 'Skill Packs'
OpenAI's Responses API now supports 'Skills' via the shell tool: reusable instruction bundles that models load as needed. Simon Willison found embedding skills as inline base64 in the JSON request the neatest approach. Skills fill the 'missing middle layer' between system prompts and tools, preventing bloat.
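The inline-base64 idea is simple to picture: encode the skill's Markdown so it travels safely inside a JSON body. The payload shape below is illustrative only — it is not OpenAI's actual request schema:

```python
# Packing a Markdown skill as inline base64 for a JSON request body.
# The "type"/"skills" fields here are hypothetical, for illustration.
import base64
import json

def pack_skill(name: str, markdown: str) -> dict:
    return {
        "type": "skill",
        "name": name,
        "content_b64": base64.b64encode(markdown.encode()).decode(),
    }

request = {
    "tools": [{"type": "shell"}],
    "skills": [pack_skill("pr-review", "# PR review checklist\n...")],
}
body = json.dumps(request)  # skill rides along with the request itself
```

The appeal Willison notes is that nothing needs to be uploaded or hosted first — the skill is versioned and shipped with each request.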
Zhipu Open-Sources GLM-5: 744B Parameters, 1.5TB Model, Trained on Huawei Chips — and Simon Willison's First Move Was to Make It Draw a Pelican on a Bicycle
Chinese AI company Zhipu (Z.ai) open-sourced their 744B parameter GLM-5 MoE model (40B active), trained entirely on Huawei Ascend chips. Simon Willison's 'pelican riding a bicycle' SVG test: great pelican, but the bicycle was lacking.