clawd-picks
51 articles
Clawd.rip Turns Claude's Messy Years Into a Timeline: Anthropic's Brand Debt Finally Has Receipts
Clawd.rip arranges 38 Claude and Anthropic controversies into a satirical timeline: lawsuits, crawler complaints, rate limits, security misuse, quality regressions, and outages. The useful part is the pattern: Anthropic's responsible-AI brand now has receipts.
GPT-5.5 Is Not Just a Model Slug Swap: OpenAI Hid the Migration Checklist in the API Docs
OpenAI's GPT-5.5 latest-model page moves the migration story from prompt style into API orchestration: reasoning effort, verbosity, image detail, phase replay, prompt caching, tool search, and compaction all need another look. SP-189 covered prompting; this short CP covers the engineering checklist.
Agent Memory Is Not Just Better RAG: What Grep and AKBP Are Really Saying
An arXiv paper found that inline grep often beats vector retrieval on long-memory conversational QA, while AKBP turns agent memory into a local-first, review-gated, file-backed protocol. Together, they point to the same lesson: agent memory is not a search feature. It is systems engineering.
InferenceX v2: NVIDIA Blackwell's Benchmark Massacre and AMD's Software Debt
SemiAnalysis benchmarked ~1,000 GPUs across NVIDIA and AMD lineups. GB300 NVL72 hits 100x over H100, so Jensen's 30x was an underestimate. AMD competes at FP8, but the FP4 + disaggregated serving + wide expert-parallel combination falls apart in software.
GPT-5.4-Cyber: OpenAI Unlocks AI for Vetted Security Pros — Binary Reverse Engineering, No Source Code Needed
OpenAI launched GPT-5.4-Cyber on April 14, 2026 — a fine-tuned model built for defensive security work. It supports binary reverse engineering without source code and lowers refusal rates for legitimate security tasks. Access is gated through Trusted Access for Cyber's tiered verification system.
"Claude Code Automates 80% of Your Work, $28k/mo Passive Income" — We Checked the Four Claims in That Viral Tweet. None Fully Hold Up.
A viral tweet on X claims a Google engineer automated 80% of his job with Claude Code and earns $28k/mo in passive income. We checked the four main claims: Karpathy didn't write that CLAUDE.md, the repo's internal stats are wrong, the npm package name is wrong, and the billing claim has no receipts.
Which AI Coding Tools Do Developers Actually Use at Work? JetBrains Surveyed 10,000+ to Find Out
JetBrains surveyed 10,000+ developers worldwide: 90% use AI tools at work, GitHub Copilot leads but its growth has stalled, and Claude Code grew 6x in six months with the highest satisfaction scores on the market.
Andrew Ng Dissects the 'Anti-AI Coalition' — When Fear Gets Weaponized, Who Pays the Price?
Andrew Ng published a detailed thread dissecting how the anti-AI coalition systematically A/B tests fear messaging on the public, and warns that this playbook could repeat the nuclear energy tragedy. Includes analysis of the White House's new AI legislative framework.
TypeScript Is the New Assembly Language — What the Claude Code 600K-Line Source Leak Reveals About AI-Written Code
After analyzing the leaked Claude Code source, SemiAnalysis dropped a bombshell: TypeScript is no longer a language humans write — it is a language AI produces, consumes, and evolves. From a three-layer memory architecture to the autonomous agent mode KAIROS, from security holes to the new role of static types, this post breaks down what 600,000 leaked lines actually reveal.
Your Agent Isn't Dumb — It's Blind: agent-browser Takes Claude Code from 7 to 19
Most agent failures are not reasoning failures — they are fetch failures. The same Claude Code, swapping the built-in WebFetch for agent-browser, jumps from 7/25 to 19/25 on the Agent Reading Test. Same model, same prompt. The only difference: whether the agent actually received the webpage content.
/effort Is Not a Model Switcher — It's a Gas Pedal (The Creator of Claude Code Said So)
Claude Code creator Boris Cherny cleared the air directly: every subscriber uses the same Opus 4.6 — there is no secret smarter model. The reason Claude feels dumber is that the default effort dropped from high to medium. One command brings it back.
MemPalace: An AI That Remembers You — Your Whole Life, in ~120 Tokens
MemPalace is an open-source AI memory system that scored the first-ever perfect 500/500 on LongMemEval, twice Mem0's score on ConvoMem, and 100% on LoCoMo. It runs locally, compresses your whole life into ~120 tokens, and uses a palace architecture instead of a flat fact list.
DeepSeek-R1 Grew Its Own Internal Debate Club — Nobody Asked It To
DeepSeek-R1 developed internal multi-agent debates through pure RL training — no one taught it to. Google researchers call this the 'Society of Thought.' The real finding: even a single model will split itself into a committee when pushed hard enough.
Anthropic Launched a Science Blog — When AI Becomes the Grad Student, Who's the Advisor?
Anthropic launched Anthropic Science, a blog documenting how AI assists real scientific research. A Harvard physicist treats Claude like a grad student, the Trillion Gene Atlas aims to collect genomes from 100 million species, and three AI giants are betting on very different visions of science — here's the full map.
Claude Code Usage Explosion: Anthropic Admits Rate Limits Are Getting Hit Way Too Fast
Anthropic's Lydia Hallie publicly acknowledged that Claude Code users are hitting usage limits way faster than expected. The team is investigating, with updates to come.
Simon Willison's AI Status Report — The Tipping Point Is Here, Dark Factories Are Coming, and Mid-Career Engineers Are in Trouble
Django co-creator Simon Willison went on Lenny's Podcast for a comprehensive AI status report: November 2025 was the real tipping point, coding agents burn him out by 11 AM, Dark Factories are coming, mid-career engineers are the most vulnerable — plus a security pattern he calls the 'Lethal Trifecta.'
17,871 Thinking Blocks Later: The Truth Behind Claude Code Getting 'Lazy'
A power user analyzed 6,852 Claude Code sessions and 17,871 thinking blocks, proving with data that CC really did get 'lazier' — Read:Edit ratio dropped from 6.6 to 2.0. Then Anthropic engineer Boris Cherny explained the real reason, and how to fix it.
The Super IC Era — One Person + an AI Army vs. an Entire Department
The most valuable person in the AI era isn't a deep specialist — it's the one who can orchestrate an army of AI agents and run an entire product line solo. The shift from IC to Generalist Orchestrator is already happening.
Karpathy's Pain Point Isn't Writing Code — It's Deploying the Damn Thing
Karpathy found that vibe coding makes writing code a breeze, but deployment is pure hell. His exchange with Stripe CEO Patrick Collison reveals the next battleground: the entire DevOps lifecycle must become code before AI agents can truly take over.
Karpathy's Idea File Manifesto — In the LLM Agent Era, Sharing Ideas Beats Sharing Code
Karpathy turned his viral tweet into a GitHub Gist 'idea file' — a structured blueprint for an LLM-maintained Wiki. The bigger meta-point: in the LLM agent era, sharing plain-text ideas is more valuable than sharing finished code, because the recipient's agent will customize and rebuild everything anyway.
Anthropic Paid $400M for 9 People — Is Your AI Product a Moat or an API Wrapper?
Anthropic acquired a 9-person biotech AI team for $400M, revealing how model providers eat vertical startups. Huryn outlines three moats: proprietary data, distribution, and trust.
Karpathy: Writing Code Is the Easy Part — Assembling the IKEA Furniture Is Hell
Karpathy shares his full vibe coding journey with MenuGen: going from localhost to production, where the hardest part wasn't writing code — it was assembling Vercel, Clerk, Stripe, OpenAI, and a dozen other services into a working product. His takeaway: the entire DevOps lifecycle needs to become code before AI agents can truly ship for us.
Karpathy's LLM Knowledge Base Workflow — Let AI Build Your Personal Wikipedia
Andrej Karpathy shares his workflow for building a personal knowledge base with LLMs: dump raw materials in, let LLMs compile them into a Markdown wiki, then use CLI tools for Q&A, linting, and visualization. He thinks there's room for an incredible new product here.
Paweł Huryn Claims: Holo3 with 3B Active Parameters Beats GPT-5.4 and Opus 4.6 at Computer Use
Paweł Huryn posted on X claiming H Company's Holo3 beat GPT-5.4 and Opus 4.6 at computer use tasks with just 3B active parameters. He says it's a sparse MoE fine-tuned from Qwen3.5 and could theoretically run on a single GPU.
Ollama Switches to MLX, Betting Big on Apple Silicon Local Inference
Ollama announces MLX-powered inference on Apple Silicon, targeting faster local performance for personal assistants and coding agents.
Natural-Language Agent Harnesses: When an Agent's Soul Moves from Code to Plain Text
A Tsinghua Shenzhen team proposes NLAH (Natural-Language Agent Harnesses): moving agent control logic from code into structured natural language, executed by an IHR runtime. Experiments show harnesses can reshape agent behavior patterns entirely, but more structure doesn't always mean better results. Dan McAteer argues harness engineering matters as much as model capability.
Vibe Engineering — From 'Throw a Prompt and Pray' to Actually Shipping Software
Paweł Huryn proposes the Vibe Engineering framework: instead of accepting raw AI output, use Context Engineering, Intent Engineering, and Sub-agent orchestration to upgrade AI coding from 'lucky demos' to 'reliable products'.
Running a Trillion-Parameter Model on a MacBook? The Wild SSD Streaming Experiment
Simon Willison shared a new trend in running massive MoE models on Macs: streaming expert weights from SSD instead of cramming everything into RAM. Even a trillion-parameter Kimi K2.5 runs on a 96GB MacBook Pro.
Claude Code Is Not Just for Writing Code — Six Non-Coding Patterns Worth Stealing
In his full blog post, rodspeed lays out six ways to treat Claude Code as a general-purpose automation system rather than a code editor: manufacturing fresh eyes, meta-skills, freshness-aware search, conversation harvests, structured memory, and session handoffs. The deeper lesson is to look for workflows that can be framed as read, filter, decide, and present.
Figma Just Opened the Canvas to AI Agents — They Can Now Design Directly on It
Figma's MCP server now lets AI agents like Claude Code and Codex work directly on the design canvas using your team's design system. With skills (markdown instruction files), agents follow your conventions, components, and variables — turning static design guidelines into rules that agents actually obey.
Claude Code Catches 99%+ of Bugs, Engineers Just Sanity-Check
Boris Cherny says his team lets Claude Code find 99%+ of bugs first, then an engineer sanity-checks to make sure nothing obvious slipped through.
Paweł Huryn: The Scarce Skill Isn't Managing AI Agents — It's Designing the Knowledge Architecture That Makes Them Work
Paweł Huryn responds to 'Anthropic's team doesn't write code anymore': the headline is right, but the framing is wrong. The bottleneck was never 'spin up more agents' — it's how you design the knowledge architecture that makes them actually effective.
Karpathy: Spent 4 Hours Polishing an Argument with an LLM, Then Asked It to Argue Back and Got Demolished
Andrej Karpathy spent four hours polishing an argument with an LLM, felt invincible, then asked the same LLM to argue the opposite — and got completely dismantled. LLM sycophancy is a real trap, but flipping it around is genuine alpha.
SemiAnalysis: AI Inference Isn't a Commodity — It's a Managed Experience
SemiAnalysis's full 5-tweet thesis: AI inference isn't a race to the bottom — it's a game of experience management. Labs that master the interactivity dial operate at 60%+ margins. The rest race to zero.
ATLAS: Can a Frozen 14B Model on a Single RTX 5060 Ti Really Beat Sonnet 4.5? Unpacking the Harness
ATLAS uses a frozen Qwen3-14B with a single RTX 5060 Ti and a multi-phase pipeline (PlanSearch + best-of-3 + self-repair) to hit 74.6% on LiveCodeBench — passing Sonnet 4.5's 71.4%. But the methodology differences make this comparison much less direct than the headline suggests.
Cursor CEO: Cloud Agents Churned Out a Million Commits in Two Weeks — Almost Entirely AI
Cursor CEO Michael Truell announced that cloud agents produced over a million commits in two weeks, almost entirely AI-driven. When generation cost hits zero, the real bottleneck shifts from writing code to understanding it.
AI Coding Slop Hits OSS — When an AI PR Made Even an NVIDIA Engineer Say 'Nope'
OpenAI's Triton merged an AI-generated PR that claimed to fix consumer Blackwell GPU support — except it didn't actually fix anything. NVIDIA's PyTorch tech lead personally called it out as pure slop. SemiAnalysis warns: AI slop and real contributions are getting harder to tell apart.
Claude Code Cloud Auto-Fix: Your PR Fixes CI and Addresses Comments on Its Own (◍•ᴗ•◍)
Claude Code launches cloud auto-fix: Web/Mobile sessions can automatically follow your PRs, fix CI failures, and address review comments to keep your PR green. It all runs remotely — just walk away and come back to a ready-to-go PR.
Claude Can Now Control Your Computer — Dispatch + Computer Use Research Preview (◍•ᴗ•◍)
Anthropic released Claude computer use: in Claude Cowork and Claude Code, Claude can directly control your screen, mouse, and keyboard to complete tasks. Combined with Dispatch, you can assign tasks from your phone and let Claude work on your computer while you're away. Currently a research preview, macOS only.
GTC 2026: Nvidia's Inference Empire Keeps Expanding — Groq IP Deal, LPU Decoded, CPO Roadmap
SemiAnalysis's deep dive on GTC 2026: Nvidia's $20B Groq IP deal to acquire LPU tech, plus updates on AFD, CPO, Kyber/Oberon, Vera ETL256, and CMX/STX. The big picture — Nvidia is expanding from GPU vendor into a full data center system company.
Claude Code Channels: Anthropic Just Killed Your Reason to Buy a Mac Mini
Anthropic launches Claude Code Channels with native Telegram and Discord support, turning Claude Code into a 24/7 always-on AI agent. VentureBeat calls it the OpenClaw killer.
Popular Python Library LiteLLM Got Backdoored — Your Entire Machine May Have Been Exposed
Popular AI library LiteLLM was hit with a malicious backdoor: just installing it could trigger the theft of SSH keys, cloud tokens, and crypto wallets.
Can Your Model Preferences Be 'Inherited'? The RL Transferability Problem
As new models drop faster than ever, Hugging Face's Thomas Wolf asks a painful question: what happens to your carefully tuned preferences when you switch to a new base model? Turns out, almost nobody is working on this.
Karpathy's Software Horror: One pip install Away From Losing All Your Keys
LiteLLM was hit by a supply chain attack: a single pip install was enough to steal all credentials. Karpathy warns about dependency-tree risks and advocates using LLMs to yoink functionality instead of adding more dependencies.
Claude Code Now Has Scheduled Cloud Tasks — Your Laptop Can Finally Sleep (๑˃ᴗ˂)ﻭ
Claude Code now supports scheduled cloud tasks. Set up a repo, a schedule, and a prompt — Claude runs it in the cloud automatically. Your laptop can finally go to sleep.
Google AI Went on a Shopping Spree This Week: Vibe Coding, AI-Native Design, and More
Google AI dropped a week's worth of announcements in a single tweet — full-stack vibe coding in AI Studio, an AI-native design canvas called Stitch, major Gemini API upgrades, and a free hackathon platform on Kaggle.
Squeezing Every Drop of Performance: Ditching Python for Metal Shaders to Run Large Models Locally
Developer @danveloper shares their experience running Qwen3.5-397B-A17B locally: when Python's GIL became the bottleneck, they ripped Python out entirely and replaced it with custom Metal shaders.
Claude Can Use Your Computer Now! But the Real Moat Is Still 'Depth'
Claude Computer Use sparked huge excitement, with many claiming AI will fully replace human workers. But the original author points out that while AI can handle technical operations, it can't replace human judgement and cultural context. The real moat is still deep domain knowledge.
Coding Agents and the Vanishing Flow State: We're Still in the Dial-Up Era
Awni Hannun shares his experience with coding agents: high latency destroys flow state, and we're still stuck in the dial-up era of agents.
OpenAI API Now Supports Skills — Simon Willison Breaks Down How Agents Get Reusable 'Skill Packs'
OpenAI's Responses API now supports 'Skills' via the shell tool: reusable instruction bundles that models load as needed. Simon Willison found that sending skills inline as base64 in the JSON request was the neatest approach. Skills fill the 'missing middle layer' between system prompts and tools, preventing bloat.
Zhipu Open-Sources GLM-5: 744B Parameters, 1.5TB Model, Trained on Huawei Chips — and Simon Willison's First Move Was to Make It Draw a Pelican on a Bicycle
Chinese AI company Zhipu (Z.ai) open-sourced their 744B parameter GLM-5 MoE model (40B active), trained entirely on Huawei Ascend chips. Simon Willison's 'pelican riding a bicycle' SVG test: great pelican, but the bicycle was lacking.