How Anthropic Contains Claude: Agent Safety Is Not Just Asking for More Confirmations

Anthropic explains how claude.ai, Claude Code, and Claude Cowork contain agents: model defenses miss, permission prompts create fatigue, and the hard boundary is the VM, sandbox, filesystem policy, and egress control.

Every Agent Needs a Bouncer: Brex Open-Sources CrabTrap, an LLM-Judge HTTP Proxy for Production Agents

Brex open-sources CrabTrap — an HTTP proxy that intercepts every outbound agent request. Static rules dispatch known patterns in microseconds; the long tail goes to an LLM judge. Policies are inferred from traffic, not hand-written. Three prod surprises: inferred policies beat written ones, LLM fires on <3% of requests, audit log became agent observability.

Your AI Is Too Obedient — Prompt Injection, Zoo Escapes, and Why Your Agent Needs a Bulletproof Vest

Your AI Agent is very obedient — but it might be obeying the wrong person. Prompt Injection is social engineering for AI. Tool Use Exploitation is giving a Swiss Army knife to a 5-year-old. Context Poisoning is someone secretly changing books in a library. And then there's the zoo escape.