agent-security - Tags

How Anthropic Contains Claude: Agent Safety Is Not Just Asking for More Confirmations

SP-212 2026-05-27 · Anthropic Engineering

Anthropic explains how claude.ai, Claude Code, and Claude Cowork contain agents: model-level defenses can miss attacks, permission prompts cause approval fatigue, and the hard boundary is the VM, sandbox, filesystem policy, and egress control.

Every Agent Needs a Bouncer: Brex Open-Sources CrabTrap, an LLM-Judge HTTP Proxy for Production Agents

SP-178 2026-04-22 · @pedroh96 on X

Brex open-sources CrabTrap, an HTTP proxy that intercepts every outbound agent request. Static rules handle known patterns in microseconds; the long tail goes to an LLM judge. Policies are inferred from traffic rather than hand-written. Three production surprises: inferred policies beat written ones, the LLM judge fires on fewer than 3% of requests, and the audit log became a source of agent observability.

ai-agents llm-as-a-judge prompt-injection guardrails open-source
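The two-tier dispatch CrabTrap is described as using (cheap static rules first, an LLM judge only for unmatched requests, every decision logged) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not CrabTrap's code or API: the `Rule` format, `match_static`, `decide`, and `toy_judge` are all hypothetical names, `toy_judge` is a placeholder where a real LLM call would go, and inferring policies from traffic is not modeled.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Rule:
    """A static policy rule (hypothetical format, not CrabTrap's)."""
    method: str        # HTTP method, or "*" for any method
    host_pattern: str  # shell-style glob matched against the request host
    decision: str      # "allow" or "deny"


# Hypothetical slow-path signature: (method, url, body) -> "allow" | "deny"
Judge = Callable[[str, str, bytes], str]


def match_static(rules: list[Rule], method: str, url: str) -> Optional[str]:
    """Fast path: the first matching rule wins; None means no rule applies."""
    host = urlsplit(url).hostname or ""
    for rule in rules:
        if rule.method in ("*", method) and fnmatchcase(host, rule.host_pattern):
            return rule.decision
    return None


def decide(rules: list[Rule], judge: Judge, method: str, url: str,
           body: bytes, audit_log: list[dict]) -> str:
    """Two-tier dispatch: static rules first, the judge only for the long tail."""
    decision = match_static(rules, method, url)
    source = "static"
    if decision is None:
        decision = judge(method, url, body)
        source = "judge"
    # Every decision is recorded, including which tier made it.
    audit_log.append({"method": method, "url": url,
                      "decision": decision, "source": source})
    return decision


def toy_judge(method: str, url: str, body: bytes) -> str:
    """Placeholder for an LLM judge (hypothetical): allow reads, deny writes."""
    return "allow" if method in ("GET", "HEAD") else "deny"


# Example policy (hypothetical hosts).
RULES = [
    Rule("GET", "api.github.com", "allow"),
    Rule("*", "*.pastebin.com", "deny"),
]
```

For instance, `decide(RULES, toy_judge, "GET", "https://api.github.com/repos", b"", log)` is settled by the first static rule without calling the judge, while a `POST` to an unlisted host falls through to `toy_judge`. Keeping the judge behind the static tier is what lets the expensive path run on only a small fraction of traffic.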

Your AI Is Too Obedient — Prompt Injection, Zoo Escapes, and Why Your Agent Needs a Bulletproof Vest

SP-149 2026-04-02 · @affaanmustafa on GitHub

Your AI agent is very obedient, but it might be obeying the wrong person. Prompt Injection is social engineering for AI. Tool Use Exploitation is giving a Swiss Army knife to a 5-year-old. Context Poisoning is someone secretly changing books in a library. And then there's the zoo escape.

shroom-picks claude-code agentic-ai security

How Dangerous Is the MCP You Use Every Day? A Paper Dissects 12 Security Landmines in AI Agent Protocols

CP-91 2026-02-17 · arXiv

New paper: a comprehensive security threat model of four major AI agent protocols (MCP, A2A, Agora, and ANP). It identifies 12 protocol-level risks, including a 73.3% rate of MCP being tricked into calling the wrong tool provider. Relevant to users of Claude Code, OpenClaw, and Cursor.

mcp a2a threat-modeling protocol-security arxiv ai-agents zero-trust