Net new from Anthropic's Zero Trust for AI agents
TL;DR: Anthropic shares its vision for Zero Trust for AI agents. Friction-only controls are ineffective. A framework with three maturity levels across seven control domains provides implementation guidance to security architects and engineers.
Just eight days ago, Anthropic published an engineering retrospective on how its team contains Claude across claude.ai, Claude Code, and Claude Cowork.
Two days later, it followed up with a Zero Trust framework and implementation guidance for AI agents, aimed at security architects and engineers.
Highlights:
- Agent-specific security considerations. Execution autonomy, tool access, non-determinism in instruction interpretation, persistent memory, and multi-agent coordination.
- Agentic threats. Prompt injection, tool and resource misuse, identity and privilege abuse, supply chain compromise, and memory and context poisoning.
- The raised floor. AI-enabled offense reduces the effectiveness of friction-only controls and makes short-lived tokens, crypto-rooted identity, and automated first-pass triage foundational.
- Seven control domains. Agent identity and authentication, access control and privilege management, observability and auditing, behavioral monitoring and response, input validation and output controls, integrity and recovery, and AI governance policies.
My take:
- There seems to be no shortage of frameworks. Every major vendor has shown its thought leadership and published one, including Google's SAIF and Cisco's Integrated AI Security and Safety Framework, OpenAI's and Microsoft's agent security guidance, just to name a few.
- The importance of security hygiene has grown. Knowing your assets, patching, and least-privilege access remain relevant, but keeping up is becoming harder as the pace of software development and vulnerability exploitation accelerates. One operator breached nine Mexican government agencies in seven weeks with AI assistance.
- "Delay is now the primary risk." AI has accelerated offense, so any human review or approval in the defense loop puts you at a disadvantage. We need to reassess the risks we are used to mitigating by having a human as a control. That human is now increasing risk, not reducing it. As the paper puts it, "enable automatic updates on any component where the risk of an automated update causing an outage is acceptable."
- Capping the blast radius by constraining what an agent can reach yields more than trying to supervise what it does.