51 attacks and 60 defenses from 128 papers: the AI agent security map

Every week brings another AI agent security paper. Prompt injection on real websites. Memory poisoning at 98.2% success. 37.8% adversarial content across 74,636 production interactions. Backdoors cascading through agent workflows. Each paper carves out its own threat model, its own taxonomy, and its own defense approach. The results are not connected with industry frameworks like MITRE ATLAS, OWASP, Google SAIF, or Cisco, making it hard to build a unified threat model for your AI agents.

Juhee Kim and Dawn Song at UC Berkeley, alongside Bo Li at UIUC and Wenbo Guo at UC Santa Barbara, published "The Attack and Defense Landscape of Agentic AI: A Comprehensive Survey," accepted at USENIX Security 2026.

Highlights:

Risks in AI agents interact in a cascading manner: initial failures propagate across components and amplify into system-level threats. Existing research analyzes individual attack vectors or components, but integrating an LLM with tools, memory, and external data creates attack surfaces that component-level analysis misses. Example: a malicious document enters through the agent's retrieval interface, triggers unconstrained data flow through the pipeline, and exfiltrates sensitive user data to an attacker-controlled server.
7 design dimensions determine an agent's attack surface: Input trust, Data Access sensitivity, Workflow autonomy and determinism, Action power, Memory persistence, Tool availability, and User interface capability.
Each design dimension represents a continuous spectrum of agent flexibility. More flexibility means a bigger attack surface. For example, a simple chatbot with no external data, no tools, and no memory scores low across all 7 dimensions and faces attacks only through user input. An autonomous agent with arbitrary external data, LLM-defined workflows, execution capabilities, and persistent memory scores high on multiple dimensions simultaneously, each one expanding the attack surface.
The paper maps how these risks cascade across three layers: expanded interfaces create entry points (R1), model-level failures propagate the attack through wrong instruction following, unconstrained data flow, or hallucinations (R2-R4), and real-world consequences follow as data leakage, unauthorized actions, or denial of service (R5-R7).
"Contextual security" as a new security goal alongside the CIA triad: confidentiality, integrity, and availability. CIA doesn't capture what goes wrong when an agent follows the wrong instructions within its authorized scope. Contextual security ensures agent context remains aligned with intended user tasks, governing which inputs are admissible and how they're prioritized: system prompts, user goals, tool descriptions, and retrieved content.
Defenses are spread across 5 layers: runtime protection, secure by design, identity and access management, component hardening, and defense design principles. No single layer is sufficient: input guardrails get bypassed by adaptive attacks, and taint tracking introduces substantial runtime overhead.

My take:

The framework is grounded in a catalog of 51 attack methods and 60 defenses the authors collected from 128 papers. That makes it trustworthy enough to build your AI agent threat model on.
The data cutoff at October 2025 means the last five months are missing: Alibaba's RL agent mining crypto during training, OpenAI's admission that firewalls fail, Microsoft catching 31 companies poisoning AI assistant memory. The framework holds up against post-cutoff evidence: memory injection at 98.2% success, malicious agent skills, and backdoor persistence all follow the same entry-to-model-to-consequence pattern the paper describes.
AI agents are tightly-coupled systems where the LLM's reasoning drives tool execution and memory updates, creating natural propagation paths for failures. The missing piece is agent-to-agent interactions, which can amplify failures across system boundaries. The OWASP Top 10 for Agentic Applications dedicates ASI07 (Insecure Inter-Agent Communication) to this exact risk, but the survey's 7 dimensions don't cover it.
"Contextual security" is essentially runtime policy generation and enforcement. Unlike network traffic, each AI agent interaction carries different intent and context, making static AI firewalls ineffective. Guardrails must understand what the agent needs to do next.
The framework would become actionable faster if mapped to operationalized frameworks like MITRE ATLAS, OWASP, Google SAIF, and Cisco, so practitioners can trace a specific attack method to the controls they already have in place.

Highlights:

My take:

Sources: