Reducing prompt injection attack success rate from 30.7% to 1.3%

A must-read if you’re deploying AI agents and rightly worried about indirect prompt injection attacks.

DRIFT — Dynamic Rule-Based Defense with Injection Isolation for securing LLM agents by Hao Li (Washington University in St. Louis) continuing the NeurIPS 2025 best papers series.

Two types of prompt injection protections:

Model-level guardrails — safety techniques that modify or tune the model itself (e.g., pre-/post-training alignment and safety-optimized checkpoints).
System-level defenses — controls added around the model (e.g., input/output filters, sandwiching, spotlighting, and policy mechanisms).

DRIFT is a system-level protection that dynamically generates policies from the user query and updates them as the agent encounters new information. It includes:

Secure Planner → builds a minimal, safe tool trajectory and parameter schema
Dynamic Validator → approves deviations using intent alignment and Read/Write/Execute privileges
Injection Isolator → scrubs malicious instructions from tool outputs before they enter memory

The result? An attack success rate (ASR) reduction from 30.7% → 1.3% on a native agent without other system-level protections.

My takeaways:

Contextual agent security is a promising path for general-purpose agents, where defining policies upfront is either infeasible or significantly reduces utility.
DRIFT’s addition of memory protection is novel and meaningfully expands protection coverage.
The biggest limitation in real-world deployments is the assumption that the user query can be trusted, at least partially, and can serve as the sole anchor for policy generation and isolation.

The AI agent security space is emerging rapidly. I’d love to learn what you’re building or using today—and what’s working (or not).

The full paper

Another great paper on contextual security from my Google colleagues