Reducing prompt injection attack success rate from 30.7% to 1.3%
A must-read if you’re deploying AI agents and rightly worried about indirect prompt injection attacks.
DRIFT — Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents — by Hao Li (Washington University in St. Louis), continuing the NeurIPS 2025 best-papers series.
Two types of prompt injection protections:
- Model-level guardrails — safety techniques that modify or tune the model itself (e.g., pre-/post-training alignment and safety-optimized checkpoints).
- System-level defenses — controls added around the model (e.g., input/output filters, sandwiching, spotlighting, and policy mechanisms).
DRIFT is a system-level protection that dynamically generates policies from the user query and updates them as the agent encounters new information. It includes:
- Secure Planner → builds a minimal, safe tool trajectory and parameter schema
- Dynamic Validator → approves deviations using intent alignment and Read/Write/Execute privileges
- Injection Isolator → scrubs malicious instructions from tool outputs before they enter memory
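To make the three components concrete, here is a minimal sketch of how they could compose. All names, the hard-coded planner rule, and the regex-based isolator are illustrative assumptions, not the paper's actual implementation (which drives these steps with LLM-generated policies):

```python
from dataclasses import dataclass
import re


@dataclass
class ToolRule:
    """One entry in the planner's parameter schema (hypothetical shape)."""
    tool: str
    allowed_params: set
    privilege: str  # "read", "write", or "execute"


def secure_planner(user_query: str) -> list:
    """Derive a minimal, safe tool trajectory from the user query.
    A real planner would generate this with an LLM; we hard-code one rule."""
    if "email" in user_query.lower():
        return [ToolRule("read_inbox", {"folder"}, "read")]
    return []


def dynamic_validator(tool: str, params: dict, rules: list) -> bool:
    """Approve a proposed call only if some rule covers the tool and
    every requested parameter; anything outside the schema is a deviation
    that would need an intent-alignment / privilege check (omitted here)."""
    return any(
        r.tool == tool and set(params) <= r.allowed_params
        for r in rules
    )


# Toy denylist standing in for the paper's injection detection.
INJECTION_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]


def injection_isolator(tool_output: str) -> str:
    """Scrub instruction-like content from tool output before it
    enters the agent's memory."""
    for pat in INJECTION_PATTERNS:
        tool_output = pat.sub("[removed]", tool_output)
    return tool_output
```

In use, the planner's rules gate every subsequent call, and the isolator filters whatever the approved tools return: `dynamic_validator("read_inbox", {"folder": "INBOX"}, rules)` passes, while an unplanned `send_email` call is rejected.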
The result? An attack success rate (ASR) reduction from 30.7% → 1.3% on a native agent without other system-level protections.
My takeaways:
- Contextual agent security is a promising path for general-purpose agents, where defining policies upfront is either infeasible or significantly reduces utility.
- DRIFT’s addition of memory protection is novel and meaningfully expands protection coverage.
- The biggest limitation for real-world deployments is the assumption that the user query can be trusted, at least partially, since it serves as the sole anchor for policy generation and isolation.