Reducing prompt injection attack success rate from 30.7% to 1.3%

A must-read if you’re deploying AI agents and rightly worried about indirect prompt injection attacks.

DRIFT — Dynamic Rule-Based Defense with Injection Isolation for securing LLM agents by Hao Li (Washington University in St. Louis) continuing the NeurIPS 2025 best papers series.

Two types of prompt injection protections:

  • Model-level guardrails — safety techniques that modify or tune the model itself (e.g., pre-/post-training alignment and safety-optimized checkpoints).
  • System-level defenses — controls added around the model (e.g., input/output filters, sandwiching, spotlighting, and policy mechanisms).

DRIFT is a system-level protection that dynamically generates policies from the user query and updates them as the agent encounters new information. It includes:

  1. Secure Planner → builds a minimal, safe tool trajectory and parameter schema
  2. Dynamic Validator → approves deviations using intent alignment and Read/Write/Execute privileges
  3. Injection Isolator → scrubs malicious instructions from tool outputs before they enter memory

The result? An attack success rate (ASR) reduction from 30.7% → 1.3% on a native agent without other system-level protections.

My takeaways:

  • Contextual agent security is a promising path for general-purpose agents, where defining policies upfront is either infeasible or significantly reduces utility.
  • DRIFT’s addition of memory protection is novel and meaningfully expands protection coverage.
  • The biggest limitation in real-world deployments is the assumption that the user query can be trusted, at least partially, and can serve as the sole anchor for policy generation and isolation.

Sources:

DRIFT: Dynamic Rule Injection for Function-calling Tool-use