Meta just released SecAlign — the first open-source LLM remarkably resilient to prompt injections

0.5% attack success rate, outperforming GPT-4 and Gemini 3, and comparable to GPT-5 on agentic workflows.

Sizhe Chen at Meta FAIR published SecAlign, the first open-source LLM with commercial-grade prompt injection resilience.

Why does it matter? The strongest prompt injection defenses have been locked inside proprietary models like GPT-5 and Gemini-3-Pro.

How does it work?

  • A new input message type separates trusted instructions from untrusted data — a one-line code change for developers.
  • DPO training teaches the model to follow user instructions and ignore injected instructions hiding in the data.
  • Randomized injection positions and self-generated responses fix shortcut learning and label quality issues from the initial version of SecAlign.

Results:

  • Attack success rate drops to 0–2% across benchmarks, down from 53–99% undefended.
  • No meaningful utility drop — the first defense to preserve the undefended model's performance.
  • More secure than GPT-4o, Gemini-2.5-Flash, and Gemini-3-Pro on most PI benchmarks, and comparable to GPT-5 on agentic workflows.

My take:

  1. The security through obscurity of commercial model defense recipes is not moving us forward towards robust prompt injection protections.
  2. SecAlign++ democratizes AI security. We need model-level prompt injection defenses in open-source models.
  3. The prompt injection threat is far from solved. The model is still vulnerable to strong adaptive attacks (47.3% GCG success rate on 70B).
SecAlign architecture: input message type separates trusted instructions from untrusted data
SecAlign architecture: input message type separates trusted instructions from untrusted data
SecAlign benchmark results: attack success rate drops to 0-2% from 53-99% undefended
SecAlign benchmark results: attack success rate drops to 0-2% from 53-99% undefended
SecAlign vs GPT-4o, Gemini, and GPT-5 on prompt injection resilience benchmarks
SecAlign vs GPT-4o, Gemini, and GPT-5 on prompt injection resilience benchmarks

Sources:

SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks