0.5% attack success rate, outperforming GPT-4 and Gemini 3, and comparable to GPT-5 on agentic workflows.

Sizhe Chen at Meta FAIR published SecAlign, the first open-source LLM with commercial-grade prompt injection resilience.

Why does it matter? The strongest prompt injection defenses have been locked inside proprietary models like GPT-5 and Gemini-3-Pro.

How does it work?

Results:

My take:

  1. The security through obscurity of commercial model defense recipes is not moving us forward towards robust prompt injection protections.
  2. SecAlign++ democratizes AI security. We need model-level prompt injection defenses in open-source models.
  3. The prompt injection threat is far from solved. The model is still vulnerable to strong adaptive attacks (47.3% GCG success rate on 70B).

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks

SecAlign architecture: input message type separates trusted instructions from untrusted data SecAlign benchmark results: attack success rate drops to 0-2% from 53-99% undefended SecAlign vs GPT-4o, Gemini, and GPT-5 on prompt injection resilience benchmarks