Meta just released SecAlign — the first open-source LLM remarkably resilient to prompt injections

0.5% attack success rate, outperforming GPT-4 and Gemini 3, and comparable to GPT-5 on agentic workflows.

Sizhe Chen at Meta FAIR published SecAlign, the first open-source LLM with commercial-grade prompt injection resilience.

Why does it matter? The strongest prompt injection defenses have been locked inside proprietary models like GPT-5 and Gemini-3-Pro.

How does it work?

A new `input` message type separates trusted instructions from untrusted data — a one-line code change for developers.
DPO training teaches the model to follow user instructions and ignore injected instructions hiding in the data.
Randomized injection positions and self-generated responses fix shortcut learning and label quality issues from the initial version of SecAlign.

Results:

Attack success rate drops to 0–2% across benchmarks, down from 53–99% undefended.
No meaningful utility drop — the first defense to preserve the undefended model's performance.
More secure than GPT-4o, Gemini-2.5-Flash, and Gemini-3-Pro on most PI benchmarks, and comparable to GPT-5 on agentic workflows.

My take:

The security through obscurity of commercial model defense recipes is not moving us forward towards robust prompt injection protections.
SecAlign++ democratizes AI security. We need model-level prompt injection defenses in open-source models.
The prompt injection threat is far from solved. The model is still vulnerable to strong adaptive attacks (47.3% GCG success rate on 70B).