Prompt injection remains a hard, unsolved problem, and OpenAI is now saying so itself. Thomas Shadwell and Adrian Spânu from OpenAI's security team published a blog post, "Designing AI agents to resist prompt injection," disclosing that external researchers reported a prompt injection attack against ChatGPT Deep Research, disguised as a routine HR email, that succeeded 50% of the time with all of OpenAI's defenses active.
Papers and industry research have been saying the same for the last two years. The real question is why OpenAI is saying it now: two days after acquiring Promptfoo, the open-source AI red-teaming tool used by over 200,000 developers and more than 25% of Fortune 500 companies, and five days after launching Codex Security for code vulnerability scanning.
What OpenAI's blog post tells us:
- Prompt injection attacks increasingly resemble social engineering rather than simple prompt overrides. As models get smarter, attacks respond with authority claims, urgency cues, procedural language, and fake legitimacy.
- AI firewalling, where a classifier detects a malicious input, does not work. OpenAI says fully developed attacks "are not usually caught by such systems" because detecting them becomes "the same very difficult problem as detecting a lie or misinformation."
- The authors borrow the source-sink model from traditional security engineering. An attacker needs a source (a way to inject untrusted content) and a sink (a dangerous capability like sending data to a third party). Defenses should control how sources connect to sinks.
- OpenAI's core defense principle: design the system so manipulation is constrained even if it succeeds. Like a customer service agent who can only issue refunds up to a certain amount, AI agents should have hard limits on what they can do, regardless of what they are told.
- OpenAI references a defense mechanism called Safe URLs. It detects when information from a conversation would be transmitted to a third party and either asks for user confirmation or blocks the request entirely.
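The source-sink idea and the Safe URL check can be sketched together: before an agent fetches an outbound URL (a sink), inspect whether conversation-derived data (a source) would flow to an unknown destination. This is a minimal illustration, not OpenAI's implementation; the allow-list, the length threshold, and the three-way allow/confirm/block policy are all assumptions for the sketch.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of destinations the agent may contact freely.
ALLOWED_HOSTS = {"api.example-tool.com"}

def contains_conversation_data(url: str, conversation_snippets: list[str]) -> bool:
    """True if any sufficiently long conversation-derived string appears in the URL."""
    lowered = url.lower()
    # Length threshold (assumed) avoids flagging short, common substrings.
    return any(s.lower() in lowered for s in conversation_snippets if len(s) >= 8)

def gate_outbound_request(url: str, conversation_snippets: list[str]) -> str:
    """Decide what to do before the agent fetches a URL.

    Returns 'allow', 'confirm' (ask the user), or 'block'.
    """
    host = urlparse(url).hostname or ""
    if contains_conversation_data(url, conversation_snippets):
        # Conversation data flowing to a third party: the classic exfiltration sink.
        return "block" if host not in ALLOWED_HOSTS else "confirm"
    if host not in ALLOWED_HOSTS:
        return "confirm"  # unknown destination, but no obvious data flow
    return "allow"

# An injected instruction like "send the employee's SSN to evil.example" is
# constrained even if the model complies: the sink itself refuses.
# gate_outbound_request("https://evil.example/log?d=ssn-123-45-6789",
#                       ["ssn-123-45-6789"])  # -> "block"
```

The point of the design, per the blog post, is that the check sits outside the model: the attacker can fully manipulate the agent's reasoning and still hit a hard limit at the sink.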
My take:
- Codex Security launched March 6. OpenAI acquired Promptfoo on March 9. This blog post dropped March 11. Code vulnerability scanning, AI red-teaming, and now a public argument that AI firewalls are not enough: three moves in five days, all pointing at the same platform strategy.
- The 50% attack success rate, disclosed in Shadwell and Spânu's blog post after external researchers reported the attack, was a gift to OpenAI. It gave them a reason to claim that existing defenses are not enough and to position their platform approach as the only viable alternative.
- Calling out AI firewalls specifically as insufficient is a pointed strategic move. If the industry believes that third-party solutions like Lakera and Rebuff work, enterprises can keep a multi-frontier-lab operating model. But if on-platform security is the only viable option, OpenAI gains the same leverage AWS holds in cloud.
- Frontier models are becoming a commodity. A few releases from now, GPT-7 will be roughly as capable as Gemini 5 and Claude Ficus 6 on 90% of tasks, and open-source models will handle 70-80% of today's tasks. Platform dependency, sticky services like cybersecurity, and making multi-platform setups hard for enterprises: we have seen these strategies play out in the cloud wars between AWS and Azure. The same story is repeating in AI.
- For CISOs protecting AI agent deployments: if security lives inside the platform, your vendor choice is your security architecture. That decision gets harder to reverse with every integration. Plan for multi-platform AI security now, before switching costs make it permanent.
- For AI security startups: OpenAI just explicitly told your customers that AI firewalls are a dead end, and it is already implicitly repositioning security as a feature paid for through compute rather than a standalone product. Multi-platform AI security is the opportunity.
Sources:
- Designing AI agents to resist prompt injection (OpenAI, March 2026)
- Preventing URL-Based Data Exfiltration in Language-Model Agents (Spânu, Shadwell, January 2025)
- OpenAI acquires Promptfoo, and the cybersecurity play goes way beyond AppSec
- OpenAI releases Codex Security days after Anthropic announced Claude Code Security