Your SAST scanner traces untrusted input through six function calls, finds sanitize_html() in the path, and marks the finding clean. But is that sanitizer sufficient for the specific rendering context, encoding behavior, and every transformation downstream? SAST saw the function was called. It did not evaluate whether the function achieved the outcome.
OpenAI's Codex Security team published a blog post arguing that an agent designed to evaluate whether defenses hold should not start from a report that only tracks whether defenses exist.
Highlights:
- 'There's a big difference between "the code calls a sanitizer" and "the system is safe."' Checking whether a sanitizer was called is easy. Determining whether it made the system safe is structurally harder: even a perfect trace cannot evaluate whether the defense is sufficient in context.
- Example: a web application validates a redirect URL against an allowlist regex, then URL-decodes it, then redirects. SAST traces the flow and sees the check. But the validation runs before decoding, so the regex does not constrain the decoded URL the handler interprets. CVE-2024-29041, an open redirect in Express.js (CVSS 6.1), is a real-world instance: malformed URLs bypassed allowlist implementations because of how redirect targets were encoded and then interpreted.
- OpenAI names a broader class of bugs SAST cannot see at all: authorization gaps, workflow bypasses, and wrong-state bugs where no tainted value reaches a dangerous sink. For example: request → auth check (is user logged in? yes) → /admin/delete-user → user deleted. Every check passes. No untrusted input, no dangerous sink, no missing sanitizer. The bug is that the route checks authentication but not authorization. SAST has nothing to trace.
- Codex Security takes a different approach. It reads the code to determine what guarantee the defense is supposed to provide, then tries to break that guarantee three ways. It uses z3-solver to mathematically prove whether a constraint can be violated. It writes micro-fuzzers that bombard isolated code slices with inputs designed to get past the defense. And when it finds a likely failure, it builds a proof-of-concept exploit in a sandbox with the code compiled in debug mode.
- OpenAI argues against seeding the agent with SAST output for three reasons. A findings list biases the system toward regions SAST already covered. SAST findings encode assumptions about trust boundaries that may be wrong. And mixing inherited and discovered findings makes it impossible to measure the system's own capabilities.
- OpenAI concludes that SAST remains valuable for enforcing secure coding standards and catching known patterns at scale.
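The validate-then-decode flaw from the redirect example can be sketched in a few lines. This is a hypothetical handler, not the Express code; the regex and route shape are invented for illustration:

```python
import re
from urllib.parse import unquote

# Hypothetical redirect handler illustrating the validate-then-decode flaw.
# The regex intends to allow only same-site relative paths, and the negative
# lookahead rejects a literal protocol-relative URL like //evil.com.
SAFE_PATH = re.compile(r"^/(?!/)[\w./%-]*$")

def redirect_target(raw: str) -> str | None:
    if not SAFE_PATH.match(raw):   # the check runs on the ENCODED value...
        return None
    return unquote(raw)            # ...and only afterwards is it decoded

print(redirect_target("//evil.com"))     # rejected: None
print(redirect_target("/%2Fevil.com"))   # passes the check, decodes to //evil.com
```

A SAST trace sees the regex check on the tainted path and marks the flow sanitized; the check simply constrains the wrong representation of the value.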
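The auth-without-authz pattern gives taint tracking nothing to latch onto, which a minimal sketch makes obvious (handler, session shape, and user store are all invented for illustration):

```python
# Hypothetical /admin/delete-user handler: it verifies the caller is
# authenticated but never that the caller is an administrator. No untrusted
# value flows to a dangerous sink, so dataflow analysis has nothing to trace.
def delete_user(session: dict, target: str, users: dict) -> int:
    if session.get("user") is None:   # authentication: someone is logged in
        return 401
    users.pop(target, None)           # authorization is never checked
    return 200

users = {"alice": {"role": "admin"}}
status = delete_user({"user": "mallory"}, "alice", users)
print(status, users)  # 200 {} -- any logged-in user can delete anyone
```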
My take:
- I follow the logic to where OpenAI doesn't go. If you can reliably answer 'does the defense hold?', then 'does a defense exist?' becomes irrelevant. So when OpenAI says 'SAST remains valuable for enforcing secure coding standards and catching known patterns at scale,' the subtext is that Codex Security cannot yet answer the harder question reliably. SAST keeps that value exactly until it can.
- AI-based analysis has a genuine structural advantage on OWASP's #1 category, broken access control. Authorization bugs, workflow bypasses, and state-management flaws have no dataflow to trace: no source, no sink, no sanitizer. And it is the vulnerability class OWASP ranks first (A01:2021).
- None of the verification tools are new. z3-solver has been around since 2007, fuzzing since the 1980s, and sandboxing is standard practice. What is new is the orchestration layer: an LLM that reads code, infers what a defense is supposed to guarantee, identifies the gap between intent and implementation, and then builds PoCs at a scale previously infeasible. In December 2025, I noted that PoC-first workflows were becoming the industry standard. OpenAI just showed what it'll look like at scale.
- For the SAST incumbents, OpenAI is not saying 'we replaced you.' They are saying 'you answer one question, we answer a different one, and ours matters more for the hardest bugs.'
- We are heading toward a world where AI writes the code, AI validates it, and AI signs off that the defenses hold. The question is when, not whether, that pipeline ships vulnerable code that causes a major incident, and the scramble to assign accountability will force a course correction.
Sources:
- Why Codex Security Doesn't Include a SAST Report (OpenAI)
- Codex Security FAQ (OpenAI Developer Docs)
- Codex Security: now in research preview (OpenAI)
- OpenAI releases Codex Security days after Anthropic announced Claude Code Security (The Weather Report)
- A01:2021 Broken Access Control (OWASP Top 10)
- 21 AI-native startups, open-source and frontier lab projects are reshaping application security (The Weather Report)