I wrote on January 16 that OpenAI, Anthropic, and Google DeepMind were quietly building cybersecurity products to capture their share of the $213 billion enterprise security budget, and that frontier labs would redefine application security by making vulnerability detection and patching autonomous. Two months later, it is happening.
In May 2025, Anthropic open-sourced Claude Code Action, a general-purpose GitHub Action for PRs. In August, they added the /security-review command to Claude Code. In October, OpenAI put Aardvark into private beta, a GPT-5-powered research agent that scanned codebases and earned ten CVEs before most people heard about it. The same month, Google DeepMind introduced CodeMender, an AI agent that had already upstreamed 72 security fixes to open-source projects.
In February 2026, Anthropic launched Opus 4.6 and demonstrated it could find 500+ vulnerabilities in heavily fuzzed open-source projects without custom harnesses, followed by the release of Claude Code Security on February 20 in research preview. On March 6, Anthropic disclosed 22 vulnerabilities in Firefox found by Claude, and OpenAI launched Codex Security.
The research phase is officially over.
What Anthropic built:
- Anthropic embeds security into Claude Code. The /security-review command scans your codebase for SQL injection, XSS, authentication/authorization flaws, insecure data handling, and dependency vulnerabilities. It can fix what it finds.
- Separately, the open-source Claude Code Action can be configured for security-focused reviews, posting inline comments with fix recommendations.
- Finally, Claude Code Security, a new capability built into Claude Code on the web, launched February 20 in research preview. Rather than matching known patterns, it reads and reasons about code the way a human security researcher would, understanding how components interact and tracing how data moves through the application. It detects complex vulnerabilities including broken access control and business logic flaws that traditional static analysis misses. Claude re-examines each finding to filter false positives and assigns severity and confidence ratings. Human approval is required.
- Anthropic audited Mozilla Firefox's C++ codebase, scanning nearly 6,000 files over two weeks, submitting 112 unique reports, and surfacing 22 vulnerabilities. 14 were classified by Mozilla as high-severity, representing almost 20% of all high-severity Firefox vulnerabilities remediated in 2025. The audit cost approximately $4,000 in API credits.
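To make the class of finding concrete, here is a toy sketch of the SQL injection pattern a scanner like /security-review flags, alongside the parameterized fix such a tool would typically propose. This is my own illustration, not Anthropic's code or output:

```python
import sqlite3

# Throwaway in-memory database with two users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "s1"), ("bob", "s2")])

def lookup_vulnerable(name: str):
    # String interpolation: attacker-controlled input becomes SQL.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def lookup_safe(name: str):
    # Parameterized query: input is bound as data, never parsed as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(len(lookup_vulnerable(payload)))  # 2: injection dumps every row
print(len(lookup_safe(payload)))        # 0: no user has that literal name
```

The interesting part of LLM-based review is not catching this textbook case, which regex-era SAST handles fine, but tracing the tainted `name` value across file and service boundaries before it reaches the query.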
OpenAI responded:
- Like Anthropic, they shipped security as a feature of the coding product: Codex Security, built on the Aardvark research project.
- It analyzes your entire codebase to build an editable project-specific threat model and then hunts for vulnerabilities using that threat model as context.
- It spins up sandboxed validation environments to pressure-test findings, generates working proof-of-concept exploits, and proposes context-aware patches. When your team adjusts finding severity, the threat model refines itself for future scans.
- 1.2 million commits scanned across beta repositories in 30 days. 792 critical and 10,561 high-severity findings flagged. Over the beta period, noise dropped 84% on repeated scans, over-reported severity fell 90%, and false positives decreased 50%.
- On the open-source side, Codex Security found vulnerabilities in OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, Chromium, and GnuPG, earning fourteen assigned CVEs. Internally at OpenAI, it surfaced a real SSRF and a critical cross-tenant authentication bypass, both patched within hours.
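For readers who have not chased an SSRF before, here is a minimal sketch of the vulnerability class and the usual guard against it. This is illustrative only, not OpenAI's actual finding; the function name is mine:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_ssrf_risky(url: str) -> bool:
    """Return True if the URL should be rejected by a server-side fetcher.

    A service that fetches arbitrary user-supplied URLs without this check
    lets an attacker aim requests at internal services or the cloud
    metadata endpoint -- the classic SSRF pattern.
    """
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable URL: reject
    try:
        # Resolve the hostname; an attacker may register a DNS name
        # that points at 127.0.0.1 or 169.254.169.254 (cloud metadata).
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return True  # unresolvable: reject
    return addr.is_private or addr.is_loopback or addr.is_link_local

print(is_ssrf_risky("http://127.0.0.1:8080/admin"))              # True
print(is_ssrf_risky("http://169.254.169.254/latest/meta-data"))  # True
```

Note that even this guard is defeatable: a DNS name can resolve to a public address during the check and to an internal one during the fetch, which is the DNS rebinding variant Anthropic reports catching internally.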
Comparison:
| Dimension | OpenAI Codex Security | Anthropic Claude Code |
|---|---|---|
| Analysis approach | Full-repo threat model, then targeted hunting with sandboxed validation | LLM-based reasoning about code like a human researcher |
| Scan scope | Entire codebase | Full codebase + PR diffs via GitHub Action |
| Validation | Sandboxed environments with working PoC exploits | Multi-stage verification with confidence ratings |
| Learning | Adaptive: severity feedback refines the threat model | Manual rule configuration per repo |
| Integration | Codex web interface | Web, terminal CLI + general-purpose GitHub Action |
| Fix generation | Context-aware patches | Context-aware patches |
| Availability | Enterprise, Business, Edu; free first month | Enterprise and Team customers; open-source maintainers get expedited access |
| Proven finds | 14 CVEs across OpenSSH, GnuTLS, Chromium, GOGS, libssh, PHP, Thorium; internal SSRF + cross-tenant auth bypass | 22 Firefox vulnerabilities (14 high-severity), 500+ across open-source projects; internal RCE via DNS rebinding and SSRF |
| False positive reduction | 84% noise reduction, 50%+ FP drop, 90%+ over-reported severity drop over beta | Multi-stage verification filtering; no published metrics |
My take:
- OpenAI and Anthropic lead with similar strategies, making security a feature of their coding tools rather than a standalone product. As I wrote in January, they are also reshaping the security business model, moving from selling licenses to selling compute, which creates huge adoption incentives.
- Google hasn't shipped a software security product yet, but CodeMender already has 72 security fixes upstreamed to open-source projects. At [un]prompted 2026, Heather Adkins said Google expects to ship relatively bug-free code within two years, backed by Big Sleep and CodeMender: "We simply must eliminate every software vulnerability on Earth." When the 800-pound gorilla enters, the competitive dynamics change again.
- The CFO question is coming for every F1000 CISO soon. Why are we renewing a seven-figure SAST contract when Anthropic audited Firefox for $4,000? That's worth thinking through. Software development environments and methods are going through a massive change, and we need to change security models too.
- AppSec teams will face an immediate challenge as AI generates patches faster than humans can review them. The winning strategy is triage at scale: identify the 10-20% of patches that could introduce regressions, break auth flows, or mishandle edge cases, and reserve human review for those.
- The compliance paradox. Continuous AI scanning on every PR gives auditors better evidence than quarterly scans ever did. But new questions emerge that nobody can answer yet: how do you audit AI decision-making? What is the liability when AI-written code, scanned by AI from the same lab, still gets breached? Frameworks haven't caught up.
- For the ~40 AppSec startups in the market, the pivot window is measured in quarters. They cannot out-compete frontier labs on model quality, so they must choose: become the compliance, audit-trail, and policy-engine layer, or go deep into regulated verticals where generic AI scanning is not enough. An acqui-hire exit may also still be on the table, as every lab is scouring the market for security expertise, but that option may not exist in six months.
- The big SAST/DAST incumbents are fighting for relevance. The scanning moat is gone. The compliance and governance moat is real. The hardest part for a 20-year-old organization with 500+ employees will be accepting that the thing they have been best at for two decades is no longer the thing that matters.
Sources: