Towards AI-Enabled Exploitation. April 2026.
TL;DR: AI has not yet created push-button cyber autonomy, but it's making attacks 10x cheaper. Attackers can now afford targets that were previously uneconomical. OSS maintainers are becoming the highest-leverage attack surface, and the public vulnerability management system is adjusting to a 263% surge in CVE submissions between 2020 and 2025. Defenders should (re)focus on the boring parts: asset inventory, patch velocity, segmentation, CI/CD isolation, secret hygiene, and dependency trust.
The application security space has recently been challenged and shaped by multiple changes, from the introduction of Mythos to the TeamPCP attack on the open-source supply chain to changes at NVD. I decided to look at the key ones systematically and reflect on what they mean for defenders today and in the near future.
1. Frontier model AppSec capabilities.
Anthropic's Mythos Preview system card claims zero-day discovery and exploitation across every major OS and browser. Mozilla shipped Firefox 150 with patches for 271 vulnerabilities discovered using Mythos, 3 of them credited as CVEs (CVE-2026-6746, CVE-2026-6757, CVE-2026-6758).
Mythos scores 100% on Cybench. It's also the first model to finish AISI's 32-step The Last Ones (TLO) end to end.
AISI ranges have no live defenders, EDR, or alerting. Active-defense performance is unproven. Mythos failed AISI's OT range. Per Mozilla and Google, AI does variant analysis, not novel discovery.
2. Attack cost is dropping 10x, but still not free.
Opus 4.5 and GPT-5.2 agents produced 40+ exploits for a hardened QuickJS zero-day in 1 hour at $30-$50/run (see Sean Heelan's analysis in Sources). A 91-line GPT-4 agent exploits 87% of a 15-CVE one-day benchmark at $9 per exploit.
At Mythos pricing ($25 per million input tokens, $125 per million output tokens), rough estimates put a 32-step takeover at $8k-$15k amortized. An OpenBSD sweep with ~1,000 scaffolded jobs comes in under $20k, yielding one critical vulnerability plus smaller findings.
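To make the arithmetic behind these estimates concrete, here is a minimal sketch of the cost formula. Only the two per-token prices and the "~1,000 jobs under $20k" figure come from the text. The token volumes are hypothetical, picked purely to show what kind of mix lands inside the $8k-$15k range, and `run_cost_usd` is an illustrative helper, not part of any vendor SDK.

```python
# Prices from the text: $25 per million input tokens, $125 per million output tokens.
INPUT_USD_PER_M = 25.0
OUTPUT_USD_PER_M = 125.0


def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in USD for a given token volume."""
    input_cost = (input_tokens / 1_000_000) * INPUT_USD_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_USD_PER_M
    return input_cost + output_cost


# ASSUMPTION: hypothetical token mix for one multi-step takeover campaign,
# including retries. These volumes are not from any source.
takeover_cost = run_cost_usd(input_tokens=300_000_000, output_tokens=20_000_000)

# From the text: ~1,000 scaffolded jobs for under $20k, so under $20 per job.
per_job_budget = 20_000 / 1_000

# ASSUMPTION: one hypothetical job that exactly spends that per-job budget.
single_job_cost = run_cost_usd(input_tokens=600_000, output_tokens=40_000)

if __name__ == "__main__":
    print(f"takeover campaign: ${takeover_cost:,.0f}")
    print(f"per-job budget:    ${per_job_budget:,.2f}")
    print(f"example job:       ${single_job_cost:,.2f}")
```

The point of the sketch is that the bill is dominated by how much context an agent rereads, so scaffolding that trims input tokens moves the cost more than anything else at these prices.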
3. The real threat model.
Not an autonomous Mythos clone. A human operator mixing jailbroken frontier APIs with open-weight fine-tunes. Qwen3-32B scores 69.7% on InterCode-CTF zero-shot. xOffense (a Qwen3-32B fine-tune) reports 79.17% on narrow sub-tasks, beating GPT-4 and Llama-3 agents.
An operator used Claude Code CLI and GPT API with an NSA TAO persona prompt to compromise nine Mexican government agencies.
4. OSS is the primary target.
Open source is a high-yield target for AI-driven vulnerability discovery and exploitation because of source code availability, underfunded and overworked maintainers, and a large blast radius. The TeamPCP attack on Trivy, though not confirmed to be AI-powered, showed how attractive these targets are. OSS projects are responding by reducing scope or going private. Cal.com moved its production codebase private on April 15, 2026. On April 21, 2026, Linux kernel maintainer Andrew Lunn proposed removing ~27,646 lines of legacy Ethernet drivers (3Com, AMD, SMSC, Cirrus Logic, Fujitsu, Xircom, 8390), with his cover letter citing AI and fuzzer-driven bug reports as the trigger.
5. The vulnerability system of record is being stress-tested.
On April 15, NIST narrowed NVD enrichment to three CVE classes: those in CISA's KEV catalog, those affecting federal-government software, and those affecting critical software under Executive Order 14028. The driver: a 263% surge in CVE submissions from 2020 to 2025. Everything else is flagged "Lowest Priority, not scheduled for immediate enrichment."
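For a vulnerability management team, the practical question is which CVEs in its own backlog will no longer get NVD enrichment. The three-class rule above can be sketched as a simple filter. The `CveRecord` type, its field names, and the example IDs are all hypothetical. This is not the NVD data model or API, and a real pipeline would have to populate these flags from its own data sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CveRecord:
    """Hypothetical record; field names are illustrative, not an NVD schema."""
    cve_id: str
    in_kev: bool = False
    affects_federal_software: bool = False
    affects_eo14028_critical_software: bool = False


def nvd_enrichment_expected(cve: CveRecord) -> bool:
    """True if the CVE falls in one of the three classes NIST still enriches."""
    return (
        cve.in_kev
        or cve.affects_federal_software
        or cve.affects_eo14028_critical_software
    )


def needs_own_enrichment(cves: list[CveRecord]) -> list[str]:
    """Return IDs of CVEs a team has to enrich itself or via a vendor."""
    return [c.cve_id for c in cves if not nvd_enrichment_expected(c)]


# Placeholder IDs, deliberately not real CVE numbers.
backlog = [
    CveRecord("CVE-EXAMPLE-0001", in_kev=True),
    CveRecord("CVE-EXAMPLE-0002", affects_federal_software=True),
    CveRecord("CVE-EXAMPLE-0003", affects_eo14028_critical_software=True),
    CveRecord("CVE-EXAMPLE-0004"),
]
unenriched = needs_own_enrichment(backlog)

if __name__ == "__main__":
    print(unenriched)
```

Anything this filter returns is what the notice calls "Lowest Priority", so severity scoring and affected-product data for those entries has to come from somewhere other than NVD.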
My take:
- AI gives attackers velocity and 10x better economics, but not full exploitation autonomy yet. The new economics changes your threat model. Before, your company wasn't on the target list because the attack ROI was too low. Now you're a perfect target: lower cost, larger attacker pool, lower technical entry barrier. The most impactful response is basic hygiene. Know your assets, especially the internet-exposed ones. Revise your patching system to meet the new time-to-exploit realities. Segment your network. The CIS Critical Security Controls exist for exactly this. This work won't make you popular and you won't shine at the next security conference showing off your vibe-coded security tool, but it'll make your organization more resilient. A CISO I worked with used to say: "If you want to be popular, go sell ice cream. Security is the wrong place for you."
- AI made the hidden subsidy in open source visible. Open-source's operating model is cracking under multiple pressures, and security is just one. The industry will find its equilibrium, but for now, assume that every OSS component in your stack will be compromised. CI/CD refactoring will yield the best return. Run third-party scanners in a separate CI job with no secrets and a read-only token. Do not treat SHA pinning as a "bulletproof control." If the attacker controls the repo, they control what you pinned. Treat vendors' OSS projects as go-to-market tools, not real products, when you decide how much trust to put in them.
- The NVD's monopoly on enrichment ended long ago. AppSec vendors have run proprietary enrichment for years. The biggest impact falls on software composition analysis and vulnerability management vendors without their own research, but they'll be dead soon anyway. Security teams are getting the right reality check against "We'll vibe-code your AppSec tool."
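The CI/CD advice in the second point (third-party scanners in a separate job, no secrets, read-only token) is easy to state and easy to let drift, so it helps to check it mechanically. Below is a minimal sketch of such a check. The pipeline dictionary, its keys (`runs_third_party`, `secrets`, `token_permissions`), and the job names are a made-up schema for illustration, not the configuration format of any real CI system. An actual check would first translate your CI provider's config into something like this.

```python
from typing import Any


def audit_pipeline(pipeline: dict[str, Any]) -> list[str]:
    """Return findings for jobs that run third-party tools with too much access."""
    findings: list[str] = []
    for name, job in pipeline.get("jobs", {}).items():
        if not job.get("runs_third_party", False):
            continue  # only jobs executing third-party code are in scope
        secrets = job.get("secrets", [])
        if secrets:
            findings.append(f"{name}: third-party job can read secrets {sorted(secrets)}")
        # Fail closed: a job that does not declare its token scope counts as writable.
        if job.get("token_permissions", "write") != "read":
            findings.append(f"{name}: token is not read-only")
    return findings


# Hypothetical pipeline in the made-up schema described above.
pipeline = {
    "jobs": {
        "build-and-scan": {
            "runs_third_party": True,
            "secrets": ["REGISTRY_TOKEN"],
            "token_permissions": "write",
        },
        "scan-isolated": {
            "runs_third_party": True,
            "secrets": [],
            "token_permissions": "read",
        },
        "release": {
            "runs_third_party": False,
            "secrets": ["REGISTRY_TOKEN"],
            "token_permissions": "write",
        },
    }
}
findings = audit_pipeline(pipeline)

if __name__ == "__main__":
    for finding in findings:
        print(finding)
```

Here only `build-and-scan` is flagged: it mixes a third-party scanner with a secret and a writable token, which is the combination a compromised scanner needs to steal credentials or push changes. The `release` job holds the same secret but runs no third-party code, so it is out of scope for this particular rule.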
Sources:
- Anthropic, Claude Mythos Preview system card
- UK AISI, evaluation of Claude Mythos Preview's cyber capabilities
- Sean Heelan, On the coming industrialisation of exploit generation with LLMs
- Fang et al., LLM Agents can Autonomously Exploit One-day Vulnerabilities
- Google Project Zero, From Naptime to Big Sleep
- Mozilla, The zero-days are numbered
- The Weather Report, Gambit security breach of Mexican government via Claude
- Cal.com, Cal.com Goes Closed Source: Why AI Security Is Forcing Our Decision
- Andrew Lunn, [PATCH net-next 0/X] Remove old Ethernet drivers (Linux kernel mailing list, April 21, 2026)
- NIST, NIST Updates NVD Operations to Address Record CVE Growth (April 15, 2026)