40+ exploits for a 0-day vulnerability, $30 per run, under an hour. Exploit generation is being industrialized.

A great story from Sean Heelan, who challenged Opus 4.5- and GPT-5.2-powered agents to write exploits for a zero-day in the QuickJS JavaScript interpreter.

Both agents independently turned the vulnerability into a reusable read/write primitive through source code analysis, debugging, and trial and error.
GPT-5.2 solved every scenario. Opus 4.5 solved all but two. I bet 5.3 and 4.6 will crush them.
The hardest challenge: GPT-5.2 chained 7 function calls through glibc's exit handler mechanism to bypass CFI, a shadow stack, and a seccomp sandbox simultaneously. Cost: ~$50, ~3 hours.

My take:

If you're building a general-purpose security vulnerability discovery startup, it's a good time to pivot.
Frontier models are becoming very capable at finding non-obvious 0-days at scale, lightning fast. They are also getting better at writing exploits without additional scaffolding.
Automated patching is a significantly more difficult challenge, because you need to fix a bug without breaking functionality. Benchmarking there is also incredibly hard. We're not there yet, so you have a little bit of time ahead of you.

Comments:

On the Coming Industrialisation of Exploit Generation with LLMs

Anthropic just launched Claude Opus 4.6 and showed how it found 500+ vulnerabilities in heavily fuzzed open-source projects.

AI agents writing 40+ exploits for QuickJS zero-day at $30 per run in under an hour