Mythos vs. curl, one of the most-audited open-source codebases

TL;DR: Mythos flagged 5 'Confirmed' vulnerabilities in curl. Only 1 survived maintainer review. curl is the worst-case test for any AI scanner: it is single-purpose, every line has been rewritten 4+ times on average, and nearly every major security tool has already scanned it. Don't generalize this result to typical enterprise code.

curl is a command-line tool and library that transfers data over HTTP, FTP, and dozens of other network protocols. It has over 20 billion installations and runs on over 110 operating systems across 28 CPU architectures.

The codebase is 176,000 lines of C, each line written and rewritten 4.14 times on average by 1,465 contributors over the project's history. 188 CVEs have been published since 2000, and only 2 were rated critical: CVE-2000-0973 (FTP server response buffer overflow) and CVE-2013-0249 (buffer overflow in POP3/SMTP/IMAP).

Daniel Stenberg, curl's lead developer, reported that Mythos found just one new vulnerability in the curl codebase. Earlier, AISI reported that Mythos was the first model to succeed on its 32-step corporate takeover benchmark.

What can we learn from curl's experience?

Highlights:

Of Mythos's 5 'Confirmed security vulnerabilities,' only 1 survived review by curl's security team: a low-severity issue. Of the other 4, 3 were false positives and 1 was 'just a bug.'
All 3 false positives flagged behavior that curl's API documentation already describes.
Mythos found fewer real bugs in curl than AISLE, Zeropath, and OpenAI's Codex Security did in previous scans.
None of the AI scanners curl has tried has reported a novel class of vulnerability.
Stenberg's verdict: 'the big hype around this model so far was primarily marketing.'

Steps involved in keeping curl secure. Source: daniel.haxx.se.

My take:

curl is an extreme outlier. It is a single-purpose, well-maintained project that has served as a test bed for nearly every major code security defense: OSS-Fuzz, Coverity, CodeQL, multiple paid audits, AISLE, Zeropath, OpenAI's Codex Security, and now Mythos. With every line rewritten 4+ times on average, the codebase carries little technical debt.
It is the opposite of a typical enterprise C/C++ codebase written decades ago. There, the problems are missing code ownership, coordination headwinds, and broken incentives. HackerOne's exposure debt data already shows the pattern: AI scaled vulnerability finding by 76%, but resolution dropped 46%. Mythos can't help there.
"Confirmed vulnerability" is becoming an ambiguous term. curl's security team confirmed only 20% (1 of 5) of Mythos's 'confirmed' findings. The better label is "vulnerability confirmed as exploitable," where a working exploit validates the claim. Otherwise we open the door to a theoretical debate over who is right, the model or the security team. Ask your vendor how they define a confirmed vulnerability.
Mythos flagged documented API behavior as bugs. A code security scanner aiming for comprehensive analysis must ingest API docs, RFCs, and intended-behavior specs alongside the source code. Context matters.
Stenberg's curl observation matches a broader pattern. According to Mozilla and Google, AI performs variant analysis (finding new instances of known bug patterns), not novel discovery. The main novelty today is the models' ability to chain vulnerabilities into a working exploit, something that previously required deep technical expertise. Still, I don't think Mythos is the ceiling; I expect the next generation of models to find new classes of vulnerabilities.

Sources:

Daniel Stenberg, Mythos finds a curl vulnerability (May 11, 2026)