Mar 20, 2026 Ilya Kabanov

7 proofs of False in Rocq, the proof checker that verifies the Airbus C compiler

On March 5, Tristan Stérin published a blog post on his personal website. No major institution or frontier lab backing. With just Claude Opus 4.6 on a consumer Max plan, he found 7 soundness bugs in Rocq's kernel in 72 hours.

Rocq (formerly Coq) is not a research tool for geeks. It is a proof checker, software that mathematically verifies other software is correct. CompCert, the C compiler Airbus uses for flight control software, was verified in Rocq. MIT's Fiat Cryptography, also verified in Rocq, generates the elliptic curve code that handles about 90% of Chrome's secure connections. A soundness bug lets the checker accept something logically impossible as valid. From False you can derive anything, so every guarantee checked by that kernel is in question.

Highlights:

Stérin ran Opus 4.6 via Claude Code with a team prompt (mathematician, computer scientist, formal verification expert, devil's advocate) and historical soundness bugs as reference material. The experiment ran for approximately 72 hours, during which Stérin cheered Claude on with "great findings Opus, please keep going."
Rocq official kernel: 7 proofs of False plus 3 additional bugs, all filed and confirmed on GitHub (issues #21682, #21683, #21685, #21690, #21691-21694, #21701, #21702). The bugs exploited independent soundness issues in the guard checker and module system, not variations of a single flaw.
Lean4 official kernel: 0 proofs of False, but 4 bugs found. One involves an unsigned-to-size_t truncation that could theoretically yield a proof of False, but would require constructing a structure with 2^32 fields.
Historical rate of soundness bugs in Rocq: approximately one per year. This experiment found 7 in 3 days.

My take:

While no major security outlet covered this, it matters more than most loud announcements. One person with a consumer Claude subscription found 7 soundness bugs in one of the most sophisticated pieces of software that exists: a mathematical proof checker that sits underneath formally verified flight control, cryptography, and defense systems. Read it again.
The cost of finding deep vulnerabilities keeps dropping. $30 per working exploit for a JavaScript engine 0-day. Seven hours and an open-source tool for 38 vulnerabilities in physical robots. Now 72 hours and a $200/month Claude subscription to find 7 soundness bugs in the proof checker behind flight control software and browser cryptography.
Expect threat actors to target foundational systems at every level. High-value targets come to mind: the Linux eBPF verifier, KVM hypervisor, Rust type checker, and UEFI firmware. All are fully or mostly open source, the same barrier as Rocq. For closed targets like baseband firmware, CPU microcode, nation-states are already stealing vendor source code to feed their 0-day pipelines.
It's time to update your threat model for current attacker economics. An eBPF verifier bypass classified as 'unlikely' is rapidly becoming 'very possible.' Inventory your foundational dependencies. Not your application SBOMs. The layers below them. What compiler builds your production binaries? What crypto library handles your TLS? What container runtime isolates your workloads? What hypervisor hosts your VMs? What firmware boots your hardware? If any of those are open source, assume adversaries are already looking for exploitable bugs.
If you maintain an open-source project that others depend on, fund security work through grants. The Linux Foundation's Alpha-Omega Project funds exactly this.

Highlights:

My take:

Sources: