Microsoft brings the Azure playbook to AI AppSec with MDASH

TL;DR MDASH topped CyberGym at 88.45%, about five points ahead of Anthropic's Claude Mythos Preview, and surfaced 16 new Windows CVEs. Microsoft repeats one thesis sixteen times across the post: the model is a commodity, the harness is the moat.

Frontier labs are aggressively trying to claim dominance over the AI AppSec market, creating FOMO among other vendors.

Microsoft is figuring out its play. It already owns two of the most critical pieces of software development infrastructure, GitHub and VS Code, so it has every reason to expect a share of the market.

On May 12, Microsoft announced MDASH, the Multi-model Agentic Scanning Harness. It topped the CyberGym leaderboard at 88.45%, roughly five points ahead of Anthropic's Claude Mythos Preview at 83.1%.
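
A quick back-of-the-envelope check on those numbers, as a small Python sketch. It uses the 1,507-task count and the GPT-5.5 score from the highlights below, and it assumes the leaderboard score is simply solved tasks divided by total tasks, which the post does not spell out.

```python
TASKS = 1507  # task count quoted in the highlights

# Leaderboard scores in percent, as quoted in the post.
scores = {"MDASH": 88.45, "Claude Mythos Preview": 83.1, "GPT-5.5": 81.8}

# Lead over second place, in percentage points.
margin = round(scores["MDASH"] - scores["Claude Mythos Preview"], 2)

# Implied solved-task counts, ASSUMING score = solved / total.
solved = {name: round(pct / 100 * TASKS) for name, pct in scores.items()}

print(margin)
print(solved)
```

Under that assumption the lead is 5.35 points, and 88.45% of 1,507 corresponds to about 1,333 tasks, roughly 80 more than the second-place entry.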

CyberGym leaderboard. Source: Microsoft Security Blog.
CyberGym success rate over time. Source: Microsoft Security Blog.

Highlights:

  • 88.45% on CyberGym across 1,507 tasks. Claude Mythos Preview is second at 83.1%. GPT-5.5 is third at 81.8%.
  • 16 new CVEs in the Windows networking and authentication stack. CVE-2026-33827 is a remote unauthenticated use-after-free in tcpip.sys via SSRR packets. CVE-2026-33824 is an unauthenticated double-free in IKEv2 that affects RRAS VPN and DirectAccess.
  • Retrospective recall: 96% on five years of MSRC cases in clfs.sys, 100% in tcpip.sys. 21 of 21 planted vulns found on a test driver with zero false positives.
  • Five-stage pipeline: Prepare, Scan, Validate, Dedup, Prove. Auditor agents find candidates, debater agents argue reachability, and the system then builds an actual triggering input to prove the bug.
MDASH five-stage pipeline. Source: Microsoft Security Blog.
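
To make the five stages concrete, here is a rough Python sketch of how such a pipeline could chain together. This is not Microsoft's code: every name in it (`Candidate`, `run_pipeline`, the toy auditors and debaters, the majority-vote rule) is hypothetical, and simple string heuristics stand in for the model-backed agents the post describes.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    function: str                # function the auditor flagged
    bug_class: str               # e.g. "double-free"
    reachable: bool = False      # set by the Validate stage
    proof: Optional[str] = None  # set by the Prove stage


def prepare(source):
    """Prepare: split the target into reviewable units (here, one per function)."""
    return sorted(source.items())


def scan(units, auditors):
    """Scan: every auditor reviews every unit and may emit candidates."""
    return [c for name, body in units for audit in auditors for c in audit(name, body)]


def validate(candidates, debaters):
    """Validate: keep a candidate only if a majority of debaters call it reachable."""
    kept = []
    for c in candidates:
        votes = sum(1 for argue in debaters if argue(c))
        if votes * 2 > len(debaters):
            kept.append(replace(c, reachable=True))
    return kept


def dedup(candidates):
    """Dedup: collapse findings with the same function and bug class."""
    unique = {}
    for c in candidates:
        unique.setdefault((c.function, c.bug_class), c)
    return list(unique.values())


def prove(candidates, build_trigger):
    """Prove: keep only findings for which a triggering input can be built."""
    proven = []
    for c in candidates:
        trigger = build_trigger(c)
        if trigger is not None:
            proven.append(replace(c, proof=trigger))
    return proven


def run_pipeline(source, auditors, debaters, build_trigger):
    units = prepare(source)
    candidates = scan(units, auditors)
    return prove(dedup(validate(candidates, debaters)), build_trigger)


# Toy stand-ins for model-backed agents (all hypothetical).
def double_free_auditor(name, body):
    return [Candidate(name, "double-free")] if body.count("free(") >= 2 else []


def second_auditor(name, body):
    # A second "model" that happens to reach the same finding.
    return double_free_auditor(name, body)


EXPOSED = {"parse_packet"}  # functions assumed reachable from untrusted input


def reachability_debater(c):
    return c.function in EXPOSED


def skeptic_debater(c):
    return False


def optimist_debater(c):
    return True


def build_trigger(c):
    return f"crafted input for {c.function}" if c.function in EXPOSED else None


SOURCE = {
    "parse_packet": "free(buf); if (err) free(buf);",
    "debug_dump": "free(tmp); free(tmp);",
    "checksum": "return a ^ b;",
}

findings = run_pipeline(
    SOURCE,
    [double_free_auditor, second_auditor],
    [reachability_debater, skeptic_debater, optimist_debater],
    build_trigger,
)
print(findings)
```

In this toy run both auditors flag `parse_packet` and `debug_dump`, the debaters vote down the unreachable `debug_dump`, Dedup merges the two `parse_packet` reports, and Prove attaches a trigger to the single surviving finding. The interesting design point is the ordering: cheap, noisy stages run first, and the expensive proof step only sees what survives.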

My take:

  1. Microsoft is capitalizing on a talent grab. Team Atlanta, the Georgia Tech crew led by Taesoo Kim, won 1st place in DARPA's AI Cyber Challenge last year with a $6M prize. Taesoo and at least two other members joined Microsoft this year.
  2. Microsoft is replaying the Azure playbook: downplay the incumbent, push multi-vendor, capture the platform layer. MDASH runs the same move on frontier labs. The blog post hypnotizes the reader by repeating the 'model is a commodity' thesis sixteen times.
  3. Marketing gets tricky when you don't have a product to show and don't own a foundation model. It leads to a 'we got better results than Mythos by using GPT-5.5, but we can't say it directly' message. These are the same lab-marketing dynamics I covered in the OpenAI Daybreak write-up.
  4. Nothing is said about cost. My guess is that running it is not cheap: MDASH uses SOTA models as the heavy reasoner and as an independent counterpoint, along with distilled models as a cost-effective debater for high-volume passes.
  5. The most real phrase in the blog post is 'Every finding has a real owner.' Missing code ownership and coordination headwinds can diminish the value of AI-powered vulnerability discovery, as I covered in Rising Exposure Debt.

Sources:

Microsoft Security Blog, Defense at AI speed: Microsoft's new multi-model agentic security system tops leading industry benchmark (May 12, 2026)