Anthropic reveals its cybersecurity domination strategy. Claude and Gemini scored roughly equally on cybersecurity tasks, but the Claude Code scaffold showed its superiority.
Wiz just launched the AI Cyber Model Arena: 257 real-world challenges across five offensive categories, covering zero-day discovery, CVE detection, API security, web security, and cloud security.
Highlights:
- Scaffolding matters. Gemini 3 Pro scores 84% on API security when run on Claude Code but drops to 38% on OpenCode.
- Performance is domain-specific. Opus 4.6 scores 84% on API security but 27% on 0-days. Gemini 3 Pro leads in cloud security (40%) but struggles with finding vulnerabilities in code (29%).
- Models are improving at finding 0-days. Opus 4.6: 27.3%, Opus 4.5: 18.2%, Sonnet 4.5: 0%.
- Smaller models punch above their weight. Haiku 4.5 on Claude Code (72%) beats GPT-5.2 on every scaffold in API security.
My take:
- Biggest surprise: Gemini scored low on software vulnerabilities, despite Google's heavy investment in AI secure coding with CodeMender.
- Expected: frontier models depend heavily on scaffolding for cybersecurity tasks.
- Anthropic's enterprise strategy is clear: make Claude Code the most powerful, model-agnostic scaffold for major enterprise tasks. It lifts every model that runs on it, whereas Gemini CLI favors the Gemini model family.