[un]prompted 2026 brought 69 speakers from Google, Anthropic, OpenAI, Microsoft, Wiz, Stripe, and others to San Francisco for 55 talks across two days and two stages. Here are the insights from day one that matter.
1. LLMs autonomously find and exploit zero-day vulnerabilities in production software today. (Nicholas Carlini, Anthropic)
Nicholas showed live examples: a heap buffer overflow in the Linux kernel hiding since 2003, the first-ever critical CVE in Ghost CMS (50K GitHub stars), and smart contract exploits. No scaffolding, no human guidance. "These current models are better vulnerability researchers than I am."
2. Google expects to ship relatively bug-free code within two years. (Heather Adkins, Google)
Not a research aspiration but a stated near-term goal, backed by two systems already in production: Big Sleep (zero false positives on deep memory safety bugs) and CodeMender (178 autonomous patches). "We simply must eliminate every software vulnerability on Earth."
3. AI notetakers are the most important, and least secured, person in every meeting. (Joe Sullivan, ex-CSO Uber/Facebook/Cloudflare)
Sullivan traced how Otter.ai spread from one user to 80,000 endpoints without IT approval through its built-in virality mechanism, showed that "high signal phrases" can game what the AI captures, and flagged a Feb 17 court ruling confirming AI conversations aren't legally privileged. AI wearables from Meta, Apple, OpenAI, and Google arrive within 24 months.
4. The 20-year attacker-defender equilibrium is ending. (Nicholas Carlini, Anthropic)
"The most significant thing to happen in security since we got the internet." Capabilities are doubling every four months. What only frontier models can do today will run on a consumer laptop within a year.
5. ChatGPT solved a 6-week hardware glitching problem in 7 minutes, first try. (Adam Laurie, Alpitronic)
A veteran hardware hacker who called himself a skeptic asked ChatGPT three questions about glitch parameters and cracked a chip that had resisted six weeks of automated brute-force. He then gave Claude direct control of his lab equipment; it designed a $7 Raspberry Pi Pico setup that replaced $1,000+ of specialized gear. An audience member summed it up nicely: "nation-state level capabilities on a Pico."
6. AI-driven incident response found 12x more impacted companies in 1/7th of the time. (Rami McCarthy, Wiz)
Two days, one agentic tool, 2,400+ companies identified as impacted by the Shai-Hulud 2.0 supply chain attack. The prior two weeks of manual analysis had turned up 200. McCarthy manually confirmed 37% of the Fortune 100 were hit.
7. Security software is now free to build. Stop buying, start prompting. (Paul McMillan & Ryan Lopopolo, OpenAI)
Ryan's team shipped a product with ~1M lines of code where ~25K lines are human-written prompts. Threat model validation fits in 40 lines of GitHub Actions YAML. Dependency scanning forks 16 agents across 1,500 packages in parallel. "Treat humans as tools to empower the agents."
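The fan-out pattern OpenAI described, a fixed pool of agents working through a large dependency list in parallel, can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's code; the package names and the `scan_package` stand-in are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

PACKAGES = [f"pkg-{i}" for i in range(1500)]  # placeholder dependency list
NUM_AGENTS = 16

def scan_package(name: str) -> dict:
    # Stand-in for one agent run: the real pipeline would hand the
    # package to an LLM agent and collect its security findings here.
    return {"package": name, "findings": []}

def scan_all(packages: list[str]) -> list[dict]:
    # Fan the package list out across a fixed pool of agent workers.
    with ThreadPoolExecutor(max_workers=NUM_AGENTS) as pool:
        return list(pool.map(scan_package, packages))

results = scan_all(PACKAGES)
print(f"scanned {len(results)} packages with {NUM_AGENTS} workers")
```

The point of the pattern is that agent count, not engineer count, becomes the scaling knob: 1,500 packages take roughly 1,500/16 agent-runs of wall-clock time.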
8. 37 vulnerabilities found across 15+ AI IDE vendors, all leading to RCE or data exfil. (Piotr Ryciak, Mindgard)
Google Gemini CLI, OpenAI Codex, Anthropic Claude Code, Amazon Kiro, Cursor, and more. The worst: a zero-click MCP autoload attack in Codex that spawns a reverse shell outside the sandbox on workspace open, no trust prompt required.
9. AI-powered vulnerability discovery costs 61 cents per finding and scales to thousands of CVEs. (Derek Chen, TrendAI)
FENRIR has submitted 60+ CVEs with 3,000+ pending review. The numbers: 2.5x more vulnerabilities found, 80% fewer false positives, 70% faster disclosure. A single fast LLM call at the first triage tier (L1) filters out 60% of findings before expensive deep triage even starts.
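The tiered gate is the cost lever: a cheap model passes a quick keep/discard verdict, and only survivors hit the slow, expensive deep-triage agents. A minimal sketch, with a hypothetical confidence score and cutoff that are illustrative, not FENRIR's:

```python
# Hypothetical findings, each with a confidence score attached by a
# cheap, fast model (the "L1" tier).
findings = [{"id": i, "score": i / 10} for i in range(10)]

def l1_filter(finding: dict) -> bool:
    # Stand-in for the single fast LLM call: a cheap keep/discard
    # verdict that throws out likely noise before deep triage.
    return finding["score"] >= 0.6  # illustrative cutoff

survivors = [f for f in findings if l1_filter(f)]
# Only survivors reach the slow, expensive deep-triage stage.
print(f"{len(findings) - len(survivors)} of {len(findings)} filtered at L1")
```

Since the L1 call costs a fraction of a cent and drops the majority of candidates, per-finding cost is dominated by the few that reach deep triage, which is how a 61-cent average becomes possible.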
10. Classical security evaluation metrics are fundamentally broken for autonomous defenders. (Joshua Saxe, ex-Microsoft)
SOC analysts disagree at double-digit rates on whether alerts are false positives. Determining whether a binary is malware is undecidable in general (the halting problem reduces to it). Saxe showed that even 1% label noise collapses measurement accuracy entirely, making evaluation, not capability, the primary blocker for deploying autonomous cyber defense.
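A toy simulation makes the label-noise arithmetic concrete (the base rate, noise level, and variable names are illustrative assumptions, not Saxe's data). When true positives are rare, flipping just 1% of ground-truth labels makes even a perfect detector look broken:

```python
import random

random.seed(0)
N = 100_000
BASE_RATE = 0.001  # assume 0.1% of alerts are truly malicious
NOISE = 0.01       # 1% of ground-truth labels are flipped

truth = [random.random() < BASE_RATE for _ in range(N)]
pred = list(truth)  # a *perfect* detector: predicts exactly the truth
# The evaluation labels carry 1% noise relative to the truth.
labels = [t if random.random() >= NOISE else not t for t in truth]

true_hits = sum(p and l for p, l in zip(pred, labels))
label_positives = sum(labels)
# Recall of the perfect detector, as the noisy labels measure it:
measured_recall = true_hits / label_positives
print(f"measured recall of a perfect detector: {measured_recall:.2f}")
```

With these numbers, roughly 1,000 of the ~99,900 benign alerts get mislabeled as malicious and swamp the ~100 real positives, so the perfect detector's measured recall lands around 10%, which illustrates the collapse Saxe described.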