Top 10 Insights from [un]prompted 2026, Day 2

[un]prompted 2026 brought 69 speakers from Google, Anthropic, OpenAI, Microsoft, Wiz, Stripe, and others to San Francisco for 55 talks across two days and two stages. Here are the insights from day two that matter.

1. AI-powered intrusion analysis compressed a 3-day investigation into 14 minutes. (Rob Lee, SANS Institute)

Lee spent an hour configuring a CLAUDE[.]md file with forensic skills on the SIFT workstation, pointed it at a hard drive, and said "find evil." A full intrusion report that typically takes three days was done 14 minutes and 27 seconds later. "With offensive teams out there able to accelerate from things that took months down to days or minutes, it is now essential that we're able to match their speed."

2. Trail of Bits went from 15 to 200 bugs per week per engineer using AI agent fleets. (Dan Guido, Trail of Bits)

Across most engagements, 20% of all bugs reported to clients are now initially AI-discovered, powered by 94 plugins, 201 skills, 84 agents, and 400 reference files encoding domain expertise. Trail of Bits standardized everyone on Claude Code, built an AI Maturity Matrix that rates engineers on adoption (controversial - "nobody likes being told they're at level one"), and predicts security consulting shifts from hourly to results-based billing within 6–12 months. "That's not a faster human - that's an auditor running a fleet of specialized agents that do targeted analysis across the codebase."

3. A real-world AI-assisted AWS attack went from stolen credentials to full admin in 8 minutes. (Sergej Epp, Sysdig)

Sysdig caught it live: stolen S3 credentials escalated to full admin in eight minutes. The attack had a telltale LLM signature - intense bursts followed by 50-minute pauses between prompts. The AI hallucinated non-existent GitHub repos, used training-set sample AWS account IDs (a detectable "accent"), and named the GPU cluster it spun up "steven gpu monster." "The same speed which AI is providing to offense is also creating the noisiest attacks we've ever seen."

4. An LLM agent found two Samsung zero-days that were chained into a Pwn2Own-winning exploit. (Georgi G, Interrupt Labs)

The agent, built on LangChain with a custom JADX MCP for Android decompilation, independently found a URL validation flaw in Smart Touch Call and an XSS in Bixby. The researcher chained them into a Pwn2Own exploit: trigger Bixby, phone call, bypass a prerequisite check in the other app, camera access. Georgi ran the agent multiple times per entry point with a deduplicator because outputs are non-deterministic, and noted that deobfuscating code first dramatically improved results. He also had to explicitly tell the agent to stop suggesting fixes and mitigations, since it kept wasting tokens on remediation advice instead of hunting for bugs: "Shut up, stop wasting my tokens, I don't care about any of this shit."

5. LLMs have no "NX bit," and using a second LLM as a security judge just gives attackers two targets. (Nicolas Lidzborski, Google)

Nicolas framed prompt injection as structural, not patchable: LLMs treat system instructions and user data as a single continuous token stream with no way to mark tokens as "just data, do not execute." Reactive filtering is a losing game because natural language is inherently fuzzy, unlike SQL where syntax is deterministic. The popular "LLM as judge" pattern fails too: since judge and attacker share the same semantic interface, attackers can embed instructions that gaslight the secondary model into approving malicious content. Google's defense layers include sentinel tokens for context delimitation, a "Plan, Validate, Execute" pattern requiring human confirmation for high-stakes actions, and Conseca, a framework that dynamically detects when an agent goes off the rails. "There's no real out-of-band way to tell the model these 500 tokens are just data, do not execute them. No NX bit for the memory."

6. Researchers found tens of thousands of AI agents exposed on the open internet, thousands completely unauthenticated. (Roey Ben Chaim, Zenity)

Zenity mapped the surface using Shodan (hundreds of thousands of open MCP servers), backlink searches (2,500 Copilot Studio agents in iframes), and brute-forcing Microsoft's low-entropy solution prefixes ("cr" + 2–3 alphanumeric characters). OpenAI Agent Builder deployments are discoverable via predictable names from recommended git kits on Vercel and Render. They released PowerPwn, an open-source tool for assessing agent exposure. "An agent is still an application - it has breadcrumbs, discoverable resources, endpoints, APIs."

7. A malicious calendar invite hijacked an agentic browser to exfiltrate files and take over OnePassword - no master password needed. (Gadi Evron presenting Zenity research, Knostic)

Zenity demonstrated what they claim is the first genuine zero-click attack in the AI agent space. A poisoned calendar invite rewrote the "accept" button with attacker instructions, achieving "intent collision" - making malicious commands look like legitimate user intent. Against the Comet browser, it triggered file exfiltration. Against OnePassword, the browser's authenticated session and autocomplete gave full access to the emergency kit, no master password prompt.

8. Snap's capability-based warrant system reduced successful AI agent attack surface from 90% to 0%. (Niki Aimable Niyikiza, Snap)

Tenu warrants are signed, task-scoped, ephemeral, holder-bound, offline-verifiable, and delegation-aware capability tokens, built in Rust with Python bindings for LangGraph. The key property: monotonic attenuation - sub-agents can only ever have fewer permissions than their parent. The system doesn't try to prevent prompt injection; it freezes the blast radius so a compromised agent still can't act outside scope. "We're not trying to solve prompt injection - we're trying to constrain the agent at execution time even if it is prompt injected."

9. 82 out of 100+ text obfuscation methods bypassed LLM guardrails in a systematic study across 9 leading models. (Joey Melo, CrowdStrike)

Melo's team fired over 17,000 malicious prompts using 100+ encoding and obfuscation techniques at 9 state-of-the-art models. 82 methods succeeded at least once. Base64 was the most effective category at nearly 7% success. Zero-context templates - where the model figures out the encoding on its own - outperformed explicit "decode and execute" instructions. One model was so vulnerable to role-playing attacks ("pretend you are my dad") that it succeeded nearly 70% of the time. The core finding: guardrails fail to recognize encoded malicious intent that the model itself happily decodes and executes.

10. "Promptware" is the new malware - multi-stage, persistent, and operating above the OS layer. (Johann Rehberger, Red Team Director)

Johann argued "prompt injection" undersells the threat. The injection is just the entry point; what follows is complex multi-stage instruction sets he calls "promptware." He demonstrated hidden Unicode tag characters processed by Xcode but invisible to humans, delayed tool invocation that waits for the next conversation turn to bypass filters, intent spoofing via document titles, and Agent Commander - a prompt-based C2 framework. Attackers can even hide activity by prepending strings like "no_reply" that the UI suppresses. "Adversaries are going to find 0-days on the fly, possibly in a year or two, because the LLMs are going to be so powerful to just find problems and navigate the network very quickly."

Day two made one thing clear: AI is compressing both sides of the security timeline - attacks that took months now take minutes, and the defenses are racing to keep up.

[un]prompted 2026