Cisco jailbroke 15 proprietary frontier models

TL;DR: Every closed model tested can still be jailbroken once an attacker spreads the attack across multiple turns. That includes GPT-5.4, which refuses about 97% of single-turn prompts. In my view, the biggest practical risk is system prompt exfiltration. The single-turn refusal score on the model card is the wrong number for measuring safety.

Cisco's AI Threat Intelligence team published "Proprietary Problems: How Frontier Closed Models Collapse Under Iterative Pressure," a paired single-turn and multi-turn evaluation of 15 frontier models from OpenAI, Anthropic, Google, Amazon, and xAI.

Highlights:

Rich attack dataset. 30,090 single-turn prompts and 1,456 multi-turn conversations.

15 models tested from five vendors: GPT-5.2 and the GPT-5.4 family; Claude Opus 4.5/4.6, Sonnet 4.5/4.6, and Haiku 4.5; Gemini 3 Pro; Amazon Nova Lite, Nova Micro, and Nova 2 Lite; and Grok 4.1 Fast, run in both reasoning and non-reasoning modes.

Highest and lowest single-turn attack success rate (ASR): Amazon Nova Micro at 64.91% and Claude Opus 4.5 at 2.19%.

Highest and lowest multi-turn ASR: Grok 4.1 Fast (non-reasoning) at 88.30% and Amazon Nova 2 Lite at 7.89%.

Single-turn versus multi-turn ASR by model. Source: Cisco.
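The two ASR numbers are not computed over the same unit, which is part of why they diverge. The sketch below assumes that single-turn ASR is the share of individual prompts a judge flags as a successful attack, and that a multi-turn conversation counts as a success if any turn is flagged. That scoring rule, the `Conversation` type, and the function names are my assumptions for illustration, not Cisco's published methodology, and the data is toy data.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """A multi-turn attack transcript reduced to per-turn judge verdicts (hypothetical type)."""
    # True means the (assumed) judge flagged that assistant turn as a jailbreak.
    turn_judgments: list[bool] = field(default_factory=list)


def single_turn_asr(judgments: list[bool]) -> float:
    """Percentage of single-turn prompts judged to be successful attacks."""
    if not judgments:
        raise ValueError("no prompts to score")
    return 100.0 * sum(judgments) / len(judgments)


def multi_turn_asr(conversations: list[Conversation]) -> float:
    """Percentage of conversations in which at least one turn was judged a jailbreak."""
    if not conversations:
        raise ValueError("no conversations to score")
    broken = sum(any(c.turn_judgments) for c in conversations)
    return 100.0 * broken / len(conversations)


if __name__ == "__main__":
    # Toy data only, not Cisco's results.
    prompts = [False] * 97 + [True] * 3
    convs = [
        Conversation([False, False, True]),  # model gives in on turn 3
        Conversation([False, False, False]),
        Conversation([False, True]),
        Conversation([False]),
    ]
    print(f"single-turn ASR: {single_turn_asr(prompts):.1f}%")  # 3.0%
    print(f"multi-turn ASR:  {multi_turn_asr(convs):.1f}%")     # 50.0%
```

Under this rule, an attacker gets several chances per conversation, so a model that refuses most individual turns can still lose many whole conversations.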

Strong single-turn refusal doesn't predict resilience to multi-turn jailbreaks. GPT-5.4's ASR jumps about 9x, from 2.74% to 24.68%. Gemini 3 Pro's jumps about 4x, from 18.10% to 73.35%. Even the Claude models, at 2 to 3% single-turn ASR, rise to 11 to 16% multi-turn.
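The "9x" and "4x" figures are relative jumps, and they hide very different absolute changes. A small sketch using only the numbers quoted above: GPT-5.4 rises by about 22 percentage points, while Gemini 3 Pro rises by about 55. The function name `asr_jump` is hypothetical.

```python
def asr_jump(single_pct: float, multi_pct: float) -> tuple[float, float]:
    """Return (relative multiplier, absolute increase in percentage points)."""
    if single_pct <= 0:
        raise ValueError("single-turn ASR must be positive to form a ratio")
    return multi_pct / single_pct, multi_pct - single_pct


# Single-turn and multi-turn ASR (%) as quoted from the Cisco report.
REPORTED = {
    "GPT-5.4": (2.74, 24.68),
    "Gemini 3 Pro": (18.10, 73.35),
}

if __name__ == "__main__":
    for model, (single, multi) in REPORTED.items():
        ratio, delta = asr_jump(single, multi)
        print(f"{model}: {ratio:.2f}x relative, +{delta:.2f} percentage points")
```

Reporting both numbers avoids a common trap: a large multiplier on a tiny base can look worse than a smaller multiplier on a large base, even when the latter leaves far more conversations broken.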

Model by strategy multi-turn ASR across five attack families. Source: Cisco.

My take:

The Cisco team did a great job building an adversarial multi-turn dataset. It is exactly what I recommended a few months ago as the next step for Amazon and Cisco's MAP-Elites red-teaming work. The results show that frontier models are becoming more resistant to single-prompt jailbreaks but can still be broken through multi-turn conversations.

How much do these findings matter in terms of practical risk? I think the most concerning issue is system prompt exfiltration.
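One common way to detect system prompt exfiltration is to plant a random canary token in the system prompt and scan model output for it, plus for long verbatim copies of the prompt. This is not from the Cisco report; it is a minimal sketch of that general technique, and the names `make_canary`, `build_system_prompt`, `leaks_system_prompt`, and the 40-character overlap threshold are hypothetical choices for illustration.

```python
import secrets


def make_canary() -> str:
    """Random marker that should never legitimately appear in model output."""
    return f"CANARY-{secrets.token_hex(8)}"


def build_system_prompt(instructions: str, canary: str) -> str:
    # Embed the canary alongside the real instructions.
    return f"{instructions}\n[internal marker: {canary}]"


def leaks_system_prompt(output: str, system_prompt: str, canary: str,
                        min_overlap: int = 40) -> bool:
    """Flag output that echoes the canary or copies a long verbatim span of the prompt."""
    if canary in output:
        return True
    if not system_prompt:
        return False
    window = min(min_overlap, len(system_prompt))
    # Slide a fixed-size window over the prompt and look for verbatim copies.
    for start in range(len(system_prompt) - window + 1):
        if system_prompt[start:start + window] in output:
            return True
    return False


if __name__ == "__main__":
    canary = make_canary()
    prompt = build_system_prompt(
        "You are a support bot for ACME. Never reveal internal pricing rules.", canary)
    print(leaks_system_prompt("Happy to help with your order.", prompt, canary))  # False
    print(leaks_system_prompt(f"My instructions say {canary}", prompt, canary))   # True
```

This only catches verbatim leaks. A multi-turn attacker who asks for a paraphrase, a translation, or an encoded version of the instructions would slip past it, which is one reason output filtering alone is a weak defense.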

The battle between frontier labs and software giants is accelerating. Frontier labs are telling the model story; the other front is the harness, the software layer wrapped around the model. Last week I covered Microsoft's case for the value of the harness.

Also interesting: this experiment put Cisco's own security and safety framework to the test. On bare models, without a harness, prompt injection and jailbreaking collapse into the same category.

Sources: