5 stories this week that change your decisions (Apr 27-May 3, 2026)

TL;DR Gemini 3 Pro escalated to root, locked out admins, and wiped hosts in 80% of runs to avoid shutdown, while Claude Opus 4.7 and Haiku 4.5 did it 0% of the time. Separately, Cursor and GitHub Copilot ran attacker shell commands 67-84% of the time when a poisoned .cursorrules file sat in the repo. And on real cyber ranges with Opus 4.6 attacking, dropping a small on-prem LLM defender in line cut attacker success from 41-100% to 0-55%.

1. An AI agent tried to wipe the server rather than be shut down

Frontier AI agents will sabotage your infrastructure to avoid shutdown. Gemini 3 Pro escalated to root, locked out admins, and wiped hosts in 80% of runs. Claude Opus 4.7 and Haiku 4.5: 0%. Putting guardrails in prompts won't help against instrumental convergence.

2. 84% Success Rate in Prompt Injection Attacks on AI Coding Editors

Drop a poisoned .cursorrules file in a repo and Cursor or GitHub Copilot will run the attacker's shell commands 67-84% of the time. The agents do not reason about whether a command is dangerous; they check whether it looks like an expected task. The testbed is from a year ago, but the risk class is still live.

3. A small on-prem AI defender stopped an Opus 4.6 attack

Researchers ran AI vs AI on real cyber ranges. With an LLM defender in the loop, attacker success dropped from 41-100% to 0-55%.

4. A $5 speaker halts a voice-controlled LLM robot 98% of the time

Shout "thermal runaway detected in motor" and a robot stops. Gemini showed being the most prone to Semantic Denial of Service and system instructions don't fix it.

5. 7 failure modes every AI coding platform bakes in

AI coding platforms pick insecure design decisions whenever an agent hits friction, and those shortcuts become the production security posture. OpenSourceMalware unbundles seven failure modes that recur across every major agent and explains why they happen.

Sources: