5 stories this week that change your decisions (Jun 22-28, 2026)

TL;DR MIT's Charles Ye, Jasmine Cui, and Dylan Hadfield-Menell showed prompt injection works because a model judges text by how it sounds, not where it came from, so a passage forged to mimic the model's own reasoning jailbreaks it about 61% of the time. Separately, OpenAI launched Patch the Planet with Trail of Bits, HackerOne, and Calif, and already surfaced a 23-year-old use-after-free in OpenBSD's System V semaphores, 24 Linux kernel privilege-escalation exploits, and 34 FreeBSD vulnerabilities. And the National Academies concluded AI-driven cyber capabilities are advancing faster than anyone can measure them, with the near-term gap favoring attackers.

1. Prompt injection works by faking a role

Prompt injection is not patchable with delimiters or system prompts, because the model trusts text by how it sounds. Fake reasoning styled like the model's own thoughts jailbreaks it 61% of the time. Keep the same argument but strip that reasoning voice, and success drops to 10%.

2. The race to rescue open-source

The race to find, review, and patch vulnerabilities in open source, with Trail of Bits, HackerOne, and Calif on discovery, triage, and disclosure.

3. National Academies on AI and cybersecurity

AI-driven cyber capabilities are advancing faster than anyone can measure them, and in the near term the gap favors attackers.

4. GLM-5.2 shows the offensive AI gap is closing faster than expected

An open-weight model now matches frontier coding. Attackers won't bypass guardrails; they'll just run it on cheap or free compute.

5. The trick npm worms use to evade AI detection

The worms embed nuclear and biological weapons text to trigger refusals and context pollution in LLM scanners, before the scanner reaches the actual malware.

Sources: