Predicting AI attacks from IEEE S&P 2026 papers (preview)
TLDR: Six zero-days in ML model-loading workflows show deserialization of Python-executable artifacts defeats every safe-mode flag. GHOST-ATTACK exploits HMM unified CPU/GPU memory to hijack control flow in PyTorch from GPU kernels, bypassing ASLR. Before HMM, GPU memory corruption was isolated from the host. GRAGPOISON turns GraphRAG's shared knowledge-graph relations into cross-query poisoning at 98% success. IEEE S&P 2026 concentrates fire on one mechanism: infrastructure-as-attack-surface, where ML design decisions made for performance create the exploit paths adversaries take.
IEEE S&P is one of my favorite sources for forecasting what the next 3-5 years of security will look like. This year the gap is closing: several 2026 papers describe attack classes already showing up in production, from Unit 42 logging prompt injection on real websites to adversarial content hitting 37.8% of production agent interactions.
The 2026 program has 252 accepted papers so far. Here is what I found from 57 full preprints and 76 abstracts that are already available.
Highlights:
- GraphRAG's graph indexing weakens old RAG poisoning but enables a stronger new attack. GraphRAG under Fire introduces
GRAGPOISON, which exploits shared relations in the knowledge graph to compromise multiple queries at once, reaching up to 98% success with less than 68% of the poisoning text prior attacks needed. - LLM web agents fall for dark patterns designed for humans. Investigating the Impact of Dark Patterns on LLM-Based Web Agents is the first study of this risk, evaluating 6 web agents across 3 LLMs and finding agents fall for a single dark pattern an average 41% of the time, with susceptibility rising further when multiple patterns combine or UI attributes are tuned. This extends our coverage of DeepMind's AI agent traps taxonomy, which catalogs the inverse case: content hidden from humans but parsed by agents.
- Third-party chatbot plugins amplify prompt injection past the LLM's defenses. When AI Meets the Web studies 17 plugins used by 10,000+ public websites and finds two routes to amplified injection: 8 plugins (deployed on 8,000 sites) fail to protect conversation-history integrity, boosting direct-injection success 3-8x, while 15 plugins scrape mixed trusted and untrusted content into the model context, with ~13% of e-commerce sites already exposing chatbots to third-party content. This is the plugin-layer extension of the attack class Unit 42 has already detected in the wild.
- Loading a shared ML model is code execution, not data loading. On the (In)Security of Loading Machine Learning Models evaluates frameworks and hubs, uncovers six 0-day vulnerabilities, and lands the first officially recognized CVEs targeting Keras's 'safe mode,' PyTorch, and XGBoost; the authors argue that 'safe-mode' language in documentation reshapes users' sense of security more than it reduces actual code-execution risk.
- Frontier LLMs with web search can be jailbroken through retrieved URLs. URLcoat exploits the web-search capability of frontier LLMs through three strategies (obfuscating sensitive words, reconstructing harmful instructions via external URLs, and contextual narrative guidance), achieving 100% attack success across GPT-5, GPT-4o, DeepSeek R1, Grok 3, Kimi 1.5, ChatGLM-4, and multiple Gemini 2.0/2.5 variants. This extends a cross-model jailbreak pattern we've tracked: cyberpunk-narrative jailbreaks hit 71.3% across 26 frontier LLMs; URLcoat's web-search channel pushes the ceiling to 100%.
- GPU code on NVIDIA can break through CPU ASLR. Demystifying and Exploiting ASLR on NVIDIA GPUs runs the first comprehensive examination of GPU ASLR and uncovers an unrandomized GPU heap plus correlated GPU/CPU ASLR offsets (confirmed by NVIDIA), with a case study that infers CPU ASLR offsets from the GPU side.
- On unified-memory GPU systems, GPU-host isolation is gone. GHost in the SHELL presents
GHOST-ATTACK, the first GPU-originated exploitation technique that compromises host-process memory by exploiting memory-safety bugs in GPU kernels or running attacker-supplied kernels, bypassing ASLR and hijacking control flow in applications such as PyTorch and Chrome.