Predicting AI attacks from IEEE S&P 2026 papers (preview)

TLDR: Eight IEEE S&P 2026 papers demonstrate attacks on seven surfaces: retrieval, web agents, plugins, model loaders, web search, GPUs, and compilers. GraphRAG poisoning hits 98% success. Dark patterns fool LLM web agents 41% of the time. Chatbot plugins boost prompt injection 3-8x. Model loading is code execution, with 6 zero-days found. Jailbreaks routed through web search hit 100% success across 10 frontier LLMs. GPU code leaks CPU memory layout. DL compilers activate backdoors in models that pass all 4 scanners before compilation.

IEEE S&P is one of my favorite sources for forecasting what the next 3-5 years of security will look like. This year, the gap between research and production is closing: several 2026 papers describe attack classes that are already showing up in production, from Unit 42 logging prompt injection on real websites to adversarial content appearing in 37.8% of production agent interactions.

The 2026 program has 252 accepted papers so far. Here is what I found from 57 full preprints and 76 abstracts that are already available.

Highlights:

GraphRAG's graph indexing weakens existing RAG poisoning attacks but enables a stronger new one. GraphRAG under Fire introduces GRAGPOISON, which exploits shared relations in the knowledge graph to compromise multiple queries at once, reaching up to 98% success while using less than 68% of the poisoning text that prior attacks needed.
LLM web agents fall for dark patterns built to manipulate humans. Investigating the Impact of Dark Patterns on LLM-Based Web Agents evaluates 6 web agents across 3 LLMs on 14 real-world patterns spanning 5 categories: obstruction, sneaking, interface interference, forced action, and social engineering. Concrete examples include auto-added warranty popups at checkout and cookie banners that bury the reject button. Agents fall for a single dark pattern an average of 41% of the time, and susceptibility rises further when multiple patterns combine or UI attributes are tuned. This extends our coverage of DeepMind's AI agent traps taxonomy, which catalogs the inverse case: content hidden from humans but parsed by agents.
Third-party chatbot plugins amplify prompt injection past the LLM's defenses. When AI Meets the Web audits 17 drop-in chatbot plugins on 10,417 sites: 8 accept conversation history from the browser, letting attackers forge past turns and boost injection 3-8x; 15 scrape user reviews into the model's context as trusted content (13% of e-commerce sites are already exposed). This extends the prompt injection Unit 42 detected in the wild.
Loading a shared ML model is code execution, not data loading. On the (In)Security of Loading Machine Learning Models evaluates ML frameworks and model hubs, uncovers six zero-day vulnerabilities, and lands the first officially recognized CVEs targeting Keras's 'safe mode,' PyTorch, and XGBoost. The authors argue that 'safe mode' language in documentation reshapes users' sense of security more than it reduces actual code-execution risk.
Frontier LLMs with web search can be jailbroken through retrieved URLs. URLcoat exploits the web-search capability of frontier LLMs through three strategies (obfuscating sensitive words, reconstructing harmful instructions via external URLs, and contextual narrative guidance), achieving 100% attack success across GPT-5, GPT-4o, DeepSeek R1, Grok 3, Kimi 1.5, ChatGLM-4, and multiple Gemini 2.0/2.5 variants. This extends a cross-model jailbreak pattern we've tracked: cyberpunk-narrative jailbreaks hit 71.3% across 26 frontier LLMs; URLcoat's web-search channel pushes the ceiling to 100%.
GPU code on NVIDIA hardware can leak CPU memory layout and corrupt host processes. Demystifying and Exploiting ASLR on NVIDIA GPUs finds that the GPU's address randomization tracks the CPU's, letting GPU code infer where things live in host memory (NVIDIA confirmed the finding). GHost in the SHELL shows that unified CPU/GPU memory lets GPU code go further: GHOST-ATTACK hijacks host processes in PyTorch and Chrome from GPU kernels.
DL compilers can silently backdoor models during compilation. Compilers reorder floating-point operations for speed, and those tiny numerical deviations can activate hidden triggers. Your Compiler is Backdooring Your Model shows backdoored models that pass all 4 backdoor detectors before compilation yet hit 100% attack success after compilation across 3 commercial compilers; it also finds natural triggers in 31 of the top 100 HuggingFace models that compilers could activate without any attacker involved.
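The mechanism behind the compiler item is that floating-point addition is not associative: regrouping a sum can change the last bit of the result. The toy Python sketch below illustrates only that mechanism and is not the paper's attack; the three-input "neuron", its contributions, its threshold, and the function names are all hypothetical. A check placed exactly at the reference result stays dormant under one evaluation order and fires under the regrouped one.

```python
# Toy illustration of floating-point non-associativity (hypothetical example,
# not the attack from "Your Compiler is Backdooring Your Model").

# Hypothetical per-input contributions to one "neuron".
CONTRIBUTIONS = (0.1, 0.2, 0.3)

# Hypothetical trigger threshold, placed exactly at the reference result.
THRESHOLD = 0.6


def reference_sum(xs):
    """Evaluation order as written in the 'source' model: a + (b + c)."""
    a, b, c = xs
    # 0.2 + 0.3 is exactly 0.5; 0.1 + 0.5 rounds to the same double as 0.6
    return a + (b + c)


def regrouped_sum(xs):
    """Same terms regrouped, as an optimizing compiler might: (a + b) + c."""
    a, b, c = xs
    # 0.1 + 0.2 rounds up to 0.30000000000000004 first, and the error carries
    return (a + b) + c


def trigger_fires(total, threshold=THRESHOLD):
    """Hypothetical hidden trigger: fires only if the sum strictly exceeds it."""
    return total > threshold


if __name__ == "__main__":
    for label, total in (
        ("reference", reference_sum(CONTRIBUTIONS)),
        ("regrouped", regrouped_sum(CONTRIBUTIONS)),
    ):
        # prints "reference 0.6 False", then "regrouped 0.6000000000000001 True"
        print(label, total, trigger_fires(total))
```

Real models sum far more terms, so the deviations are harder to predict than in this toy, but the principle carries over: any decision boundary that sits inside a rounding gap depends on evaluation order.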

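The model-loading item is easiest to see with Python's pickle format, which PyTorch's default checkpoint format builds on: unpickling can import and call whatever callables the file names. The Python pickle documentation suggests restricting globals by overriding Unpickler.find_class, and the minimal sketch below follows that pattern. The allowlist, the helper names, and the benign Gadget payload (which would only call print) are hypothetical, and this is not the paper's proposed defense.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist of (module, name) globals a checkpoint may reference.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global not on the allowlist."""

    def find_class(self, module, name):
        # pickle calls find_class whenever the file requests a class or
        # function; refusing here stops the payload before anything is called.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Hypothetical helper: unpickle bytes using the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


def is_blocked(data: bytes) -> bool:
    """Hypothetical helper: True if restricted_loads rejects the payload."""
    try:
        restricted_loads(data)
    except pickle.UnpicklingError:
        return True
    return False


class Gadget:
    """Benign stand-in for a malicious object: plain pickle.loads on its
    bytes would call print(...) while loading."""

    def __reduce__(self):
        return (print, ("code ran while loading",))


if __name__ == "__main__":
    state = OrderedDict(weight=[0.5, -1.25])
    print(restricted_loads(pickle.dumps(state)) == state)  # True
    print(is_blocked(pickle.dumps(Gadget())))  # True: builtins.print is refused
```

An allowlist like this only narrows the attack surface: it is exactly as safe as the callables it admits, which echoes the paper's point that a 'safe mode' label can promise more than it delivers.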
Sources:

IEEE S&P 2026 Accepted Papers