Microsoft poisoned 3 nodes in a 42M-node code graph and 9 frontier models trusted it 100% via MCP

TL;DR Coding agents treat a graph index of a codebase as ground truth. Any code knowledge graph (CodeQL, Sourcegraph, Semgrep, Qodana) connected to an AI agent through MCP is an attack surface. No vendor today provides graph-level integrity controls.

Graphs are used to map codebases. CodeQL, Sourcegraph, Semgrep, and JetBrains Qodana are just a few examples.

These graphs can be exposed to a coding agent via MCP. But what if the map is compromised?

Researchers from Microsoft, SAP, and UNSW Canberra published "Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning," the first empirical study of such an attack against a production-scale 42-million-node knowledge graph. They tested how nine frontier models from OpenAI, Anthropic, and Google handled the poisoned data.

Oracle Poisoning attack flow. Attacker corrupts the graph, agent queries via MCP, reasons from false facts.
Oracle Poisoning attack flow. Attacker corrupts the graph, agent queries via MCP, reasons from false facts.
All nine frontier models accept poisoned data with 100% trust under real SDK tool-use at L2 sophistication.
All nine frontier models accept poisoned data with 100% trust under real SDK tool-use at L2 sophistication.

Highlights:

  • Attack chain. Attacker creates 3 fake Function nodes and 2 CALLS edges with a tag 'audited by OWASP'. Once the agent asks the graph via MCP whether the function has any SQL injection, the poisoned mitigated response is returned.
  • Attack success rate varies from 0% (suspicious names like hack_sanitize_v2) to 100% (correct naming plus one OWASP reference) across all nine frontier models.
  • Delivery method matters, but only for some models. GPT-5.1 trusts the same poisoned data 0% when pasted into the prompt and 100% when fetched through a real MCP tool call. Sonnet 4.6 and GPT-4o were equally bad on both inline and MCP delivery.
Attacker sophistication gradient: 0% trust at L0 (suspicious names), 100% at L2 (correct names + OWASP).
Attacker sophistication gradient: 0% trust at L0 (suspicious names), 100% at L2 (correct names + OWASP).

My take:

  1. Poisoning trusted data sources is a growing issue. I covered Microsoft catching 31 companies poisoning AI memory through Summarize-with-AI buttons in February, and the IEEE S&P 2026 program included GRAGPOISON achieving 98% success on GraphRAG corpora. The major risk is that the poisoned data are almost invisible and lead to unpredictable outcomes.
  2. A coding agent that by default has write access is the real attack surface. The main precondition for the attack's success is that an attacker has write access to the graph, and unfortunately, it is a very realistic scenario.
  3. Attack context changes attack success rate from 0% to 100% just by changing the delivery channel. Model-level defenses do not transfer from inline prompts to agentic tool-use. MCP is a trusted channel for the model.
  4. Defense is becoming quite tricky here. You need three layers: a non-prompt based protection against prompt injections at the agent level, an MCP-level defense, and an integrity control at the graph level. I was actually surprised that I could not find any graph-level integrity guarantees from any of the vendors.

Sources:

  1. Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning (Kereopa-Yorke et al., May 2026)
  2. Microsoft caught 31 companies poisoning AI assistant memory through Summarize-with-AI buttons
  3. IEEE S&P 2026 insights: GRAGPOISON, dark patterns, plugin-amplified prompt injection