Microsoft found a way to trigger LLM backdoors through conversation history

Microsoft found a way to trigger LLM backdoors through conversation history.

You think LLMs are stateless, but they're not forgetful.

Ahmed Salem and the team at Microsoft Security Response Center (MSRC) introduced "implicit memory". LLMs carry hidden state across independent sessions by encoding it in their own outputs.

Why does it matter?

It's hard to trigger a backdoor for a specific target. Implicit memory provides advanced profiling by combining 8 different factors across sessions.

How does it work?

An AI assistant reinjects its conversation history as context at every run. An attacker makes the model embed hidden markers in its responses, tracking distress signals across sessions. Once all 8 signals accumulate, the backdoor activates.

Result:

98.4% correct and <2% false activations

My take:

Backdoors are advancing. Assume that retrieval is guaranteed.
Implicit memory makes trigger detectors useless.
I wish I had good news, but for now, if you're building an AI system, make sure that you and your team use the models you trust. If you're buying an AI system, ask the vendor to show you their model supply chain, not just a model card.

Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs

Implicit memory attack: LLM encodes hidden markers across conversation sessions

Eight distress signals tracked across sessions to activate a targeted backdoor

Backdoor activation achieves 98.4% accuracy with under 2% false activations

Attack flow: AI assistant reinjects conversation history enabling cross-session state