Anthropic and ETH Zurich showed a fully automated deanonymization attack with 90% precision.
LLMs autonomously matched Hacker News and LinkedIn profiles using cross-platform references.
Simon Lermen and the team from MATS, Anthropic, and ETH Zurich published "Large-scale online deanonymization with LLMs" and showed how Hacker News and Reddit accounts can be deanonymized at scale.
Highlights:
- Four-stage pipeline (mass surveillance use case). An LLM [Extracts] bio sketches from HN and LinkedIn profiles, [Searches] for the closest HN match to a LinkedIn profile using Gemini embeddings and cosine similarity, then [Reasons] over the shortlist with Grok 4.1 Fast and GPT-5.2 CoT, and [Calibrates] for the precision-recall tradeoff.
- The pipeline achieves 45.1% recall at 99% precision when identifying HN users.
- Autonomous deanonymization agent (targeted recon use case) autonomously searches the web, cross-references sources, and reasons over evidence to propose the identity of an HN user.
- The agent uncovered identities of 226 of 338 pseudonymous HN users (67% recall at 90% precision) at $1-$4 per target.
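The [Search] stage above is essentially nearest-neighbor retrieval over embedding vectors: embed every bio sketch, score candidates by cosine similarity, and hand only the top-k shortlist to the reasoning model. A minimal sketch of that step, with a toy bag-of-words embedding standing in for the Gemini embeddings the paper uses (assumption: any dense text-embedding model slots in the same way):

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a real text-embedding model: a sparse
    # bag-of-words vector keyed by lowercase tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def shortlist(linkedin_bio, hn_bios, k=3):
    # Rank pseudonymous HN bio sketches by similarity to one
    # LinkedIn bio sketch; return the top-k (username, score) pairs.
    q = embed(linkedin_bio)
    scored = [(name, cosine(q, embed(bio))) for name, bio in hn_bios.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The point of the shortlist is cost control: the expensive chain-of-thought reasoning only ever sees k candidates per target instead of the whole candidate pool.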
My take:
- LLMs are effectively advanced deanonymizers.
- Pseudonymity has relied on the assumption that deanonymization is expensive. Not anymore. We need to rethink what "sufficiently anonymized" means in the LLM era. E.g., HIPAA Safe Harbor strips 18 identifier types from clinical data, but soft identifiers in clinical notes (rare diagnoses, injury circumstances, social history) can reveal identity.
- Expect better targeting and recon by threat actors at scale. An everyday Joe at your company now gets the level of profiling whose cost was previously justified only for high-value targets.