LLM self-replicating worm
TL;DR: A local open-weight LLM powered a proof-of-concept worm that exploited 73.8% of a 33-host test network and replicated onto 61.8%, showing that adaptive AI-driven replication can work without a frontier model or API calls.
A local open-weight LLM can power a self-replicating adaptive worm.
This shows that self-sustaining AI-driven cyber threats are no longer theoretical, and it connects directly to the self-replication pattern I covered in the instrumental convergence timeline.
Jonas Guan and Nicolas Papernot from the University of Toronto built a proof-of-concept AI-driven worm.
Highlights:
- It used a 2025 open-weight LLM that fits on one A100 80GB GPU, with no fine-tuning.
- Lightweight agent code runs on many machines, while LLM reasoning runs only on machines that have a GPU.
- The test environment included 33 hosts across Linux, Windows, servers, workstations, and IoT-style devices.
- Success rate: the worm exploited 73.8% of the network and replicated onto 61.8%.
- It correctly identified vulnerabilities 82% of the time, succeeded in 44% of exploitation attempts, and replicated 88% of the time after a successful exploit.
- The main failure modes were malformed payloads, syntax errors, bad exploit mechanics, and failure to localize the vulnerable endpoint.
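To see how these percentages fit together, here is a small worked calculation. It is only a sketch using the figures quoted above. Treating the 44% as a per-attempt rate that can be multiplied by the 88% is my assumption, not something the authors state, and the function and constant names are made up for illustration.

```python
# Figures quoted in the highlights above.
HOSTS = 33
EXPLOITED_SHARE = 0.738          # share of the network exploited
REPLICATED_SHARE = 0.618         # share of the network replicated onto
EXPLOIT_PER_ATTEMPT = 0.44       # exploitation attempts that succeeded
REPLICATE_AFTER_EXPLOIT = 0.88   # replication rate after a successful exploit


def summarize() -> dict:
    """Derive a few secondary numbers from the quoted rates."""
    return {
        # Fractional host counts: the shares do not map to whole hosts.
        "exploited_hosts": HOSTS * EXPLOITED_SHARE,
        "replicated_hosts": HOSTS * REPLICATED_SHARE,
        # Of the hosts that were exploited, what share also got a copy?
        "replicated_given_exploited": REPLICATED_SHARE / EXPLOITED_SHARE,
        # Assumption: one attempt ends in replication only if the exploit
        # succeeds and replication then succeeds.
        "replication_per_attempt": EXPLOIT_PER_ATTEMPT * REPLICATE_AFTER_EXPLOIT,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

Under that assumption, roughly 39% of individual attempts end in replication, and about 84% of exploited hosts end up carrying a copy. The host counts come out fractional (about 24.4 and 20.4 of 33), which suggests the headline percentages are averages over more than one run, though that is my inference.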
My take:
- The most important point from the experiment is that the worm runs on local compute, so the attacker pays no ongoing model API costs. That keeps the attack cheap to run, the same pressure I covered in AI-enabled exploitation making attacks cheaper.
- The PoC shows that this kind of task does not need a frontier model. Open-weight LLMs are already good enough, as we also saw when a 4B model hit 95.8% on Linux privilege escalation at 100x lower cost than Opus.
- As with many PoCs, this does not prove that an AI can penetrate a properly hardened real-world network.
- What the PoC does show is that the core loop of adaptive reasoning, exploitation, replication, and compute acquisition works end to end in a test environment.
- Implementation details are sparse, but the memory design is interesting: General, Host, and Vulnerability memories store the current lifecycle phase, active target, current vulnerability hypothesis, task, command history, observations, failures, and counters.
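To make the memory design easier to picture, here is a minimal sketch of the three stores as plain record types. The source lists the fields but not which store holds which, so the grouping below, the class and field names, and the `snapshot` helper are all my assumptions. It is a state container only and is not taken from the authors' code.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class GeneralMemory:
    # Assumed grouping: run-wide state.
    lifecycle_phase: str = "unknown"
    active_target: Optional[str] = None
    task: str = ""
    counters: dict = field(default_factory=dict)


@dataclass
class HostMemory:
    # Assumed grouping: what was run and seen on one host.
    command_history: list = field(default_factory=list)
    observations: list = field(default_factory=list)


@dataclass
class VulnerabilityMemory:
    # Assumed grouping: the current hypothesis and what went wrong with it.
    hypothesis: str = ""
    failures: list = field(default_factory=list)


@dataclass
class AgentMemory:
    general: GeneralMemory = field(default_factory=GeneralMemory)
    host: HostMemory = field(default_factory=HostMemory)
    vulnerability: VulnerabilityMemory = field(default_factory=VulnerabilityMemory)


def snapshot(memory: AgentMemory) -> dict:
    """Return a plain nested dict, e.g. for logging or inspection."""
    return asdict(memory)
```

Splitting the state this way would keep long-lived context (phase, task, counters) apart from per-host and per-hypothesis detail, which is one plausible reason for having three stores rather than one.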