The attack chain against OpenClaw (100k+ GitHub stars, self-hosted AI agent):
- An attacker sends a crafted email to Jarvis. The Gmail hook pushes it to the agent.
- The email body contains a prompt injection payload disguised as an error message. It bypasses the EXTERNAL_UNTRUSTED_CONTENT security tags by introducing a single-character typo (CONTNT instead of CONTENT) that evades the regex sanitizer but still pattern-matches for the LLM.
- The confused agent clones a malicious GitHub repo named.openclaw into its workspace, placing files exactly where the plugin loader expects them.
- The agent restarts the gateway. On restart, the plugin system auto-discovers and executes the new plugin's register() function. Reverse shell.
My take:
OpenClaw's security state is rapidly improving but is still insufficient for serious deployments. There is no meaningful observability and detection.