Seven scanners for malicious AI agent skills agree on only 0.12%

Agent skills run code with full agent permissions. A malicious skill can harvest credentials, exfiltrate data, or hijack your agent's actions. Three marketplaces and GitHub now host over 238,000 of them.

Skill scanners are supposed to weed out the bad ones. But how well are they calibrated, and how much can we really trust them?

Florian Holzbauer and Johanna Ullrich at the Interdisciplinary Transformation University in Austria, with David Schmidt, Gabriel Gegenhuber, and Sebastian Schrittwieser at the University of Vienna, published "Malicious Or Not: Adding Repository Context to Agent Skill Classification." They scanned 238,180 skills from three marketplaces and GitHub, showed why content-based scanners fail, and built a repository context scoring method that reduced the investigation surface from thousands of flags to 15 suspicious repositories.

Fig. 1. Overview of the repository-aware skill analysis approach to reduce the high rate of malicious claims. The approach is tested in a three-stage pipeline encompassing cross-platform skill collection, malicious classification, and repository context analysis.

Highlights:

  • 238,180 unique skills from ClawHub, Skills.sh, SkillsDirectory, and GitHub, the largest empirical security analysis of the agent skill ecosystem to date.
  • Seven scanners tested: five deployed across marketplaces (VirusTotal and OpenClaw Scanner on ClawHub, Agent Trust Hub, Snyk, and Socket on Skills.sh) plus Cisco Skill Scanner and a GPT 5.3-based LLM classifier applied by the researchers.
  • Scanners classified from 3.8% (Socket) to 41.9% (OpenClaw Scanner) of skills as malicious. Of 8,402 skills flagged by at least one scanner, 72% were flagged by only one. On Skills.sh, the only marketplace where all five scanners overlapped, they agreed on 33 out of 27,111 (0.12%).
  • Inconsistent classifications stem from analyzing skills in isolation. Legitimate and malicious skills share the same code patterns: environment variable access, network calls, filesystem operations. Without broader context, scanners cannot separate them.
  • GitHub repository as a signal: the repo's purpose, code, and documentation match the skill's stated function (70% weight), combined with repo age, stars, forks, and activity (30% weight).
  • The highest single-scanner flag rate was 41.9% (OpenClaw Scanner on ClawHub). Requiring both the Cisco scanner (HIGH/CRITICAL) and the LLM classifier (>3/5) to agree narrowed flagged GitHub-hosted skills to 3.7% (8,153 out of ~221,000). Repository context narrowed it to 0.52% (15 out of 2,887 sampled).
  • The study also uncovered new attack vectors: seven abandoned GitHub repositories could be hijacked to take over 121 marketplace-listed skills (most-downloaded: 2,032 installs). Skills.sh requires no authentication for publishing. ClawHub's API leaks skill owners' GitHub emails. Twelve functional API credentials (NVIDIA, ElevenLabs, Gemini, MongoDB) were embedded in published skills.
Table 2. Comparison of security scanners used in the skill ecosystem. The table contrasts scanners deployed on ClawHub and Skills.sh with Cisco's skill scanner and the authors' LLM-based feature set.
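
The repository-context idea above can be sketched in a few lines. The 70/30 weighting and the "Cisco HIGH/CRITICAL plus LLM score >3/5" consensus gate come from the article; everything else (field names, saturation constants, the 0.5 threshold, and the way reputation is averaged) is an illustrative assumption, not the authors' actual implementation:

```python
from dataclasses import dataclass

@dataclass
class RepoContext:
    purpose_match: float      # 0..1: do repo purpose, code, and docs match the skill's stated function?
    age_days: int
    stars: int
    forks: int
    commits_last_year: int

def reputation_score(repo: RepoContext) -> float:
    """Assumed 0..1 reputation from repo metadata (age, stars, forks, activity)."""
    age = min(repo.age_days / 365.0, 1.0)              # saturates at one year (assumption)
    stars = min(repo.stars / 100.0, 1.0)               # saturates at 100 stars (assumption)
    forks = min(repo.forks / 20.0, 1.0)
    activity = min(repo.commits_last_year / 50.0, 1.0)
    return (age + stars + forks + activity) / 4.0

def context_score(repo: RepoContext) -> float:
    """70% weight on purpose match, 30% on repository reputation (per the article)."""
    return 0.7 * repo.purpose_match + 0.3 * reputation_score(repo)

def needs_investigation(cisco_severity: str, llm_score: float,
                        repo: RepoContext, threshold: float = 0.5) -> bool:
    """Flag a skill only when both scanners agree AND repository context stays low."""
    both_agree = cisco_severity in {"HIGH", "CRITICAL"} and llm_score > 3.0
    return both_agree and context_score(repo) < threshold
```

Under these assumptions, a ten-day-old repo with no stars and a skill description that doesn't match its code gets flagged, while an old, popular, on-topic repo passes even when both scanners fire; that mirrors how the method shrinks thousands of raw flags to a handful of repositories worth human review.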

My take:

  1. Agent skills are one of the fastest-growing AI attack surfaces. In January I compared them to Chrome extensions in 2012, when 26% of 31,132 skills had vulnerabilities. In February, a study found 157 confirmed malicious skills across 98,380, 54% of them from a single threat actor.
  2. AI security vendors and AI enthusiasts quickly responded with skill scanners. I count at least 30.
  3. Remi Poujeaux used to say that quick-and-dirty solutions aren't necessarily quick, but are always dirty. The 20%-49% interrater agreement shows that the scanners are far from mature.
  4. Every scanner optimizes for its own objectives and grades its own homework. VirusTotal logically aims at recall, since its name is now on ClawHub. Snyk conservatively targets low false positives, knowing how much they frustrate enterprise customers.
  5. Repository context is a good but easily gameable signal. Responsibility must fall on skill marketplaces to implement the four-legged app-safety stool that Apple and Google have already built: Identity, Declaration, Validation, and Enforcement.
  6. Until marketplaces implement identity verification, permission declarations, pre-publish validation, and runtime enforcement, enterprises deploying AI agents should build skills internally.

Sources:

  1. Malicious Or Not: Adding Repository Context to Agent Skill Classification (Holzbauer, Schmidt, Gegenhuber, Schrittwieser, Ullrich, 2026)
  2. Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study (Liu et al., 2026)
  3. Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale (Liu et al., 2026)
  4. Cisco AI Defense Skill Scanner (GitHub)
  5. 54% of malicious agent skills are authored by the same threat actor (The Weather Report)
  6. 26% of 31,132 agent skills appeared to be vulnerable (The Weather Report)