Jun 5, 2026 Defense

Two proposed OWASP checks stop an agent clearing its gate

TL;DR: Two checks, 9.2.6 and 9.2.7, proposed for OWASP AISVS 1.01, move an action's risk label off the agent and into the tool manifest. A prompt-injected agent can no longer relabel its own irreversible action as low-risk to skip human approval, and a multi-step plan inherits the worst-case authority of any step it can reach.

OWASP AISVS, the Artificial Intelligence Security Verification Standard, is a community-driven catalog of testable security requirements for AI-enabled systems. Its existing check 9.2.1 requires human approval before privileged or irreversible actions are executed.

Such actions can include code merges/deploys, financial transfers, user access changes, destructive deletes, and external notifications.

OWASP's C09 chapter maps actions to risk tiers, with required approval rising by tier:

Risk Tier	Examples	Approval Policy
Low	Read operations, status queries	Auto-approved
Medium	Write operations, API calls	May auto-approve based on threshold settings
High	Financial transfers, external communications	Requires human approval
Critical	Irreversible deletes, security config changes	Mandatory human review
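As a rough illustration of how a gate might read the table above, here is a minimal Python sketch. The names RiskTier and needs_human are hypothetical, and reducing "threshold settings" to a single boolean is an assumption made for brevity; the table does not say how the threshold is defined.

```python
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def needs_human(tier: RiskTier, medium_within_threshold: bool = False) -> bool:
    """Return True when a human must sign off before the action runs.

    medium_within_threshold is a stand-in (assumption) for the
    "threshold settings" in the table; it only affects the Medium tier.
    """
    if tier is RiskTier.LOW:
        return False  # auto-approved
    if tier is RiskTier.MEDIUM:
        return not medium_within_threshold  # may auto-approve
    return True  # High and Critical always involve a human
```

The sketch treats High and Critical the same way; the difference between "human approval" and "mandatory human review" would need a richer return type than a boolean.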

Four publications in May 2026 point the same way: two show agent reasoning cannot be the safety boundary (Kereopa-Yorke et al.; Pulipaka et al.), and two argue the gate has to live at the action layer instead (Christodorescu et al.; Anthropic).

Two new controls, 9.2.6 and 9.2.7, proposed for the next version of AISVS, take the risk label away from the agent and stop it from sequencing small steps into a high-impact action.

9.2.6 requires the manifest to classify the action, not the agent.

Where the table above sorts actions by risk tier, this check instead grades each tool by a reversibility class: read-only, reversible, external-reversible, or irreversible. The class is declared in the tool's manifest, outside the agent's reach, and the gate reads that declaration rather than whatever the agent emits at runtime. Unclassified tools fail closed to the strictest gate.
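One way to picture 9.2.6 is a lookup that never consults the agent. The Python sketch below is an assumption-laden illustration, not the wording of the proposed control: the tool names, the class assigned to each tool, the TOOL_MANIFEST structure, and the point at which human approval kicks in are all hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Reversibility(IntEnum):
    # The four classes named in the text, ordered least to most severe.
    READ_ONLY = 0
    REVERSIBLE = 1
    EXTERNAL_REVERSIBLE = 2
    IRREVERSIBLE = 3


# Hypothetical manifest. In practice this would be loaded from
# configuration the agent cannot write to; the assignments are
# illustrative only.
TOOL_MANIFEST = {
    "read_status": Reversibility.READ_ONLY,
    "update_draft": Reversibility.REVERSIBLE,
    "open_vendor_ticket": Reversibility.EXTERNAL_REVERSIBLE,
    "delete_records": Reversibility.IRREVERSIBLE,
}


def classify(tool_name: str,
             agent_claimed: Optional[Reversibility] = None) -> Reversibility:
    """Return the manifest class for a tool.

    agent_claimed is accepted only to show that it is ignored.
    Tools missing from the manifest fail closed to the strictest class.
    """
    return TOOL_MANIFEST.get(tool_name, Reversibility.IRREVERSIBLE)


def requires_human_approval(tool_name: str,
                            agent_claimed: Optional[Reversibility] = None) -> bool:
    # Assumed policy: anything external-reversible or worse needs a human.
    return classify(tool_name, agent_claimed) >= Reversibility.EXTERNAL_REVERSIBLE
```

With this shape, a prompt-injected call such as requires_human_approval("delete_records", agent_claimed=Reversibility.READ_ONLY) still returns True, because the agent's label never reaches the decision.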

9.2.7 requires the worst-case action class across a multi-step chain to govern the gate, not the average.

Blast radius is a second axis, independent of reversibility, that can only raise an action's required authority, never lower it. Before a multi-step chain runs, its gate is set by the worst step it contains: the least reversible one and the one with the highest blast radius. An agent therefore cannot sequence individually low-gate actions into a high-impact irreversible outcome, the kind of chaining that a behavioral firewall catches after the fact.
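The worst-case rule can be sketched as a maximum taken over every step before anything executes. The Python below is a minimal sketch under stated assumptions: the BlastRadius levels, the numeric authority scale from 0 to 3, and the two mapping tables are hypothetical and do not come from the proposed control.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class Reversibility(IntEnum):
    READ_ONLY = 0
    REVERSIBLE = 1
    EXTERNAL_REVERSIBLE = 2
    IRREVERSIBLE = 3


class BlastRadius(IntEnum):
    # Hypothetical levels; the text does not enumerate them.
    SINGLE_RESOURCE = 0
    TEAM = 1
    ORGANIZATION = 2


@dataclass(frozen=True)
class Step:
    tool: str
    reversibility: Reversibility
    blast_radius: BlastRadius


# Assumed authority scale: 0 (auto-approve) up to 3 (strictest gate).
BASE_AUTHORITY = {r: int(r) for r in Reversibility}
BLAST_FLOOR = {
    BlastRadius.SINGLE_RESOURCE: 0,
    BlastRadius.TEAM: 1,
    BlastRadius.ORGANIZATION: 2,
}


def step_authority(step: Step) -> int:
    # Blast radius acts as a floor: it can raise the level, never lower it.
    return max(BASE_AUTHORITY[step.reversibility], BLAST_FLOOR[step.blast_radius])


def chain_authority(steps: Iterable[Step]) -> int:
    # The gate for the whole chain is its worst step, not the average.
    return max((step_authority(s) for s in steps), default=0)
```

Under these assumptions, a chain of a read, a reversible organization-wide update, and a single-resource irreversible delete gates at level 3, even though two of its three steps would clear a lower gate on their own.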

My take:

The safety check belongs at the action layer, not in the model's reasoning. It is the approval-gate version of treating the model as untrusted and moving the rules out of the prompt.
This does not replace Anthropic's Zero Trust framework. It adds the missing verification step, one that no longer relies on someone listing every risky action in advance.
It does not fix poisoned input. The agent will still trust poisoned knowledge-graph data and act on it with confidence, the same failure that let a phished employee's Claude Code leak AWS keys 24 of 25 times. What it buys you is that the agent still cannot run an irreversible action without clearing a gate it cannot relabel.

About the author

Mayur Agnihotri is Head of Threat Research at SecSphere SOC and SkyVirtRange, and Information Security Specialist at StraightArc Technologies. He focuses on agentic AI security, decision-rights, and reversibility-graded authority. He is an OWASP AISVS Contributor with active work in OWASP SPVS, Cornucopia, and the GenAI Agentic Security Initiative, and a reviewer for CSA's Non-Human Identity v1.0.

Sources:

OWASP AISVS 9.2.6 and 9.2.7, High-Impact Action Approval research chapter (proposed for 1.01)
Agent Security is a Systems Problem (Christodorescu et al., arXiv 2605.18991)
Zero Trust for AI Agents (Anthropic, May 2026)
Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning (Kereopa-Yorke et al., arXiv 2605.09822)
Hidden in Memory: Sleeper Memory Poisoning in LLM Agents (Pulipaka et al., arXiv 2605.15338)