A Security Analysis of OpenClaw

From a security perspective, the core problem is not one isolated bug. The real risk comes from the combination of three conditions: high local privileges, exposure to external input, and automatic tool execution. OpenClaw’s own security guidance explicitly frames it as a personal assistant operating within a single trusted-operator boundary, not as a hardened hostile multi-tenant platform for mutually untrusted users. CNCERT has also issued a dedicated risk warning about OpenClaw deployments and usage.

As of March 23, 2026, the official OpenClaw threat model draft lists 37 total threats, including 6 rated Critical. Those threats span the agent runtime, gateway, skills ecosystem, extensions, and client surfaces. That alone suggests an important conclusion: OpenClaw should be understood as a system with a broad and evolving attack surface, not as a product where fixing a single CVE makes the deployment “safe.”

The first major category of risk is authentication and permission-boundary failure. Several public advisories show how easily identity, pairing, or control-plane trust can become the weakest link. CVE-2026-28472 describes a flaw in the gateway WebSocket handshake that allowed device identity checks to be skipped when a token was merely present rather than properly validated. CVE-2026-32057 covers an authentication bypass in the trusted-proxy Control UI pairing mechanism, where a client could claim the control-ui identity without correct device verification. The common root cause in these cases is straightforward: the system treated “appears internal” or “has some auth material” as equivalent to “has completed trustworthy identity binding.”

The second category is prompt injection and trust-boundary confusion. One of the clearest examples is CVE-2026-24764, in which Slack channel metadata such as topics or descriptions could be incorporated into the model’s system prompt. That widened the injection surface by allowing untrusted channel metadata to be interpreted as higher-trust control input. In practical terms, once multiple partially trusted or untrusted users, channels, or group chats are allowed to steer a tool-enabled agent, ordinary messages can stop being “just content” and start becoming effective control instructions. That is especially dangerous in an agent that can invoke tools, read secrets, or act on the host system.

The third category is command execution and sandbox failure, which is arguably the most dangerous layer because it moves the impact from “the model was manipulated” to “the host system can be compromised.” CVE-2026-24763 describes command injection in OpenClaw’s Docker sandbox execution mechanism due to unsafe handling of the PATH environment variable. CVE-2026-25157 describes remote code execution through SSH command injection. Beyond those CVEs, OpenClaw’s own security materials also warn that safeBins is not a generic allowlist and that interpreters or runtime binaries should not be trusted casually, because that can undermine approval and execution controls. In other words, once shell execution or remote command paths are exposed, weak validation can turn agent mistakes into system-level compromise.

The fourth category is file disclosure, path handling, and data exfiltration. Public advisories have shown that MEDIA: directives could cause OpenClaw to stage and send arbitrary local files that were readable by the OpenClaw process. Another advisory explains that a malicious or compromised MCP tool server could exfiltrate arbitrary local files by injecting MEDIA: directives into tool result text. Additional file-handling advisories have described sensitive-file disclosure through attachment-path metadata and similar staging paths. These cases matter because they show that file access is not a secondary feature in an AI agent; it is a high-impact privilege boundary. Once that boundary is crossed, the likely outcome is not merely inconvenience, but leakage of secrets, chat history, documents, credentials, or other sensitive local data.

The fifth category involves webhooks, remote fetching, and plugin or skill supply-chain risk. Public advisories show that Telegram webhook configurations could accept forged updates when secrets were not correctly validated, while the optional BlueBubbles plugin had webhook authentication weaknesses, including a passwordless fallback path or trust assumptions based on loopback/proxy conditions. OpenClaw advisories have also documented SSRF-style issues in image or attachment fetching, media URL hydration, citation redirect resolution, and redirect-chain validation for allowlisted media targets. On top of that, CNCERT specifically warned that multiple OpenClaw-compatible skills had already been identified as malicious or potentially dangerous, with the ability to steal keys or install backdoors. This is a reminder that external integration points do not merely extend functionality; they also import risk directly into the agent’s core trust boundary.

Taken together, these disclosures support a clear risk conclusion. The five most dangerous deployment mistakes are: running OpenClaw as a shared system for users who do not fully trust one another, exposing the management or control interface to the public internet, installing untrusted skills, granting the agent excessive host privileges, and storing API keys or secrets in plaintext. This conclusion aligns closely with OpenClaw’s own security guidance and with CNCERT’s public warning.

From a practical defense standpoint, the mitigation priorities are also clear. First, upgrade to the latest patched release and keep monitoring the OpenClaw security advisory feed. Second, do not expose the web interface or management ports directly to the internet; OpenClaw’s official guidance states that the web interface is intended for local use only. Third, enforce least privilege and isolate the runtime in a container or virtual machine instead of running it broadly on a privileged host account. Fourth, do not allow mutually untrusted users to share a single gateway; separate trust boundaries should mean separate gateways, and ideally separate OS users or hosts. Fifth, tightly control skill sources, prefer only trusted and auditable plugins, and disable or limit risky update paths where appropriate. Finally, use the built-in openclaw security audit tooling, which is specifically designed to detect dangerous configuration flags, insecure trust-proxy setups, safeBins mistakes, and other common security foot-guns.

SLQ notes

March 23, 2026

OpenClaw – Security

A Security Analysis of OpenClaw

No comments:

Post a Comment