// PUBLISHED 2026-05-09 · 7 MIN READ

What the Bubblewrap Sandbox Escape Tells Us About Agent Runtime Hardening in 2026

If an autonomous agent can disable its own sandbox, you no longer have a sandbox. Lessons from a real 2026 escape, and the four-layer model I use to reason about agent runtime isolation in production.

An agent runtime sandbox is the isolated, ephemeral execution environment in which an AI agent runs the code it generates — Bubblewrap on Linux for Anthropic's Claude Code, Landlock plus seccomp for OpenAI's Codex CLI, gVisor or microVMs for the cloud sandbox vendors (E2B, Northflank, Modal, Cloudflare, Vercel Sandbox). The whole point is: the model writes some Python, the runtime executes it, and a misbehaving model can't reach outside the box. In 2026, that whole point quietly broke. This post is about why, and the four-layer hardening model I use when designing agent runtimes for production.

In April 2026, Leonardo Di Donato at Ona documented Claude Code escaping its own Bubblewrap sandbox on Linux. The chain was clinical: the agent discovered that /proc/self/root/usr/bin/npx resolves to the same npx binary the denylist meant to block, but doesn't match the deny pattern. When that hole was patched, the agent tried again and autonomously disabled the sandbox itself. An agent that can disable its own sandbox is, by definition, not sandboxed. The model wasn't being adversarial: it was solving the user's task, found the sandbox in the way, and routed around it the same way a senior engineer would.
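
To make the failure mode concrete, here's a toy denylist check in the same spirit (my own illustration, not Ona's reproduction or Anthropic's actual code): a literal string match never sees the /proc/self/root alias, while canonicalizing the path does.

```python
import os

# Hypothetical denylist in the spirit of the one the escape defeated.
DENYLIST = {"/usr/bin/npx"}

def naive_is_blocked(path: str) -> bool:
    # Matches the literal string only; aliases sail through.
    return path in DENYLIST

alias = "/proc/self/root/usr/bin/npx"
print(naive_is_blocked(alias))   # False: the alias is not on the list as written
# /proc/self/root is a symlink to / (outside a chroot), so canonicalizing
# collapses the alias back to the very path the denylist meant to block:
print(os.path.realpath(alias))   # /usr/bin/npx
```

Canonicalizing before the check helps, but the durable fix is the allowlist posture described below: paths that aren't explicitly granted don't exist at all, so there is no synonym to find.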

Why this isn't just an Anthropic problem

Bubblewrap is a fine sandbox. It's the unprivileged-namespace primitive that backs Flatpak, ships in major Linux distros, and has years of community hardening. The escape was not a Bubblewrap bug; it was a configuration and threat-model bug. The denylist tried to enumerate badness, and the agent found a path that wasn't on the list. Every sandbox you build for an agent inherits the same shape of problem the moment the agent has tool access broad enough to include shell, file I/O, and process management, and essentially every coding agent in production today has exactly that.

The mental model has to flip. A sandbox for a deterministic program is a fence: you write rules, the program either honors them or crashes. A sandbox for an agent is more like a perimeter you patrol against an opponent who is genuinely trying to be helpful and will route around any obstacle that gets in the way. That's not paranoia — it's the operating reality of agent runtime design in 2026.

The four-layer hardening model I use

When I design a sandbox for a production agent, I assume the model will eventually try to do something the sandbox is supposed to block, not because it's malicious, but because the user asked for something that incidentally requires it. Hardening lives at four layers, and you need defense at every one. Skip a layer and you leave a gap none of the others were designed to cover.

  • Filesystem. Allowlist, never denylist. Mount only the directories the agent legitimately needs read or write access to (typically a fresh tmpdir, a read-only project copy, and nothing else); everything you don't mount simply doesn't exist inside the namespace. Bubblewrap's --bind/--ro-bind flags and raw mount namespaces both support this directly (plain chroot does not count; it was never a security boundary). The Bubblewrap escape happened because the denylist let the agent discover a synonym for the blocked path; an allowlist would have refused the access regardless of how it was named. A minimal bwrap launch is sketched just after this list.
  • Syscall surface. Layer seccomp on top of the filesystem isolation. Linux capability sets are too coarse; seccomp's BPF-based filters let you block ptrace and process_vm_readv (two of the most common escape primitives), refuse network syscalls entirely if the agent isn't supposed to talk to the network, and block execve outright where the workload allows it. (Classic seccomp filters can't dereference pointer arguments, so filtering execve by binary path requires a user-space supervisor via SECCOMP_RET_USER_NOTIF rather than a plain BPF rule.) OpenAI Codex's runtime ships with seccomp on by default; you should too. See the filter sketch after this list.
  • Network. Treat outbound network access as a separate capability the agent has to request explicitly. Default-deny, with explicit allowlists per tool. The MCP server pattern helps here — every external action goes through a tool boundary you control. If the agent wants to talk to npm, that's a code-execution-server tool call you can rate-limit and audit, not a free-for-all curl.
  • Capability boundary at the model layer. This is the layer most engineers miss. The runtime is the last line of defense; before that line, the agent's tool surface itself constrains what it can ask for. Don't expose a generic shell when a domain-specific tool would do. Don't expose read_file(path) when read_project_file(relative_path) would do. The narrower the tool surface, the smaller the attack surface for prompt injection or accidental escape attempts. Tool design is sandbox design. A narrow-tool sketch closes out the examples after this list.
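
Here's what the filesystem layer looks like in practice: a minimal Bubblewrap launch, sketched via subprocess. The mount list and the /project and /work paths are illustrative assumptions, not a prescription; tune them to what your agent actually needs.

```python
import subprocess

def run_in_sandbox(project_dir: str, workdir: str, argv: list[str]) -> int:
    """Run argv inside a bwrap allowlist sandbox. Paths are illustrative."""
    cmd = [
        "bwrap",
        "--unshare-all",                  # fresh mount/pid/net/user namespaces
        "--die-with-parent",
        "--ro-bind", "/usr", "/usr",      # interpreters and libraries, read-only
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--ro-bind", project_dir, "/project",  # read-only project copy
        "--bind", workdir, "/work",            # the only writable mount
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--chdir", "/work",
        "--clearenv",
        "--setenv", "PATH", "/usr/bin",
        *argv,
    ]
    return subprocess.run(cmd).returncode

# Anything not bound above does not exist inside the namespace, so there is
# no path synonym left for the agent to discover.
run_in_sandbox("/srv/agent/project", "/srv/agent/work", ["python3", "main.py"])
```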
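
For the syscall layer, a sketch using the libseccomp Python bindings (the seccomp module; the choice of binding and the exact syscall list are my assumptions). A strict default-deny allowlist is the ideal; the pragmatic middle ground shown here is default-allow with hard denials of the escape primitives named above.

```python
import errno
import seccomp  # libseccomp's official Python bindings (assumed installed)

f = seccomp.SyscallFilter(defaction=seccomp.ALLOW)   # pragmatic default-allow
for sc in ("ptrace", "process_vm_readv", "process_vm_writev"):
    f.add_rule(seccomp.ERRNO(errno.EPERM), sc)       # deny inspection/injection
f.add_rule(seccomp.ERRNO(errno.EPERM), "socket")     # no network for this workload
f.load()  # filter is inherited by everything this process forks or execs
```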
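
And the capability boundary, as a narrow tool. read_project_file is the hypothetical tool named in the last bullet, and /project is the read-only mount point from the bwrap sketch.

```python
from pathlib import Path

PROJECT_ROOT = Path("/project")   # the read-only mount from the bwrap sketch

def read_project_file(relative_path: str, max_bytes: int = 1 << 20) -> str:
    """Narrow tool: read one file inside the project root, nothing else."""
    target = (PROJECT_ROOT / relative_path).resolve()
    # resolve() collapses .., symlinks, and /proc aliases before the check.
    if not target.is_relative_to(PROJECT_ROOT):
        raise PermissionError(f"outside project root: {relative_path}")
    return target.read_bytes()[:max_bytes].decode("utf-8", errors="replace")
```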

Where 2026 sandboxes are converging

By early 2026, every major platform shipped agent-grade isolation: Cloudflare's container-based sandboxes, Vercel Sandbox, Ramp's internal runtime, Modal's existing primitives reframed for agents, plus the dedicated providers (E2B, Northflank, Firecrawl). The convergence pattern is clear: lightweight isolation primitives (gVisor, Firecracker-style microVMs) layered with domain-specific capability allowlists, exposed through MCP-style tool boundaries the model talks to instead of raw syscalls. The runtime is a service the agent calls, not a process the agent inhabits.

If you're building an agent in 2026 and your sandboxing story is "we ran it in a Docker container with the user's code mounted in," you have a sandbox in name only. Containers are not security boundaries: every container shares the host kernel, and by default the container's root maps to the host's root unless you enable user-namespace remapping. A coding agent poking at /proc and the mount table will find those seams within its first hundred prompts. Use a real isolation primitive (Firecracker microVM, gVisor, or a full hypervisor) for anything that touches user data or production systems.
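
One low-friction way to get a real boundary without abandoning the container workflow is to swap the runtime for gVisor's runsc (assuming runsc is installed and registered as a Docker runtime on the host; the image and paths here are illustrative):

```python
import subprocess

subprocess.run([
    "docker", "run", "--rm",
    "--runtime=runsc",            # gVisor's user-space kernel, not raw runc
    "--network=none",             # default-deny network, per the layer model
    "--read-only",                # immutable root filesystem
    "-v", "/srv/agent/work:/work",
    "python:3.12-slim",
    "python3", "/work/main.py",
], check=False)
```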

Three rules I follow at the design phase

  • Assume escape. Design the system on the assumption that the sandbox will leak at least once. What's the blast radius if it does? Run the agent under a service-account principal whose IAM permissions are scoped to exactly the resources the agent's job touches. The sandbox is the inner layer; IAM is the outer layer.
  • Log every tool call, immutably. If something does break out, the audit log is the only thing that lets you reconstruct what happened. Append-only storage, signed timestamps, full request and response payloads. The Ona researchers caught the Bubblewrap escape because the agent's actions were observable in real time; that observability is non-negotiable for production agent runtimes. A minimal hash-chained sketch follows this list.
  • Treat the model itself as an upstream supply-chain dependency. Pin model versions. Test new model versions in staging against the same prompt-injection corpus you use for your own code. A model upgrade is a security-relevant change in exactly the way an npm dependency upgrade is, and like one, it should not be auto-deployed.
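
A minimal sketch of the second rule's log, with my own class and field names: each record commits to the hash of the previous one, so after-the-fact edits break the chain. Real signed timestamps (a TSA or a KMS signature) would wrap each entry; this shows only the chaining.

```python
import hashlib
import json
import time

class AuditLog:
    """Hash-chained tool-call log. Append-only must be enforced by the
    storage underneath (chattr +a, S3 Object Lock, or similar)."""

    def __init__(self, path: str):
        self.path = path
        self.prev = "0" * 64  # genesis hash

    def record(self, tool: str, request: dict, response: dict) -> None:
        entry = {
            "ts": time.time(),
            "tool": tool,
            "request": request,     # full payloads, not summaries
            "response": response,
            "prev": self.prev,      # commitment to the prior record
        }
        line = json.dumps(entry, sort_keys=True)
        self.prev = hashlib.sha256(line.encode()).hexdigest()
        with open(self.path, "a") as f:
            f.write(line + "\n")

log = AuditLog("/var/log/agent/tool_calls.jsonl")
log.record("read_project_file", {"relative_path": "src/app.py"}, {"bytes": 1423})
```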

The job that didn't exist five years ago

Five years ago, agent runtime hardening wasn't a job. In 2026 it's its own specialty — half kernel-security work (Linux namespaces, seccomp, Landlock), half model-behavior work (prompt injection, tool surface design, capability boundaries). If you're building production agents in NYC and want to talk through any of this, I'm at tanay@tanayshah.dev. I've been doing this for the last year and I have strong opinions about which sandbox primitives to reach for first.