Whitepaper · sandboxd

sandboxd

A WebAssembly sandbox for running untrusted code with three independent fences and a deny-by-default host ABI. Built on wasmtime, in Rust.

MIT Licensed · Open Source · Rust · wasmtime 45 · Deny-by-default · Deterministic

3 independent fences

1 audited host import

522 fuel for fib(30), every run

~10 ms cold CLI invocation

v0.1 · 2026 · Sai Sarma · Sarma Linux

github.com/sarmakska/sandboxd

Abstract

sandboxd is an open-source, MIT-licensed Rust crate and CLI that executes untrusted WebAssembly under three independent limits: a deterministic fuel budget on instructions executed, a wall-clock deadline enforced by epoch interruption, and a memory cap enforced by a wasmtime ResourceLimiter. It pairs these with a deny-by-default host ABI in which every guest import must be explicitly granted by name. This whitepaper documents the threat model, the design choices that fall out of it, the architecture, the implementation, the performance characteristics, and the use cases for which WebAssembly under sandboxd is the right shape: extension scripts, user-supplied formulas and plugin systems where the embedder must run untrusted code inside its own process but cannot afford to give that code the process.

01 Executive Summary

A guest module is delivered as .wasm or .wat bytes. sandboxd compiles and validates it, walks its declared imports against an allow-list, builds a fresh wasmtime Store with fuel, an epoch deadline and a memory limiter applied, defines only the granted host imports on the Linker, arms a per-run watchdog thread, instantiates the module, and calls the exported function. The result is a typed RunOutput with the returned values and the fuel consumed, or a typed SandboxError that names exactly why the run stopped: FuelExhausted, Timeout, MemoryLimitExceeded, DisallowedImport, InvalidModule, ExportNotFound, or Trap.

The public API is six items: Sandbox, Limits, HostAbi, SandboxError, Value, RunOutput. There is no WASI, no ambient clock, no filesystem, no network, no environment. The one capability shipped today is host::log, opt-in via HostAbi::deny_all().allow_log(), which gives the embedder a shared sink of UTF-8 lines and bounds-checks every read from guest memory.

Determinism is the property the design optimises for. fib(30) consumes exactly 522 fuel on every run, every machine. That repeatability is what lets fuel double as a quota you can reproduce and a unit you can charge against.

02 Background & Motivation

I wanted to run code I did not write, and did not trust, inside my own process, without giving it the process. The conventional answers are a container or a virtual machine per call, but spinning one of those up to evaluate a few hundred instructions of someone’s plugin is absurd overhead, and it still leaves you trusting a much bigger surface: a Linux kernel, a libc, a container runtime and a network stack. The cold-start time alone is a non-starter for interactive workloads.

WebAssembly is the right shape for this problem. A guest cannot name an address it was not given. It cannot call a function it was not handed. It runs on a runtime, wasmtime in this case, that was designed by people who care about isolation and is reviewed by people who care about isolation. The hard part is not the runtime but the host boundary: which imports does the guest see, and what does each one do?

The motivating use case was internal: an editor that wanted to let third parties ship extensions. The constraints were tight: latency budget under a few milliseconds per invocation, embed in the host process, no network exposure for the extension by default, ability to bill or quota by computation, ability to reproduce failures locally. Containers and Firecracker fail the latency requirement. A bespoke interpreter fails the speed requirement. wasmtime with three fences and a small host boundary fits.

03 Threat Model

The adversary

The guest is assumed to be fully hostile. It may spin forever or recurse without bound, try to allocate unbounded memory, import host functions it is not entitled to, pass malformed pointers and lengths to any host function it is given, or be deliberately crafted to trip wasmtime. The embedder is trusted. The host machine and the wasmtime build are trusted.

What sandboxd guarantees

Bounded CPU. A run cannot execute more WebAssembly instructions than its fuel budget. Verified by the fuel_exhaustion_terminates integration test.
Bounded wall-clock. A run cannot occupy a thread past its configured deadline. Verified by epoch_timeout_terminates.
Bounded memory. A run cannot grow linear memory or tables beyond the configured cap. Verified by memory_cap_enforced.
No ambient authority. A freshly built HostAbi grants nothing. Any import not explicitly allowed is rejected before instantiation. Verified by disallowed_import_rejected and log_import_denied_by_default.
Safe host boundary for granted capabilities. The shipped host::log validates pointer and length with checked_add, slices guest memory with get so out-of-range reads trap, and uses from_utf8_lossy so invalid bytes never abort the host.
Run isolation. A fresh Store per run, so fuel, deadline, limiter, linear memory and globals are all per-call. One run cannot observe or influence another.

What sandboxd does not defend against

Side channels. Timing, cache and speculative-execution leaks are out of scope. If two mutually distrusting guests share a machine, sandboxd does not stop one inferring things about the other through microarchitectural state.
Denial of service within the limits. A guest that stays under its fuel, time and memory budgets can still consume the full budget on every call. Provisioning and rate limiting are the embedder’s responsibility.
Bugs in wasmtime or Cranelift. The isolation rests on wasmtime’s correctness. An escape there is an escape here. Keep the dependency current.
Host code the embedder writes. If you grant a capability whose implementation is unsafe, sandboxd cannot save you. Audit what you add.

04 Goals & Non-goals

Goals

Run untrusted .wasm or .wat bytes with a CPU bound, a wall-clock bound and a memory bound.
Deny-by-default host ABI: imports must be granted by name.
Typed error per failure mode, so callers can branch without scraping strings.
Deterministic fuel accounting, so the same module on the same inputs consumes the same fuel on every machine.
Public API small enough to read in one screen.
Cold-start cost under 15 ms on a current laptop.

Non-goals

WASI. If the guest legitimately needs files, sockets or a clock, sandboxd is the wrong tool and WASI is the right one. They are different projects.
A plugin manager, package format, or registry. The scope is run these bytes under these limits and tell me what happened.
Defence against microarchitectural side channels. Out of scope, stated up front.
Replacing containers for full Linux workloads. If you need a distribution, use a container. sandboxd is for guests measured in kilobytes, not megabytes.

05 Architecture

Run flow

[Figure: sandboxd run flow. Parse, allow-list, fresh per-run store with fuel + epoch + memory limiter, watchdog, typed outcomes.]

Module map

File	Responsibility
`src/lib.rs`	Public surface re-exports: Sandbox, Limits, HostAbi, SandboxError, Value, RunOutput
`src/sandbox.rs`	The Sandbox struct: engine, import allow-list walk, store setup, instantiation, error mapping
`src/host.rs`	HostAbi, allow_log, the host::log implementation with bounds checks, the log sink type
`src/limits.rs`	Limits struct, MemoryLimiter implementing wasmtime::ResourceLimiter, Watchdog
`src/error.rs`	SandboxError enum and the Trap-to-variant mappers
`src/main.rs`	clap-based CLI, flag parsing, exit-code mapping
`fixtures/`	Five .wat fixtures: well_behaved, infinite_loop, memory_bomb, disallowed_import, logger
`tests/sandbox.rs`	Six integration tests that verify each fence holds

06 Key Technical Decisions

Fuel and epoch interruption, not one or the other

Fuel is deterministic: the same module on the same inputs burns the same instructions every time, which is what makes it a replayable CPU bound. But fuel says nothing about wall-clock time, so a guest that calls a slow host function, or that the platform deschedules, can hold a thread while burning almost nothing. Epoch interruption catches that. I considered shipping fuel only and calling time-bounding the embedder’s problem. I rejected it because the moment you grant a host capability, time spent inside it is invisible to fuel, and a sandbox that cannot bound time is not a sandbox I would put untrusted code in front of. The two fences cost little together and cover each other’s blind spot.
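The determinism argument can be illustrated with a toy meter. The sketch below is not wasmtime's cost model: it assumes a hypothetical `fib_metered` function that charges one unit per loop iteration, so its fuel numbers will not match the 522 reported for the real fixture. It only shows why a counter driven by the work done, rather than by a clock, gives the same reading on every run.

```rust
/// Returned when the budget runs out before the computation finishes.
#[derive(Debug, PartialEq)]
pub struct OutOfFuel;

/// Iterative Fibonacci that charges one unit of fuel per loop iteration
/// (a toy cost model chosen for illustration only).
/// Returns (value, fuel consumed), or OutOfFuel if the budget ends first.
pub fn fib_metered(n: u32, budget: u64) -> Result<(u64, u64), OutOfFuel> {
    let mut remaining = budget;
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        // Charge before doing the work, so an exhausted budget stops the loop.
        remaining = remaining.checked_sub(1).ok_or(OutOfFuel)?;
        // wrapping_add keeps the toy from panicking on very large n.
        let next = a.wrapping_add(b);
        a = b;
        b = next;
    }
    Ok((a, budget - remaining))
}
```

Calling `fib_metered(30, 1_000)` twice yields the same value and the same fuel count, while `fib_metered(30, 10)` stops with `OutOfFuel`. Nothing in the function observes time, which is the blind spot the epoch fence covers.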

A per-run watchdog thread, not a global ticker

wasmtime’s epoch counter does not advance by itself; something must call increment_epoch. The usual recipe is a background thread that bumps it on a fixed cadence. I went with a per-run watchdog that sleeps in 1 ms steps until the deadline, bumps the epoch once, and exits, checking a shared atomic on each step so it stops early when the run finishes first. A global ticking thread is simpler to write but gives you coarse shared timing and a thread that runs forever; the per-run watchdog gives each call its own deadline, accurate to about a millisecond, and leaves no idle thread between runs. The cost is one thread spawn per run, which is in the noise against the cost of compiling and running a module.

Deny-by-default with no WASI

The tempting path is to wire in wasmtime-wasi and then restrict it. I did not, because WASI’s surface is large and its preview is still moving, and grant-all-then-claw-back is exactly the deny-list posture that leaks. Starting from nothing and adding one audited function (host::log) means the allow-list is short enough to read in full and the default is the safe one. If you need files or sockets, that is a real need and WASI is the right tool, but it is a different project from this one.
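The deny-by-default posture can be modelled in a few lines. This is a sketch under assumptions, not the crate's source: the method names `deny_all`, `allow_log` and `log_allowed` appear in this whitepaper, while the struct layout, `permits` and `first_disallowed` are hypothetical helpers added for illustration.

```rust
/// Hypothetical model of a deny-by-default host ABI.
/// The real HostAbi may store its grants differently.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct HostAbi {
    log: bool, // the one capability this whitepaper describes
}

impl HostAbi {
    /// Grants nothing.
    pub fn deny_all() -> Self {
        Self { log: false }
    }

    /// Opts in to host::log and returns the value for chaining.
    pub fn allow_log(mut self) -> Self {
        self.log = true;
        self
    }

    pub fn log_allowed(&self) -> bool {
        self.log
    }

    /// True only for (module, name) pairs that were explicitly granted.
    pub fn permits(&self, module: &str, name: &str) -> bool {
        matches!((module, name, self.log), ("host", "log", true))
    }
}

/// Returns the first import that was not granted, if any.
pub fn first_disallowed<'a>(
    abi: &HostAbi,
    imports: &[(&'a str, &'a str)],
) -> Option<(&'a str, &'a str)> {
    imports.iter().copied().find(|(m, n)| !abi.permits(m, n))
}
```

Because every grant is a named method on the builder, the full allow-list is whatever the embedder typed after `deny_all()`, and an unknown import fails the check without any deny-list having to anticipate it.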

Typed SandboxError per failure mode

I wanted callers to branch on why a run stopped, whether to bill it, retry it or ban the module, without scraping strings. So fuel exhaustion, timeout, memory breach, disallowed import, invalid module, export mismatch and a generic guest trap are each their own variant. The CLI gives the four limit and policy failures their own exit codes (2 for FuelExhausted, 3 for Timeout, 4 for MemoryLimitExceeded, 5 for DisallowedImport) and reports the remaining variants as exit code 1.
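As a sketch of what branching on the variant looks like, the stand-in enum below mirrors the variant names listed in this whitepaper and maps them to the documented exit codes. The field shapes for InvalidModule and ExportNotFound are assumptions, and `exit_code` is a hypothetical helper rather than the CLI's actual function.

```rust
/// Stand-in for the crate's error type. Variant names come from the
/// whitepaper; the fields on InvalidModule and ExportNotFound are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum SandboxError {
    FuelExhausted,
    Timeout,
    MemoryLimitExceeded,
    DisallowedImport { module: String, name: String },
    InvalidModule { message: String },
    ExportNotFound { name: String },
    Trap { message: String },
}

/// Maps each failure to the exit code documented for the CLI.
pub fn exit_code(err: &SandboxError) -> i32 {
    match err {
        SandboxError::FuelExhausted => 2,
        SandboxError::Timeout => 3,
        SandboxError::MemoryLimitExceeded => 4,
        SandboxError::DisallowedImport { .. } => 5,
        // The remaining variants share the generic error code.
        SandboxError::InvalidModule { .. }
        | SandboxError::ExportNotFound { .. }
        | SandboxError::Trap { .. } => 1,
    }
}
```

The match has no wildcard arm, so adding a variant later forces every such mapping to be revisited at compile time.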

Fresh Store per run, not a pooled Store

Pooling stores across runs is an obvious optimisation, but the interactions between leftover globals, residual fuel and the epoch counter become a footgun fast. The right optimisation if compile time dominates is to cache the compiled Module, not the Store. The Module is immutable and safe to share; the Store is per-call state that should be discarded.
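One way an embedder might cache compiled modules is sketched below. sandboxd does not ship this; `ModuleCache` and `get_or_compile` are hypothetical names, and the type parameter `M` stands in for a compiled module type such as wasmtime's Module. The cache is keyed by the exact module bytes rather than a hash of them, so a hostile guest cannot collide with another module's entry.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Cache of compiled artefacts keyed by the exact module bytes.
/// `M` is a placeholder for an immutable, shareable compiled module.
pub struct ModuleCache<M> {
    entries: Mutex<HashMap<Vec<u8>, Arc<M>>>,
}

impl<M> ModuleCache<M> {
    pub fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()) }
    }

    /// Returns the cached artefact for `bytes`, calling `compile` only on a
    /// miss. Failed compiles are not cached. The lock is held while
    /// compiling, which keeps the sketch simple but serialises compiles.
    pub fn get_or_compile<E>(
        &self,
        bytes: &[u8],
        compile: impl FnOnce(&[u8]) -> Result<M, E>,
    ) -> Result<Arc<M>, E> {
        let mut entries = self.entries.lock().unwrap();
        if let Some(hit) = entries.get(bytes) {
            return Ok(Arc::clone(hit));
        }
        let compiled = Arc::new(compile(bytes)?);
        entries.insert(bytes.to_vec(), Arc::clone(&compiled));
        Ok(compiled)
    }
}
```

Each run would still build its own Store; only the compiled artefact behind the Arc is shared.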

07 Implementation

Engine configuration

use wasmtime::{Config, Engine};

fn engine() -> Result<Engine, SandboxError> {
    let mut config = Config::new();
    config.consume_fuel(true);
    config.epoch_interruption(true);
    Engine::new(&config).map_err(SandboxError::from_engine)
}

The import allow-list walk

fn reject_disallowed_imports(&self, module: &Module) -> Result<(), SandboxError> {
    for import in module.imports() {
        let allowed = matches!(
            (import.module(), import.name(), self.host.log_allowed()),
            ("host", "log", true)
        );
        if !allowed {
            return Err(SandboxError::DisallowedImport {
                module: import.module().to_string(),
                name:   import.name().to_string(),
            });
        }
    }
    Ok(())
}

host::log, with bounds checks

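use anyhow::anyhow;
use wasmtime::Caller;

// Copies a guest string out of linear memory, validating ptr and len first.
fn read_guest_str(caller: &mut Caller<'_, HostState>, ptr: i32, len: i32)
    -> anyhow::Result<String>
{
    // Negative values fail the conversion rather than wrapping to huge offsets.
    let ptr = u32::try_from(ptr)? as usize;
    let len = u32::try_from(len)? as usize;
    // host::log needs the guest to export its linear memory as "memory".
    let memory = caller.get_export("memory")
        .and_then(|e| e.into_memory())
        .ok_or_else(|| anyhow!("guest did not export memory"))?;
    let data = memory.data(&caller);
    // checked_add and get reject overflowing or out-of-range spans.
    let end  = ptr.checked_add(len)
        .ok_or_else(|| anyhow!("ptr+len overflow"))?;
    let bytes = data.get(ptr..end)
        .ok_or_else(|| anyhow!("ptr/len out of bounds"))?;
    // Lossy decoding replaces invalid UTF-8 instead of failing.
    Ok(String::from_utf8_lossy(bytes).into_owned())
}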
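The same checks can be exercised without a wasmtime Store by treating guest memory as a plain byte slice. The sketch below is a stand-alone variant written for illustration: the slice parameter and the String error type are assumptions that replace the Caller and anyhow used above.

```rust
use std::convert::TryFrom;

/// Reads `len` bytes at `ptr` from `memory`, a byte slice standing in for
/// the guest's linear memory, applying the same validation steps.
pub fn read_guest_str(memory: &[u8], ptr: i32, len: i32) -> Result<String, String> {
    // A negative i32 cannot become a u32, so it is rejected here.
    let ptr = u32::try_from(ptr).map_err(|_| "negative ptr".to_string())? as usize;
    let len = u32::try_from(len).map_err(|_| "negative len".to_string())? as usize;
    // Guard the addition itself before using it as a range end.
    let end = ptr
        .checked_add(len)
        .ok_or_else(|| "ptr+len overflow".to_string())?;
    // `get` returns None for out-of-range spans instead of panicking.
    let bytes = memory
        .get(ptr..end)
        .ok_or_else(|| "ptr/len out of bounds".to_string())?;
    // Invalid UTF-8 becomes U+FFFD replacement characters.
    Ok(String::from_utf8_lossy(bytes).into_owned())
}
```

A slice-based helper like this is convenient for unit-testing the boundary logic of any additional capability before wiring it to a real Caller.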

The watchdog

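use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread::JoinHandle;
use std::time::{Duration, Instant};
use wasmtime::Engine;

pub struct Watchdog {
    // Set by disarm() when the run finishes before the deadline.
    done: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl Watchdog {
    // Spawns a thread that bumps the engine epoch once when `timeout` elapses.
    pub fn arm(engine: Engine, timeout: Duration) -> Self {
        let done = Arc::new(AtomicBool::new(false));
        let d = done.clone();
        let handle = std::thread::spawn(move || {
            let deadline = Instant::now() + timeout;
            // Poll in 1 ms steps so a finished run releases the thread promptly.
            while Instant::now() < deadline {
                if d.load(Ordering::Relaxed) { return; }
                std::thread::sleep(Duration::from_millis(1));
            }
            // Deadline reached: bump the epoch so the guest is interrupted.
            engine.increment_epoch();
        });
        Self { done, handle: Some(handle) }
    }
    // Signals the thread to stop and waits for it to exit.
    pub fn disarm(mut self) {
        self.done.store(true, Ordering::Relaxed);
        if let Some(h) = self.handle.take() { let _ = h.join(); }
    }
}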
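To try the watchdog pattern without wasmtime, the deadline action can be made a closure. This is a sketch under that assumption: the `on_deadline` parameter is hypothetical, and in sandboxd its role is played by the call to increment_epoch on the engine.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Per-run watchdog whose deadline action is a caller-supplied closure.
pub struct Watchdog {
    done: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl Watchdog {
    /// Runs `on_deadline` once if `timeout` elapses before `disarm` is called.
    pub fn arm<F>(timeout: Duration, on_deadline: F) -> Self
    where
        F: FnOnce() + Send + 'static,
    {
        let done = Arc::new(AtomicBool::new(false));
        let d = Arc::clone(&done);
        let handle = thread::spawn(move || {
            let deadline = Instant::now() + timeout;
            while Instant::now() < deadline {
                // Exit early when the run has already finished.
                if d.load(Ordering::Relaxed) {
                    return;
                }
                thread::sleep(Duration::from_millis(1));
            }
            on_deadline();
        });
        Self { done, handle: Some(handle) }
    }

    /// Tells the thread to stop and waits for it to exit.
    pub fn disarm(mut self) {
        self.done.store(true, Ordering::Relaxed);
        if let Some(h) = self.handle.take() {
            let _ = h.join();
        }
    }
}
```

Arming with a long timeout and disarming at once should leave the action unfired, while arming with a short timeout and waiting past it should fire the action exactly once.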

Mapping wasmtime traps to typed errors

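use wasmtime::Trap;

fn map_runtime_error(err: anyhow::Error, limiter: &MemoryLimiter) -> SandboxError {
    // Check the limiter first: a refused memory.grow often surfaces as an
    // ordinary guest trap, which would otherwise hide the real cause.
    if limiter.growth_was_denied() {
        return SandboxError::MemoryLimitExceeded;
    }
    if let Some(trap) = err.downcast_ref::<Trap>() {
        return match trap {
            Trap::OutOfFuel  => SandboxError::FuelExhausted,
            Trap::Interrupt  => SandboxError::Timeout,
            other            => SandboxError::Trap { message: other.to_string() },
        };
    }
    // Anything that is not a wasmtime Trap is reported as a generic trap.
    SandboxError::Trap { message: err.to_string() }
}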
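The order of the checks is the important part, and it can be shown with a small stand-alone model. The `GuestTrap` and `Outcome` enums and the `classify` function below are hypothetical stand-ins for wasmtime's Trap and the crate's SandboxError, used only to make the ordering testable.

```rust
/// Simplified stand-in for the trap codes the mapper cares about.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GuestTrap {
    OutOfFuel,
    Interrupt,
    Unreachable,
}

/// Simplified stand-in for the typed outcome of a failed run.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    MemoryLimitExceeded,
    FuelExhausted,
    Timeout,
    Trap(&'static str),
}

/// Classifies a failed run. The limiter flag is consulted before the trap,
/// because a guest whose memory.grow was refused commonly ends in an
/// ordinary trap such as `unreachable`.
pub fn classify(growth_was_denied: bool, trap: GuestTrap) -> Outcome {
    if growth_was_denied {
        return Outcome::MemoryLimitExceeded;
    }
    match trap {
        GuestTrap::OutOfFuel => Outcome::FuelExhausted,
        GuestTrap::Interrupt => Outcome::Timeout,
        GuestTrap::Unreachable => Outcome::Trap("unreachable"),
    }
}
```

One consequence of this ordering is that any run in which a growth was refused is reported as a memory breach, even if the trap that finally stopped it was something else.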

08 Why WebAssembly, not containers or Firecracker

The question I get asked most is why not Docker. The honest answer is latency budget and trust surface.

Latency

A container start measured end-to-end (image pull cached, network namespace set up, cgroup created, exec) is in the hundreds of milliseconds at best. Firecracker is faster, low tens of milliseconds, but still an order of magnitude over what an in-process wasmtime invocation costs. For an extension that runs on every keystroke, every webhook delivery, every plugin hook, this is the difference between feasible and not.

Trust surface

A container hands the guest a Linux kernel, a libc, a network stack and a filesystem. Even with seccomp, AppArmor, namespaces and dropped capabilities, that is a vast amount of code the guest can poke at. WebAssembly hands the guest a stack machine, a linear memory, and exactly the imports you defined. The trust surface is, in practical terms, wasmtime plus your host code. wasmtime is a single Rust codebase reviewed by the Bytecode Alliance; your host code is in front of you. Both are far smaller than a Linux distribution.

Determinism

Containers are not deterministic. The same code on the same input takes different wall-clock time, makes different syscalls in different orders, and has subtly different memory layouts. WebAssembly under fuel metering is deterministic by design: the same module on the same inputs consumes the same fuel every run. That property is what makes per-call billing reproducible and replay debugging possible.

When containers win

When the guest legitimately needs a filesystem, a process tree, a network stack, or a TTY. When the guest is an existing binary you cannot recompile to WebAssembly. When the guest needs gigabytes of RAM or hours of execution. sandboxd does not try to compete with containers there. It competes with them in the much narrower band where the guest is a small, computation-heavy module, the latency budget is tight, and the trust surface needs to be small enough to audit in full.

09 Use Cases & Non-cases

Fits well

Extension scripts in editors and IDEs. Compute-heavy, latency-sensitive, no need for filesystem or network by default.
Plugin systems for SaaS products. Customers ship modules that extend behaviour. The host grants only the audited imports it has reviewed.
User-supplied formulas and pricing rules. Compile the expression to wasm, run under a fuel budget, return the result.
Replayable usage metering. Bill plugins per million fuel units, with the embedder and the customer in agreement about the count.
Untrusted code in agent tooling. An agent receives code and wants to run it without handing that code the authority of the agent process.
CI-time policy or lint engines shipped as wasm modules, sandboxed in the CI runner with a fuel cap to bound bad behaviour.
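For the metering case, the arithmetic is small enough to show in full. The sketch below assumes a hypothetical price expressed in the smallest currency unit per million fuel and rounds each charge up; the function name and the rounding rule are illustrative choices, not part of sandboxd.

```rust
use std::convert::TryFrom;

/// Charge for one run, in the smallest currency unit, rounded up.
/// `price_per_million_fuel` is a hypothetical tariff set by the embedder.
pub fn charge(fuel_consumed: u64, price_per_million_fuel: u64) -> u64 {
    // Widen to u128 so the multiplication cannot overflow.
    let numerator = fuel_consumed as u128 * price_per_million_fuel as u128;
    // Ceiling division by one million fuel units.
    let units = (numerator + 999_999) / 1_000_000;
    // Saturate rather than wrap if the result exceeds u64.
    u64::try_from(units).unwrap_or(u64::MAX)
}
```

Because the fuel count is reproducible, the customer can rerun the same module on the same input and arrive at the same charge.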

Does not fit

Full distributions. If you need apt, sshd or systemd, you need a container.
Filesystem-heavy workloads. WASI is the right answer, and sandboxd intentionally does not ship it.
Long-running daemons. sandboxd’s model is per-invocation: run, return, discard the store.
Mutually distrusting guests on shared hardware where side channels matter. Use separate machines or a VM-level isolation primitive.

10 Results & Performance

Measured on an Apple M3 Pro (macOS 26.3, Rust 1.96, release)

Scenario	Result
`add(2, 40)`	returns `I32(42)`, fuel consumed 4
`fib(30)`	returns `I32(832040)`, fuel consumed 522, identical every run
100 cold CLI invocations of `fib(30)`	1.06 s total, about 10.6 ms per process
infinite loop, 1,000,000 fuel	stopped, exit 2 (FuelExhausted)
infinite loop, 100 ms timeout, near-infinite fuel	stopped, exit 3 (Timeout), ~145 ms end-to-end including spawn + compile
memory bomb, 4 MiB cap	stopped, exit 4 (MemoryLimitExceeded)

Determinism is the result I care about most. fib(30) consumes exactly 522 fuel every single time, which is what lets fuel double as a quota or a billing unit you can reproduce. The cold-start cost is dominated by Cranelift compile; for embedders that run the same module repeatedly, caching the compiled Module brings per-invocation cost down to the call itself.

11 Lessons & Trade-offs

What worked

Walking imports before constructing the Store. Rejecting at the door, with the offending import named in the error, is much more useful than a generic instantiation failure.
Per-run watchdog instead of a global ticker. Precise deadlines, no idle thread, trivially testable.
Two fences on the infinite loop. Shipping both fuel and time means the most common attack is stopped redundantly. That feels excessive on paper and is reassuring in production.
Typed errors with the CLI mapping to exit codes. Makes it trivial to script around sandboxd in CI: a non-zero exit always tells you why.

What I got wrong on first pass

Initially conflated MemoryLimitExceeded with Trap. When memory.grow is refused, the guest sees -1 and may execute an unreachable, which traps. Without checking growth_was_denied first, that reported as a generic Trap, which hid the real cause. Mapping memory cap breaches before generic traps fixed it.
First watchdog used a shared global ticker thread. Simple to write, gave coarse timing, and held a thread forever even when no run was active. Replacing it with a per-run thread cost one spawn per call and removed both problems.
Initial CLI returned exit code 1 for every error. Made it useless to script against. Per-variant exit codes (2/3/4/5) and a printed error name on stderr made the CLI compose with the rest of a pipeline.

Trade-offs I accept

Cold start is dominated by compile. Embedders that care should cache compiled Modules; sandboxd does not pretend to solve this for them yet.
No defence against side channels. Out of scope. If you need that, use machine isolation.
Soundness rests on wasmtime. Any sandbox built on someone else’s runtime inherits that runtime’s assumptions. Keep it current and report bugs upstream.

12 Conclusion

sandboxd demonstrates that a credible WebAssembly sandbox for untrusted code is small. Three independent fences (fuel, time, memory), one audited host capability (host::log), six items on the public API, one file for the host boundary. The hard work is not the volume of code; the hard work is being honest about what is and is not in scope, and deciding to ship fewer features by default than the runtime allows.

If your guest is a small computation, your latency budget is tight, and your trust surface needs to be auditable in full, sandboxd is the right shape. If your guest is a full distribution or needs WASI, it is not, and the whitepaper says so up front so you can decide before you adopt it. The project will gain a monotonic clock and a seeded RNG as additional audited capabilities; it will not gain WASI, a package format, or a network. The scope stays small.


A Configuration

Limits struct

Field	Type	Purpose
`fuel`	`u64`	Fuel budget per run; wasmtime charges fuel as the guest executes WebAssembly instructions
`timeout`	`Duration`	Wall-clock deadline enforced by the watchdog
`memory_bytes`	`usize`	Maximum linear-memory size, enforced by the ResourceLimiter
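The three fields and the memory decision can be modelled as follows. This is a sketch under assumptions: the field names and types come from the table above, while the `MemoryLimiter` shown here is a simplified stand-in that does not implement the wasmtime ResourceLimiter trait, whose method signatures differ.

```rust
use std::time::Duration;

/// The three per-run limits, with the field names from the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Limits {
    pub fuel: u64,
    pub timeout: Duration,
    pub memory_bytes: usize,
}

/// Simplified model of the limiter: allow a growth only if the requested
/// size stays within the cap, and remember any refusal for error mapping.
pub struct MemoryLimiter {
    max_bytes: usize,
    denied: bool,
}

impl MemoryLimiter {
    pub fn new(limits: &Limits) -> Self {
        Self { max_bytes: limits.memory_bytes, denied: false }
    }

    /// Returns true if growing linear memory to `desired` bytes is allowed.
    pub fn memory_growing(&mut self, _current: usize, desired: usize) -> bool {
        let allowed = desired <= self.max_bytes;
        if !allowed {
            self.denied = true;
        }
        allowed
    }

    /// True once any growth request has been refused during this run.
    pub fn growth_was_denied(&self) -> bool {
        self.denied
    }
}
```

Recording the refusal on the limiter is what lets the error mapper report MemoryLimitExceeded instead of whatever trap the guest hits next.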

CLI flags (src/main.rs)

Flag	Default	Purpose
`--invoke <name>`	`_start`	Exported function to call
`--arg <i32>`	(repeatable)	i32 argument to pass; CLI is i32-only for simplicity
`--fuel <u64>`	1,000,000	Fuel budget for the run
`--timeout-ms <u64>`	1,000	Wall-clock deadline in milliseconds
`--memory-mb <usize>`	16	Memory cap in MiB
`--allow-log`	off	Grant the `host::log` capability for this run

Exit codes

Code	Meaning
0	Success
1	Generic error (InvalidModule, ExportNotFound, Trap)
2	FuelExhausted
3	Timeout
4	MemoryLimitExceeded
5	DisallowedImport

B Production Checklist

Pin the wasmtime version in Cargo.toml and update on a schedule; an escape there is an escape here.
Subscribe to wasmtime security advisories. The Bytecode Alliance publishes them on the wasmtime repo.
Size limits from one observed run. Run a representative input, read fuel_consumed from RunOutput, set production fuel to a small multiple of that.
Cache the compiled Module if you re-run the same bytes. Compile dominates cold-start cost; the Module is immutable and safe to share across runs.
Audit every additional host capability you add. Use host::log as the reference: validate every argument, never hand the guest a host pointer, bounds-check every read from guest memory.
Quota and rate-limit at the embedder layer. sandboxd bounds a single call; runaway call rates are your problem.
Log the SandboxError variant per failed run. The variant is the signal you need to bill, retry, or ban; the message is for humans.
Treat the host boundary as part of your security review surface. It is one file (src/host.rs) plus whatever you add. Keep it that small.