Architecture

ScaiBunker is a thin product layer over ScaiGrid's existing primitives — tenancy, quotas, audit, S3-backed storage — wrapped around a fleet of worker nodes that run Firecracker microVMs. The controller never executes code itself; it schedules work onto workers and relays bytes between workers and object storage.

Components

Three tiers: the ScaiGrid controller (FastAPI process, owns the database and Redis), the worker fleet (separate hosts that run Firecracker microVMs), and S3 (snapshots, images, audit batches, file staging). Workers connect outward to the controller; they never accept inbound connections from the caller.

flowchart LR
  C[Caller]
  CTRL["Controller Bunker svc Quota svc Scheduler Image svc Snapshot svc"]
  SP["Storage proxy /storage/..."]
  S3[Garage S3]
  WD[scaibunker-worker daemon]
  FC["Firecracker microVM (ext4)"]
  S3C[S3 client]
  C -- "/v1/modules/scaibunker/..." --> CTRL
  CTRL -- "bunker, exec results, files" --> C
  CTRL -- HTTP --> WD
  WD --> CTRL
  WD --- FC
  WD --- S3C
  SP <-- stream --> S3C
  CTRL --> SP
  SP -- S3 PUT --> S3
  S3C --> S3
  subgraph SG [ScaiGrid]
    CTRL
    SP
    S3
  end
  subgraph WF [Worker fleet]
    WD
    FC
    S3C
  end

There's no separate ScaiBunker deployment of the controller: it runs as a ScaiGrid module in the main FastAPI process, behind the same auth and charged against the same accounting.

The storage proxy can additionally run as a stripped-down SCAIGRID_MODE=bunker_proxy process: same /v1/modules/scaibunker/storage/* routes, no DB or Redis, horizontally scalable behind a load balancer. Workers never see S3 credentials.

Request flow for one `exec` call

1. Caller sends POST /v1/modules/scaibunker/bunkers/{id}/exec with a command, timeout, optional env and stdin.
2. Auth validates the bearer token and confirms the caller has scaibunker:execute.
3. Bunker lookup confirms the bunker is in this tenant and in running state; otherwise 404 or 409.
4. Worker pool picks the worker the bunker is scheduled on and opens an HTTP connection.
5. Execution. The controller forwards the exec request to the worker. The worker runs the command inside the microVM, captures stdout / stderr / exit code, enforces the timeout.
6. Output handling. Small outputs come back inline; large outputs are streamed to S3 via the storage proxy and the worker returns a full_output_ref instead.
7. Exec log. A row in mod_scaibunker_exec_log records the command, exit code, duration, output preview.
8. Activity timestamp. The bunker's last_activity_at is bumped so the idle-timeout sweeper leaves it alone.
9. Response. Inline mode returns the full result; stream: true returns Server-Sent Events with stdout, stderr, and exit events.
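As a rough illustration of the first step, the sketch below builds (but does not send) an exec request with the standard library. The route and the `stream` flag come from the text above; the JSON field names (`command`, `timeout`, `env`, `stdin`), the helper name `build_exec_request`, and the example host are assumptions, not the documented schema.

```python
import json
import urllib.request


def build_exec_request(base_url, bunker_id, token, command,
                       timeout_s=30, env=None, stdin=None, stream=False):
    """Build (but do not send) an exec request for one bunker."""
    # Field names are assumptions for illustration only.
    body = {"command": command, "timeout": timeout_s, "stream": stream}
    if env is not None:
        body["env"] = env
    if stdin is not None:
        body["stdin"] = stdin
    url = f"{base_url.rstrip('/')}/v1/modules/scaibunker/bunkers/{bunker_id}/exec"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; with `stream=True` the response would need to be read incrementally as Server-Sent Events rather than parsed as one JSON document.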

State

Bunkers, workers, images, snapshots, quota profiles, availability groups, bridges, exec logs — in ScaiGrid's MariaDB.
Live status, quota counters, worker heartbeats — in Redis. The Redis state is rebuildable from MariaDB plus worker heartbeats; losing Redis is recoverable.
Ext4 rootfs images, snapshots, large exec outputs, audit batches, file staging — in S3 via the storage proxy.
Per-worker image cache — local on each worker; reconciled with the controller's mod_scaibunker_image_cache ledger every heartbeat.

Scheduler and image fan-out

When a bunker create lands, the scheduler picks a worker that (a) has the requested image cached and ready (unless the image is lazy_pull: true), (b) has free capacity for the requested CPU / memory / disk / GPU, and (c) for transit bunkers, owns every bridge the interfaces reference. If no worker fits, the call fails with WORKER_UNAVAILABLE or NO_SUITABLE_WORKER.
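The three placement conditions can be read as a simple filter. The sketch below is a minimal model of that filter, not the scheduler's actual code: the `Worker` and `BunkerRequest` classes, their field names, and the first-fit choice are all assumptions (the real scheduler may rank candidates), and `lazy_pull` is carried on the request here purely for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Worker:  # hypothetical shape, for illustration
    name: str
    free_cpu: int
    free_memory_mb: int
    free_disk_gb: int
    free_gpu: int = 0
    ready_images: set = field(default_factory=set)
    bridges: set = field(default_factory=set)


@dataclass
class BunkerRequest:  # hypothetical shape, for illustration
    image: str
    cpu: int
    memory_mb: int
    disk_gb: int
    gpu: int = 0
    lazy_pull: bool = False
    bridges: set = field(default_factory=set)  # non-empty only for transit bunkers


def fits(worker: Worker, req: BunkerRequest) -> bool:
    # (a) image cached and ready, unless the image is lazy_pull
    image_ok = req.lazy_pull or req.image in worker.ready_images
    # (b) enough free capacity for every requested resource
    capacity_ok = (
        worker.free_cpu >= req.cpu
        and worker.free_memory_mb >= req.memory_mb
        and worker.free_disk_gb >= req.disk_gb
        and worker.free_gpu >= req.gpu
    )
    # (c) the worker owns every bridge the interfaces reference
    bridges_ok = req.bridges <= worker.bridges
    return image_ok and capacity_ok and bridges_ok


def pick_worker(workers, req: BunkerRequest) -> Optional[Worker]:
    """Return the first worker that fits, or None if no worker does."""
    for worker in workers:
        if fits(worker, req):
            return worker
    return None
```

A `None` result corresponds to the failure case described above; which of the two error codes applies in which situation is not specified here, so the sketch does not pick one.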

Availability groups scope the fan-out: adding an image to a group warms it on every worker in the group; adding a worker to a group warms every image already in the group. Workers that share a group can also fetch ext4 bytes from each other via P2P discovery (GET /peers/{sha256}), falling back to the controller storage proxy only when no peer has the sha cached yet.

Lifecycle state machine

Every bunker moves through a fixed set of states:

pending → provisioning → running ⇄ paused / snapshotting → terminated

Failure can intrude from any non-terminal state and lands the bunker in failed. terminated and failed are terminal — bunkers don't come back. The transitions are enforced controller-side; an illegal move (e.g. resume on terminated) returns BUNKER_INVALID_TRANSITION (409).
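One way to picture the controller-side enforcement is a transition table. The sketch below is one reading of the diagram above, not the actual rule set: in particular, which states may move directly to terminated is an assumption, as are the names `ALLOWED`, `InvalidTransition`, and `transition`.

```python
TERMINAL = {"terminated", "failed"}

# One plausible reading of the diagram; the exact edge set is an assumption.
ALLOWED = {
    "pending": {"provisioning"},
    "provisioning": {"running"},
    "running": {"paused", "snapshotting", "terminated"},
    "paused": {"running", "terminated"},
    "snapshotting": {"running"},
    "terminated": set(),
    "failed": set(),
}


class InvalidTransition(Exception):
    """Mirrors the BUNKER_INVALID_TRANSITION (409) error named in the text."""
    code = "BUNKER_INVALID_TRANSITION"
    http_status = 409


def transition(current: str, target: str) -> str:
    """Return the new state, or raise InvalidTransition for an illegal move."""
    # Failure can intrude from any non-terminal state.
    if target == "failed" and current not in TERMINAL:
        return target
    if target in ALLOWED.get(current, set()):
        return target
    raise InvalidTransition(f"{current} -> {target}")
```

Keeping the table as data makes the "terminal states never come back" rule fall out naturally: both terminal states map to an empty set.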

A sweeper background task runs every 30 seconds and terminates bunkers that have exceeded max_lifetime_s or sat idle longer than idle_timeout_s. Defaults are lifecycle-dependent: ephemeral bunkers default to 1 hour max / 5 minutes idle; session bunkers to 8 hours / 15 minutes; persistent bunkers to no hard cap on lifetime but 15 minutes idle (the scheduler passivates them via snapshot).
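The defaults above translate into a small lookup plus two comparisons. This is a minimal sketch of the sweeper's decision for one bunker, assuming the limits are resolved purely from the lifecycle defaults; the function name `sweep_reason` and its return labels are hypothetical.

```python
# (max_lifetime_s, idle_timeout_s) per lifecycle, taken from the defaults above.
# None means no hard cap on lifetime.
DEFAULTS = {
    "ephemeral": (1 * 3600, 5 * 60),
    "session": (8 * 3600, 15 * 60),
    "persistent": (None, 15 * 60),
}


def sweep_reason(lifecycle: str, age_s: float, idle_s: float):
    """Return why the sweeper would act on a bunker, or None to leave it alone."""
    max_lifetime_s, idle_timeout_s = DEFAULTS[lifecycle]
    if max_lifetime_s is not None and age_s > max_lifetime_s:
        return "max_lifetime"
    if idle_s > idle_timeout_s:
        # For persistent bunkers the text says an idle hit leads to
        # passivation via snapshot rather than plain termination.
        return "idle_timeout"
    return None
```

Because the task runs every 30 seconds, a bunker can outlive its cap by up to one sweep interval before the sweeper notices it.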

Trust boundaries

The worker token (SCAIBUNKER_WORKER_TOKEN) is the only credential workers carry. They use it for heartbeats, storage-proxy reads and writes, and peer discovery. The token never crosses the bunker boundary — code running inside a microVM has no way to reach it.

Bunker → outside-world traffic is governed by the bunker's network_profile. Even on unrestricted, all egress goes through the worker's network namespace and is optionally NDJSON-audited. Bunker → controller (the agent backchannel) is the only path the bunker has to call back into ScaiGrid; it is rate-limited and tenant-scoped.

Caller → ScaiBunker is the normal ScaiGrid bearer flow — JWT or API key, with scaibunker:* module permissions evaluated per request. The storage proxy uses either the shared worker bearer (workers) or short-lived capability tokens (admin clients that prefer not to hand workers more privilege than they need).

How it differs from `docker run` on your own host

ScaiBunker is multi-tenant, quota-aware, auditable, and hardware-isolated by default. A Docker host gives you the runtime; ScaiBunker gives you the runtime plus the surrounding machinery a multi-tenant product needs.

| Concern | DIY container host | ScaiBunker |
| --- | --- | --- |
| Isolation | Linux namespaces + seccomp | Firecracker hardware virtualisation |
| Multi-tenancy | You design it | Built-in (tenant column, quota profiles, audit) |
| Image distribution | You orchestrate | Availability groups + P2P peer fetch |
| Network policy | iptables rules you write | Five named profiles, per-flow audit |
| Snapshots | You wire up | Built-in, S3-backed |
| Quotas | You build it | Per-user / per-group, composable, Redis-backed |
| Audit | You instrument | Every exec / file op / shell logged |

For a one-off CI container or a developer laptop, you don't need ScaiBunker. For multi-tenant sandboxed compute under quotas with auditing, the surface area saved is most of the work.

Background tasks

The controller runs six cron-style background tasks via arq, each on its own schedule:

Sweep timeouts (every 30s) — terminates bunkers past their idle or lifetime caps.
Monitor heartbeats (every 30s) — flips workers to offline when their heartbeat goes stale.
Cleanup snapshots (every 5 min) — deletes snapshots past their expires_at.
Emit usage ticks (every minute) — feeds the accounting pipeline.
Scan pending images (every 2 min) — drains the image-scan queue.
Request periodic rescans (daily at 03:17 UTC) — flips aged scan results to pending so the scanner re-checks for new CVEs in unchanged images.

These tasks run on the same arq worker pool that backs the rest of ScaiGrid's background work (not the Firecracker worker fleet), so there is no separate process to deploy.

Heartbeats

Workers POST /v1/modules/scaibunker/heartbeat every ten seconds (WORKER_HEARTBEAT_INTERVAL_S). Each heartbeat carries the worker's current capacity (available CPU / memory / disk / GPU), active bunker count and per-bunker statuses, and optionally a cached_images list. If three intervals pass without a heartbeat, the monitor flips the worker to offline and the scheduler stops handing it new bunkers; active bunkers keep running until the heartbeat returns or the worker is drained.
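The offline rule reduces to one comparison against the heartbeat interval. The sketch below assumes timestamps in seconds and treats exactly three elapsed intervals as stale; that boundary choice and the helper names are assumptions.

```python
WORKER_HEARTBEAT_INTERVAL_S = 10   # from the text above
MISSED_INTERVALS_BEFORE_OFFLINE = 3


def is_stale(last_heartbeat_ts: float, now_ts: float,
             interval_s: int = WORKER_HEARTBEAT_INTERVAL_S,
             missed: int = MISSED_INTERVALS_BEFORE_OFFLINE) -> bool:
    """True once `missed` full intervals have passed without a heartbeat."""
    return (now_ts - last_heartbeat_ts) >= interval_s * missed


def schedulable(last_seen: dict, now_ts: float) -> list:
    """Names of workers the scheduler may still hand new bunkers to."""
    return sorted(name for name, ts in last_seen.items()
                  if not is_stale(ts, now_ts))
```

Since the heartbeat monitor itself runs every 30 seconds, the observed time to flip a worker offline can be noticeably longer than the 30-second threshold.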

The optional cached_images list is the worker's authoritative view of its on-disk image cache. When present, the controller reconciles mod_scaibunker_image_cache against it: it restores rows for images the worker has but the controller lost (drift recovery after a DB issue) and evicts rows for images the worker no longer reports. This is what lets you wipe a worker's /var/lib/scaibunker and have the controller's view self-heal without manual intervention.
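Reconciliation of this kind is essentially two set differences. The sketch below models the ledger for one worker as a set of image hashes; the function name and return shape are hypothetical, and the real ledger is a database table rather than an in-memory set.

```python
def reconcile_image_cache(ledger: set, reported):
    """Return (new_ledger, to_add, to_evict) for one worker.

    `reported` is the heartbeat's cached_images list, or None when the
    worker omitted it, in which case nothing changes.
    """
    if reported is None:
        return set(ledger), set(), set()
    reported_set = set(reported)
    to_add = reported_set - ledger      # worker has it, controller lost it
    to_evict = ledger - reported_set    # controller has it, worker no longer does
    return reported_set, to_add, to_evict
```

An empty list is different from an omitted one: a worker whose cache directory was wiped reports `[]`, and every ledger row for it is evicted on the next heartbeat.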

Quota counters

Quota accounting lives in Redis under quota:bucket:{kind}:{id}:{resource} keys, one counter per (bucket, resource) pair. A bunker's lifecycle hooks increment the counters on create and decrement them on terminate, across every bucket the bunker contributes to (user-individual, group-shared, group-per-user — all evaluated at create time). The decrement set is stashed in quota:bunker:{bunker_id} so the right counters are touched even if the bunker's profile assignment changed mid-life.
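To make the stash idea concrete, here is a minimal sketch that uses an in-memory stand-in for Redis. The two key formats come from the text above; the `FakeRedis` class, the JSON stash encoding, the hook names, and the bucket kind strings used in the example are assumptions.

```python
import json


class FakeRedis:
    """Tiny in-memory stand-in exposing only the calls this sketch needs."""

    def __init__(self):
        self.data = {}

    def incrby(self, key, amount):
        self.data[key] = int(self.data.get(key, 0)) + amount
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        self.data.pop(key, None)


def counter_key(kind, bucket_id, resource):
    return f"quota:bucket:{kind}:{bucket_id}:{resource}"


def on_create(r, bunker_id, buckets, usage):
    """Increment every (bucket, resource) counter and stash what was touched."""
    touched = {}
    for kind, bucket_id in buckets:
        for resource, amount in usage.items():
            key = counter_key(kind, bucket_id, resource)
            r.incrby(key, amount)
            touched[key] = amount
    r.set(f"quota:bunker:{bunker_id}", json.dumps(touched))


def on_terminate(r, bunker_id):
    """Decrement exactly the counters recorded at create time."""
    stash_key = f"quota:bunker:{bunker_id}"
    raw = r.get(stash_key)
    if raw is None:
        return
    for key, amount in json.loads(raw).items():
        r.incrby(key, -amount)
    r.delete(stash_key)
```

Because `on_terminate` reads the stash instead of re-resolving the bunker's current profiles, a profile reassignment between create and terminate cannot leave a counter permanently inflated.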

A single QuotaResolver evaluates every applicable profile on create; the strictest cap wins. Callers can query GET /quota-profiles/usage for a per-bucket breakdown (current, cap, remaining) without having to sum active bunkers client-side.
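"Strictest cap wins" can be read as "a create succeeds only if every applicable bucket has room". The sketch below illustrates that reading together with the current / cap / remaining breakdown; the dictionary shape, the function names, and the treatment of a `None` cap as unlimited are assumptions, not the response schema.

```python
def breakdown(buckets):
    """Per-bucket rows of current, cap and remaining."""
    rows = []
    for b in buckets:
        cap = b["cap"]
        # A cap of None is treated as "no limit" (an assumption).
        remaining = None if cap is None else max(cap - b["current"], 0)
        rows.append({"bucket": b["bucket"], "current": b["current"],
                     "cap": cap, "remaining": remaining})
    return rows


def can_create(buckets, requested):
    """True only if every bucket can absorb `requested` more units."""
    return all(row["remaining"] is None or row["remaining"] >= requested
               for row in breakdown(buckets))
```

For example, with a user bucket at 3 of 5 and a group bucket at 8 of 9, a request for one more unit passes and a request for two is rejected by the group bucket.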

If Redis is wiped, counters rebuild from the active-bunker rows on the next create attempt; the recovery is implicit, not a separate migration step.
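The rebuild is a straightforward re-aggregation over active bunkers. A minimal sketch follows, assuming each active-bunker row exposes the buckets it contributes to and its resource usage; the row shape and the function name `rebuild_counters` are hypothetical.

```python
from collections import Counter


def rebuild_counters(active_bunkers):
    """Recompute every quota counter from the active-bunker rows."""
    counters = Counter()
    for bunker in active_bunkers:
        for kind, bucket_id in bunker["buckets"]:
            for resource, amount in bunker["usage"].items():
                counters[f"quota:bucket:{kind}:{bucket_id}:{resource}"] += amount
    return dict(counters)
```

The result maps each counter key to the value it should hold, which could then be written back to Redis before the pending create is evaluated.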