---
summary: How the ScaiGrid controller, ScaiBunker workers, the storage proxy, and the
  Firecracker microVMs fit together.
title: Architecture
path: concepts/architecture
status: published
---

ScaiBunker is a thin product layer over ScaiGrid's existing primitives — tenancy, quotas, audit, S3-backed storage — wrapped around a fleet of worker nodes that run Firecracker microVMs. The controller never executes code itself; it schedules work onto workers and relays bytes between workers and object storage.

## Components

ScaiBunker has three tiers: the ScaiGrid controller (FastAPI process, owns the database and Redis), the worker fleet (separate hosts that run Firecracker microVMs), and S3 (snapshots, images, audit batches, file staging). Workers connect outward to the controller; they never accept inbound connections from the caller.

```mermaid
flowchart LR
    C[Caller]
    CTRL["Controller<br/>Bunker svc<br/>Quota svc<br/>Scheduler<br/>Image svc<br/>Snapshot svc"]
    SP["Storage proxy<br/>/storage/..."]
    S3[Garage S3]
    WD[scaibunker-worker daemon]
    FC["Firecracker microVM<br/>(ext4)"]
    S3C[S3 client]

    C -- "/v1/modules/scaibunker/..." --> CTRL
    CTRL -- "bunker, exec results, files" --> C
    CTRL -- HTTP --> WD
    WD --> CTRL
    WD --- FC
    WD --- S3C
    SP <-- stream --> S3C
    CTRL --> SP
    SP -- S3 PUT --> S3
    S3C --> S3

    subgraph SG [ScaiGrid]
        CTRL
        SP
        S3
    end

    subgraph WF [Worker fleet]
        WD
        FC
        S3C
    end
```

There's no separate ScaiBunker deployment of the controller: it runs as a ScaiGrid module in the main FastAPI process, behind the same auth and charged against the same accounting.

The storage proxy can additionally run as a stripped-down `SCAIGRID_MODE=bunker_proxy` process: same `/v1/modules/scaibunker/storage/*` routes, no DB or Redis, horizontally scalable behind a load balancer. Workers never see S3 credentials.

## Request flow for one `exec` call

1. **Caller** sends `POST /v1/modules/scaibunker/bunkers/{id}/exec` with a command, timeout, optional env and stdin.
2. **Auth** validates the bearer token and confirms the caller has `scaibunker:execute`.
3. **Bunker lookup** confirms the bunker is in this tenant and in `running` state; otherwise 404 or 409.
4. **Worker pool** picks the worker the bunker is scheduled on and opens an HTTP connection.
5. **Execution.** The controller forwards the exec request to the worker. The worker runs the command inside the microVM, captures stdout / stderr / exit code, enforces the timeout.
6. **Output handling.** Small outputs come back inline; large outputs are streamed to S3 via the storage proxy and the worker returns a `full_output_ref` instead.
7. **Exec log.** A row in `mod_scaibunker_exec_log` records the command, exit code, duration, output preview.
8. **Activity timestamp.** The bunker's `last_activity_at` is bumped so the idle-timeout sweeper leaves it alone.
9. **Response.** Inline mode returns the full result; `stream: true` returns Server-Sent Events with `stdout`, `stderr`, and `exit` events.
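
For the streaming mode in step 9, a caller has to split the Server-Sent Events stream into `stdout`, `stderr`, and `exit` events. The sketch below parses standard SSE framing from an iterable of text lines. The payload shapes are assumptions (plain text chunks for output, a bare integer for `exit`), since the wire format of each `data:` field is not specified here, and `parse_sse` / `collect_exec` are illustrative names.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from SSE-framed text lines.

    Standard framing: `event:` and `data:` fields, with a blank line
    terminating each event. Lines starting with ':' are comments.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips one leading space after the colon, if present.
            data.append(value[1:] if value.startswith(" ") else value)


def collect_exec(lines: Iterable[str]):
    """Fold a stream into (stdout, stderr, exit_code)."""
    out, err, code = [], [], None
    for event, data in parse_sse(lines):
        if event == "stdout":
            out.append(data)
        elif event == "stderr":
            err.append(data)
        elif event == "exit":
            code = int(data)  # assumption: exit data is a bare integer
    return "".join(out), "".join(err), code
```

In practice the `lines` iterable would come from whatever HTTP client the caller uses to read the response body line by line.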

## State

- **Bunkers, workers, images, snapshots, quota profiles, availability groups, bridges, exec logs** — in ScaiGrid's MariaDB.
- **Live status, quota counters, worker heartbeats** — in Redis. The Redis state is rebuildable from MariaDB plus worker heartbeats; losing Redis is recoverable.
- **Ext4 rootfs images, snapshots, large exec outputs, audit batches, file staging** — in S3 via the storage proxy.
- **Per-worker image cache** — local on each worker; reconciled with the controller's `mod_scaibunker_image_cache` ledger every heartbeat.

## Scheduler and image fan-out

When a bunker create lands, the scheduler picks a worker that (a) has the requested image cached `ready` (unless the image is `lazy_pull: true`), (b) has free capacity for the requested CPU / memory / disk / GPU, and (c) for transit bunkers, owns every bridge the interfaces reference. If no worker fits, the call fails with `WORKER_UNAVAILABLE` or `NO_SUITABLE_WORKER`.
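
The three placement conditions can be read as a filter over the worker list. The following is a minimal sketch, not the actual scheduler: the `Worker` and `Request` shapes, the resource key names, and the tie-break (most free CPU) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Worker:
    name: str
    ready_images: set                 # images cached in `ready` state
    free: dict                        # e.g. {"cpu": 4, "memory_mb": 8192}
    bridges: set = field(default_factory=set)


@dataclass
class Request:
    image: str
    lazy_pull: bool
    need: dict                        # requested CPU / memory / disk / GPU
    bridges: set = field(default_factory=set)  # only set for transit bunkers


def fits(worker: Worker, req: Request) -> bool:
    # (a) image cached ready, unless the image is lazy_pull
    has_image = req.lazy_pull or req.image in worker.ready_images
    # (b) free capacity for every requested resource
    has_capacity = all(worker.free.get(k, 0) >= v for k, v in req.need.items())
    # (c) worker owns every bridge the interfaces reference
    owns_bridges = req.bridges <= worker.bridges
    return has_image and has_capacity and owns_bridges


def pick_worker(workers, req: Request) -> Optional[Worker]:
    candidates = [w for w in workers if fits(w, req)]
    # Tie-break is an assumption: prefer the worker with the most free CPU.
    return max(candidates, key=lambda w: w.free.get("cpu", 0), default=None)
```

A `None` result corresponds to the failure case above; which of the two error codes applies in which situation is not modelled here.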

Availability groups scope the fan-out: adding an image to a group warms it on every worker in the group; adding a worker to a group warms every image already in the group. Workers that share a group can also fetch ext4 bytes from each other via P2P discovery (`GET /peers/{sha256}`), falling back to the controller storage proxy only when no peer has the sha cached yet.
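
The peer-first, proxy-fallback order can be sketched as below. The transport is injected as plain callables so the example stays self-contained; `fetch_from_peer` and `fetch_from_proxy` are hypothetical helpers, and verifying the digest of peer-supplied bytes is an assumption about sensible behaviour rather than something stated above.

```python
import hashlib
from typing import Callable, Iterable, Optional, Tuple


def fetch_image(
    sha256: str,
    peers: Iterable[str],
    fetch_from_peer: Callable[[str, str], Optional[bytes]],
    fetch_from_proxy: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (bytes, source), trying peers before the storage proxy."""
    for peer in peers:
        blob = fetch_from_peer(peer, sha256)
        # Assumption: only accept peer bytes whose digest matches the sha.
        if blob is not None and hashlib.sha256(blob).hexdigest() == sha256:
            return blob, f"peer:{peer}"
    # No peer had the image cached: fall back to the controller storage proxy.
    return fetch_from_proxy(sha256), "proxy"
```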

## Lifecycle state machine

Every bunker moves through a fixed set of states:

`pending` → `provisioning` → `running` ⇄ `paused` / `snapshotting` → `terminated`

Failure can intrude from any non-terminal state and lands the bunker in `failed`. `terminated` and `failed` are terminal — bunkers don't come back. The transitions are enforced controller-side; an illegal move (e.g. resume on terminated) returns `BUNKER_INVALID_TRANSITION` (409).
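
One way to read the chart is as a transition table checked on every state change. This is a sketch under assumptions: the chart does not say whether `paused` or `snapshotting` can go straight to `terminated`, so that edge is a guess, and the `InvalidTransition` class is illustrative.

```python
TERMINAL = {"terminated", "failed"}

# One reading of the state chart; edges marked "assumed" are not spelled out above.
ALLOWED = {
    "pending": {"provisioning"},
    "provisioning": {"running"},
    "running": {"paused", "snapshotting", "terminated"},
    "paused": {"running", "terminated"},        # -> terminated is assumed
    "snapshotting": {"running", "terminated"},  # -> terminated is assumed
    "terminated": set(),
    "failed": set(),
}


class InvalidTransition(Exception):
    code = "BUNKER_INVALID_TRANSITION"
    status = 409


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is illegal."""
    # Failure can intrude from any non-terminal state.
    if target == "failed" and current not in TERMINAL:
        return target
    if target in ALLOWED.get(current, set()):
        return target
    raise InvalidTransition(f"{current} -> {target}")
```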

A sweeper background task runs every 30 seconds and terminates bunkers that have exceeded `max_lifetime_s` or sat idle longer than `idle_timeout_s`. Defaults are lifecycle-dependent: ephemeral bunkers default to 1 hour max / 5 minutes idle; session bunkers to 8 hours / 15 minutes; persistent bunkers to no hard cap on lifetime but 15 minutes idle (the scheduler passivates them via snapshot).
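
The per-bunker decision the sweeper makes can be sketched as a pure function over timestamps. The defaults are taken from the paragraph above; the function name, the epoch-seconds inputs, and the returned reason strings are assumptions for illustration.

```python
from typing import Optional

# (max_lifetime_s, idle_timeout_s) per lifecycle; None means no hard cap.
DEFAULTS = {
    "ephemeral": (1 * 3600, 5 * 60),
    "session": (8 * 3600, 15 * 60),
    "persistent": (None, 15 * 60),
}


def sweep_reason(lifecycle: str, created_at: float,
                 last_activity_at: float, now: float) -> Optional[str]:
    """Return why a bunker is due for the sweeper, or None to leave it alone."""
    max_lifetime_s, idle_timeout_s = DEFAULTS[lifecycle]
    if max_lifetime_s is not None and now - created_at > max_lifetime_s:
        return "max_lifetime"
    if now - last_activity_at > idle_timeout_s:
        return "idle_timeout"
    return None
```

For persistent bunkers, an `idle_timeout` result would lead to passivation via snapshot rather than termination, as noted above.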

## Trust boundaries

The worker token (`SCAIBUNKER_WORKER_TOKEN`) is the only credential workers carry. They use it for heartbeats, storage-proxy reads and writes, and peer discovery. The token never crosses the bunker boundary — code running inside a microVM has no way to reach it.

Bunker → outside-world traffic is governed by the bunker's `network_profile`. Even on `unrestricted`, all egress goes through the worker's network namespace and is optionally NDJSON-audited. Bunker → controller (the agent backchannel) is the only path the bunker has to call back into ScaiGrid; it is rate-limited and tenant-scoped.

Caller → ScaiBunker is the normal ScaiGrid bearer flow — JWT or API key, with `scaibunker:*` module permissions evaluated per request. The storage proxy uses either the shared worker bearer (workers) or short-lived capability tokens (admin clients that prefer not to hand workers more privilege than they need).

## How it differs from `docker run` on your own host

ScaiBunker is multi-tenant, quota-aware, auditable, and hardware-isolated by default. A Docker host gives you the runtime; ScaiBunker gives you the runtime plus the surrounding machinery a multi-tenant product needs.

| Concern | DIY container host | ScaiBunker |
|---|---|---|
| Isolation | Linux namespaces + seccomp | Firecracker hardware virtualisation |
| Multi-tenancy | You design it | Built-in (tenant column, quota profiles, audit) |
| Image distribution | You orchestrate | Availability groups + P2P peer fetch |
| Network policy | iptables rules you write | Five named profiles, per-flow audit |
| Snapshots | You wire up | Built-in, S3-backed |
| Quotas | You build it | Per-user / per-group, composable, Redis-backed |
| Audit | You instrument | Every exec / file op / shell logged |

For a one-off CI container or a developer laptop, you don't need ScaiBunker. For multi-tenant sandboxed compute under quotas with auditing, the surface area saved is most of the work.

## Background tasks

The controller runs six cron-style background tasks via arq, each on its own schedule:

- **Sweep timeouts** (every 30s) — terminates bunkers past their idle or lifetime caps.
- **Monitor heartbeats** (every 30s) — flips workers to `offline` when their heartbeat goes stale.
- **Cleanup snapshots** (every 5 min) — deletes snapshots past their `expires_at`.
- **Emit usage ticks** (every minute) — feeds the accounting pipeline.
- **Scan pending images** (every 2 min) — drains the image-scan queue.
- **Request periodic rescans** (daily at 03:17 UTC) — flips aged scan results to pending so the scanner re-checks for new CVEs in unchanged images.

These tasks run on the same background worker pool that backs the rest of ScaiGrid's background work (not the ScaiBunker worker fleet), so there is no separate process to deploy.

## Heartbeats

Workers POST `/v1/modules/scaibunker/heartbeat` every ten seconds (`WORKER_HEARTBEAT_INTERVAL_S`). Each heartbeat carries the worker's current capacity (available CPU / memory / disk / GPU), active bunker count and per-bunker statuses, and optionally a `cached_images` list. If three intervals pass without a heartbeat, the monitor flips the worker to `offline` and the scheduler stops handing it new bunkers; active bunkers keep running until the heartbeat returns or the worker is drained.
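
The staleness rule reduces to a single comparison. A minimal sketch, assuming timestamps in epoch seconds and treating exactly three elapsed intervals as stale (the boundary case is not specified above); `is_stale` and `schedulable` are illustrative names.

```python
WORKER_HEARTBEAT_INTERVAL_S = 10       # interval named in the text
MISSED_INTERVALS_BEFORE_OFFLINE = 3    # "three intervals"


def is_stale(last_heartbeat_at: float, now: float) -> bool:
    """True once three heartbeat intervals have passed without a heartbeat."""
    threshold = WORKER_HEARTBEAT_INTERVAL_S * MISSED_INTERVALS_BEFORE_OFFLINE
    return now - last_heartbeat_at >= threshold


def schedulable(last_seen: dict, now: float) -> list:
    """Names of workers the scheduler may still hand new bunkers to."""
    return sorted(name for name, seen in last_seen.items()
                  if not is_stale(seen, now))
```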

The optional `cached_images` list is the worker's authoritative view of its on-disk image cache. When present, the controller reconciles `mod_scaibunker_image_cache` against it: it re-records entries the worker has but the controller lost (drift recovery after a DB issue) and evicts rows the worker no longer reports. This is what lets you wipe a worker's `/var/lib/scaibunker` and have the controller's view self-heal without manual intervention.
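
The reconciliation amounts to two set differences, with one subtlety: an absent list means "no information", while an empty list means "the cache is empty". The sketch below assumes the ledger and the report can both be reduced to sets of image sha256 strings; `reconcile` is an illustrative name.

```python
from typing import Iterable, Optional, Set, Tuple


def reconcile(ledger: Set[str],
              cached_images: Optional[Iterable[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_evict) for one worker's ledger rows."""
    if cached_images is None:
        # Heartbeat carried no cached_images list: leave the ledger alone.
        return set(), set()
    reported = set(cached_images)
    to_add = reported - ledger      # worker has it, controller lost it
    to_evict = ledger - reported    # controller lists it, worker no longer reports it
    return to_add, to_evict
```

A wiped worker reports an empty list, so every ledger row for it lands in `to_evict`.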

## Quota counters

Quota accounting lives in Redis under `quota:bucket:{kind}:{id}:{resource}` keys, one counter per (bucket, resource) pair. A bunker's lifecycle hooks increment the counters on create and decrement them on terminate, across every bucket the bunker contributes to (user-individual, group-shared, group-per-user — all evaluated at create time). The decrement set is stashed in `quota:bunker:{bunker_id}` so the right counters are touched even if the bunker's profile assignment changed mid-life.
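
The increment, stash, and decrement cycle can be sketched with plain dictionaries standing in for Redis. The key formats come from the paragraph above; the bucket kind names (`user`, `group`), the resource names, and the shape of the stashed value are assumptions.

```python
from collections import defaultdict

counters = defaultdict(int)  # stand-in for the Redis counters
stash = {}                   # stand-in for the quota:bunker:{bunker_id} keys


def bucket_key(kind: str, bucket_id: str, resource: str) -> str:
    return f"quota:bucket:{kind}:{bucket_id}:{resource}"


def on_create(bunker_id: str, buckets, usage: dict) -> None:
    """Increment every (bucket, resource) counter and remember what was touched."""
    touched = []
    for kind, bucket_id in buckets:
        for resource, amount in usage.items():
            key = bucket_key(kind, bucket_id, resource)
            counters[key] += amount
            touched.append((key, amount))
    stash[f"quota:bunker:{bunker_id}"] = touched


def on_terminate(bunker_id: str) -> None:
    """Decrement exactly the counters recorded at create time."""
    for key, amount in stash.pop(f"quota:bunker:{bunker_id}", []):
        counters[key] -= amount
```

Because the decrement reads the stash rather than the bunker's current profile assignment, a mid-life reassignment cannot leave counters skewed.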

`QuotaResolver` evaluates every applicable profile on create; the strictest cap wins. Callers can query `GET /quota-profiles/usage` for a per-bucket breakdown (current, cap, remaining) without having to sum active bunkers client-side.
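
"Strictest cap wins" can be read as taking the minimum cap per resource across all applicable profiles. This is a sketch of that reading, not the `QuotaResolver` implementation: the profile shape (a dict of resource to cap, with `None` meaning uncapped) and both function names are assumptions.

```python
from typing import Dict, Iterable, Optional


def effective_caps(profiles: Iterable[Dict[str, Optional[int]]]) -> Dict[str, int]:
    """Smallest cap per resource across profiles; None means that profile sets no cap."""
    caps: Dict[str, int] = {}
    for profile in profiles:
        for resource, cap in profile.items():
            if cap is None:
                continue
            caps[resource] = min(cap, caps.get(resource, cap))
    return caps


def usage_row(current: int, cap: int) -> Dict[str, int]:
    """One line of a usage breakdown: current, cap, remaining."""
    return {"current": current, "cap": cap, "remaining": max(cap - current, 0)}
```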

If Redis is wiped, counters rebuild from the active-bunker rows on the next create attempt; the recovery is implicit, not a separate migration step.
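
The rebuild is a fold over the active-bunker rows. A minimal sketch, assuming each row exposes the buckets it contributes to and its resource usage; the row shape and the `rebuild_counters` name are hypothetical.

```python
from collections import Counter
from typing import Iterable


def rebuild_counters(active_bunkers: Iterable[dict]) -> Counter:
    """Recompute every quota counter from the rows of still-active bunkers."""
    counters: Counter = Counter()
    for row in active_bunkers:
        for kind, bucket_id in row["buckets"]:
            for resource, amount in row["usage"].items():
                counters[f"quota:bucket:{kind}:{bucket_id}:{resource}"] += amount
    return counters
```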
