ScaiBunker
ScaiBunker is the sandboxed-compute product on top of ScaiGrid. You provision a bunker — an isolated Linux environment with its own rootfs, network namespace, and resource limits — then drive it through a structured API: run commands, read and write files, open an interactive shell, take snapshots. The hard isolation is Firecracker; the rest is ScaiGrid's tenancy, quota, and audit machinery wrapped around it.
It is built on top of ScaiGrid's identity, accounting, and module systems, so every bunker is owned by a tenant, charged against a quota profile, and audited the same way the rest of the platform is.
When to use it
- You have an AI agent that needs to actually execute code, not just describe it.
- You want a per-conversation Python or shell environment that survives across turns.
- You need a place to run untrusted user-supplied code with hard isolation and a network policy.
- You want to give a model file I/O and pip install without giving it your laptop.
- You need to chain L2 network appliances (firewall, IDS, NAT) inside a tenant's network without exposing them to the rest of the platform.
- You run a multi-tenant SaaS that needs sandboxed compute as a first-class product feature, complete with quotas and audit.
If you only need a one-shot text completion, you don't need ScaiBunker — call /v1/inference/chat directly. If you need long-running headless agents that share state and tools with ScaiGrid, look at ScaiCore (which itself uses ScaiBunker under the hood).
What you get
- Firecracker microVMs. Each bunker is its own microVM with an ext4 rootfs and a Linux network namespace. Hardware-virtualised isolation, sub-second cold starts.
- Three lifecycle modes. ephemeral for one-shot, session for conversation-scoped, persistent for always-on agents.
- Structured execution API. POST /exec for commands, PUT/GET/DELETE /files for file I/O, WebSocket for interactive PTY.
- Five network profiles. From fully isolated to unrestricted, with a registry-only tier, a tenant-configured allowlist, and an L2 transit profile for in-tenant network appliances.
- Quota profiles. Per-user and per-group caps that compose; most-restrictive wins.
- Image fan-out. Availability groups scope which workers pre-bake which images so first-bunker latency stays low.
- Snapshots. Filesystem archives in S3; restore on a different worker.
- Egress audit. Per-flow NDJSON batches for the unrestricted profile, retrievable through the same API.
- Trivy scans. Every registered image is scanned automatically; results refreshed daily.
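The "most-restrictive wins" rule for quota profiles can be pictured as a per-field minimum across every profile that applies to a user. The sketch below only illustrates that rule: the QuotaProfile class and its field names (max_vcpus, max_memory_mb, max_bunkers) are hypothetical and are not ScaiBunker's actual schema.

```python
from dataclasses import dataclass, fields
from typing import Iterable, Optional


@dataclass(frozen=True)
class QuotaProfile:
    """Hypothetical quota profile; None means this profile sets no cap."""
    max_vcpus: Optional[int] = None
    max_memory_mb: Optional[int] = None
    max_bunkers: Optional[int] = None


def effective_quota(profiles: Iterable[QuotaProfile]) -> QuotaProfile:
    """Combine profiles so that the lowest cap on each field wins."""
    profiles = list(profiles)
    merged = {}
    for f in fields(QuotaProfile):
        # Collect only the caps that some profile actually sets.
        caps = [getattr(p, f.name) for p in profiles if getattr(p, f.name) is not None]
        merged[f.name] = min(caps) if caps else None
    return QuotaProfile(**merged)


# A per-user profile and a per-group profile applying to the same user.
user_profile = QuotaProfile(max_vcpus=4, max_memory_mb=8192)
group_profile = QuotaProfile(max_vcpus=2, max_bunkers=10)
effective = effective_quota([user_profile, group_profile])
print(effective)
```

Here the group's lower vCPU cap overrides the user's, while caps set by only one profile carry through unchanged.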
Two-minute mental model
You manage four nouns and one verb:
- A Bunker is one sandboxed environment owned by a tenant.
- A Worker is a node that runs bunkers. Workers register themselves via heartbeat; admins drain and resume them.
- An Image is a Firecracker rootfs baked from an OCI image or tar source.
- A Quota profile is a reusable bundle of resource caps assigned to users or groups.
- And the verb: you exec — send a command, read or write a file, open a shell.
Everything else (snapshots, audit batches, bridges, availability groups, the P2P image fetch, the storage proxy) is plumbing around those four nouns.
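To make the "exec" verb concrete, here is a minimal sketch that builds descriptions of the HTTP calls without sending them. Only the /v1/modules/scaibunker mount point and the exec and files route names come from this page. The /bunkers/{id}/ nesting, the "command" body field, and the "path" query parameter are assumptions made for illustration, so check the API reference for the real shapes.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import quote

API_PREFIX = "/v1/modules/scaibunker"  # mount point named on this page


@dataclass(frozen=True)
class Request:
    """A plain description of an HTTP call; nothing is sent over the network."""
    method: str
    path: str
    json: Optional[dict] = None
    body: Optional[bytes] = None


def bunker_path(bunker_id: str, suffix: str) -> str:
    # ASSUMPTION: per-bunker routes are nested under /bunkers/{id}/.
    return f"{API_PREFIX}/bunkers/{quote(bunker_id, safe='')}/{suffix}"


def exec_command(bunker_id: str, command: List[str]) -> Request:
    # ASSUMPTION: the exec body is {"command": [...]}.
    return Request("POST", bunker_path(bunker_id, "exec"), json={"command": command})


def file_request(method: str, bunker_id: str, file_path: str,
                 data: Optional[bytes] = None) -> Request:
    # ASSUMPTION: the target file is named by a "path" query parameter.
    if method not in {"PUT", "GET", "DELETE"}:
        raise ValueError(f"unsupported file method: {method}")
    url = bunker_path(bunker_id, "files") + "?path=" + quote(file_path, safe="")
    return Request(method, url, body=data)


run = exec_command("b-123", ["python", "-c", "print(1 + 1)"])
upload = file_request("PUT", "b-123", "/work/data.csv", b"a,b\n1,2\n")
print(run.method, run.path)
print(upload.method, upload.path)
```

Keeping request construction separate from transport makes it easy to plug in whichever HTTP client and auth scheme your ScaiGrid deployment uses.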
How it sits in ScaiGrid
ScaiBunker is a ScaiGrid module, not a separate service. It runs in the same FastAPI process, behind the same auth, charged against the same accounting pipeline. Workers are separate hosts that connect outward to the controller; they never need inbound reachability.
For high-throughput deployments where the traffic through the storage proxy (snapshots, images, audit batches, exec output) starts to dominate the main controller's load, you can split the storage proxy into its own process: SCAIGRID_MODE=bunker_proxy runs a stripped-down FastAPI app that serves only the /v1/modules/scaibunker/storage/* routes, with no DB or Redis dependencies. The proxy is stateless and scales horizontally behind a load balancer.
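With the proxy split out, whatever sits in front of ScaiGrid has to send storage traffic to the proxy processes and everything else to the controller. A minimal sketch of that routing decision, assuming two upstream pools whose names ("bunker_proxy" and "controller") are hypothetical labels rather than real configuration keys:

```python
STORAGE_PREFIX = "/v1/modules/scaibunker/storage/"  # route prefix named on this page


def upstream_for(path: str) -> str:
    """Pick the upstream pool for a request path (pool names are hypothetical)."""
    if path.startswith(STORAGE_PREFIX):
        return "bunker_proxy"
    return "controller"


print(upstream_for("/v1/modules/scaibunker/storage/snapshots/abc"))
print(upstream_for("/v1/inference/chat"))
```

In practice this rule would live in your load balancer or ingress configuration as a path-prefix match.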
Built-in images
Six platform-managed base images ship by default: python-3.12, python-3.12-ml (CUDA + PyTorch), nodejs-22, ubuntu-24.04, rust-latest, and minimal (Alpine). You can register your own with POST /v1/modules/scaibunker/images — OCI sources from any registry, or tarballs your workers can read locally.
Every registered image is automatically scanned for CVEs by Trivy; results are surfaced on the image row and refreshed daily. Tenant-scoped images stay invisible to other tenants; platform-scope and partner-scope variants exist for fleet-wide and reseller-managed catalogues.
Recipes
- Data analysis sandbox. Ephemeral python-3.12-ml bunker, upload a CSV, run pandas, download the result.
- Agent with code execution. Session bunker bound to a ScaiCore conversation; the LLM generates code, the bunker runs it, output goes back to the LLM for iteration.
- Batch PDF processing. Loop of ephemeral bunkers on the registry profile, each processing one document.
- Tenant network chain. Transit bunkers running pfSense / Suricata over tenant-scoped bridges in front of application bunkers.
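The batch PDF recipe boils down to "one ephemeral bunker per document, always cleaned up". The sketch below shows that loop against a hypothetical client interface. BunkerClient, its method names, and the extract.py script are illustrative assumptions rather than a published ScaiBunker SDK, and FakeClient is an in-memory stand-in so the example runs without a deployment.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Protocol


class BunkerClient(Protocol):
    """Hypothetical client interface; not a published ScaiBunker SDK."""
    def create(self, image: str, mode: str, network_profile: str) -> str: ...
    def put_file(self, bunker_id: str, path: str, data: bytes) -> None: ...
    def exec(self, bunker_id: str, command: List[str]) -> str: ...
    def terminate(self, bunker_id: str) -> None: ...


@contextmanager
def ephemeral_bunker(client: BunkerClient, image: str,
                     network_profile: str = "registry") -> Iterator[str]:
    """Provision a bunker and terminate it even if processing fails."""
    bunker_id = client.create(image=image, mode="ephemeral",
                              network_profile=network_profile)
    try:
        yield bunker_id
    finally:
        client.terminate(bunker_id)


def process_documents(client: BunkerClient, documents: Dict[str, bytes]) -> Dict[str, str]:
    """Run one fresh bunker per document and collect the command output."""
    results = {}
    for name, data in documents.items():
        with ephemeral_bunker(client, image="python-3.12") as bunker_id:
            client.put_file(bunker_id, "/work/input.pdf", data)
            # extract.py is a placeholder for whatever script your image provides.
            results[name] = client.exec(bunker_id, ["python", "extract.py", "/work/input.pdf"])
    return results


class FakeClient:
    """In-memory stand-in so the sketch runs without a ScaiGrid deployment."""
    def __init__(self) -> None:
        self.live = set()
        self.created = 0

    def create(self, image: str, mode: str, network_profile: str) -> str:
        self.created += 1
        bunker_id = f"bunker-{self.created}"
        self.live.add(bunker_id)
        return bunker_id

    def put_file(self, bunker_id: str, path: str, data: bytes) -> None:
        assert bunker_id in self.live

    def exec(self, bunker_id: str, command: List[str]) -> str:
        assert bunker_id in self.live
        return f"{bunker_id}: ran {' '.join(command)}"

    def terminate(self, bunker_id: str) -> None:
        self.live.discard(bunker_id)


fake = FakeClient()
out = process_documents(fake, {"a.pdf": b"%PDF-a", "b.pdf": b"%PDF-b"})
print(out)
```

The context manager is the design point: each document gets a clean environment, and a failure in one document cannot leak a running bunker or contaminate the next run.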
Permissions at a glance
ScaiBunker has its own granular permission keys so you can grant exactly the network posture and lifecycle modes each operator should have. scaibunker:create is the minimum to provision an ephemeral isolated bunker; scaibunker:execute adds command execution; each network profile beyond isolated is its own permission. See Permissions for the full list and default role mapping.
Tenant admins implicitly hold scaibunker:admin:tenant, so they can stand up tenant-scoped quota profiles, availability groups, and bridges without needing a super-admin round-trip.
Where to go next
- Quickstart — provision a bunker, run a command, read a file, terminate it.
- Architecture — how the controller, workers, storage proxy, and bunker microVMs fit together.
- Network profiles — the five profiles, what they allow, who can use them.
- API reference — every endpoint, request, response.
- Build a data analysis sandbox — full walkthrough.
- Register a custom image — bake your own ext4 from an OCI source.
ScaiBunker's module ID inside ScaiGrid is scaibunker; its API is mounted at /v1/modules/scaibunker/.