Deployment
How to deploy ScaiGrid — managed, self-hosted, or hybrid. For the architecture overview, see Architecture.
Options
- Managed. ScaiLabs hosts ScaiGrid for you at scaigrid.scailabs.ai. You sign up, get a tenant, start making API calls. No operational burden.
- Self-hosted. Run your own ScaiGrid instance on your infrastructure. Full control, more responsibility.
- Hybrid. Use the managed gateway for most inference but deploy your own ScaiInfer / ScaiBunker / ScaiMind workers to keep sensitive workloads in your infrastructure.
Components to deploy
A complete ScaiGrid deployment needs:
- ScaiGrid process (four core runtime modes: HTTP, gRPC, worker, migrate; plus the optional bunker_proxy mode)
- MariaDB Galera — cluster of 3+ nodes recommended for HA
- Redis — single node is fine; HA via Sentinel or Redis Cluster for production
- S3-compatible storage — Garage (self-hosted), MinIO, or AWS S3
- ScaiKey — identity provider for authentication
- Nginx — TLS termination and reverse proxy
Optional depending on modules:
- Weaviate — for ScaiMatrix vector search
- Neo4j — for ScaiMatrix graph knowledge
- ScaiInfer cluster — for self-hosted LLM inference
- ScaiBunker workers — for sandboxed code execution
- ScaiMind cluster — for training workloads
Runtime modes
The ScaiGrid container image supports five modes via the SCAIGRID_MODE env var:
| Mode | Purpose |
|---|---|
| migrate | Runs Alembic migrations (core + all enabled modules), then exits |
| http | FastAPI HTTP server on port 8000; the main API |
| grpc | gRPC server on port 50051 for internal integrations |
| worker | ARQ background worker; runs cron tasks from core and modules |
| bunker_proxy | Stripped-down ScaiBunker storage proxy on port 8001; DB-free, horizontally scalable behind a load balancer |
Production deployments run http, grpc, and worker as separate container replicas. migrate is a one-shot init container or manual step before rolling out a new version.
bunker_proxy is a specialized mode for deployments that want to put ScaiBunker workers behind a centralized storage proxy (rather than letting workers talk directly to S3). It loads only the storage routes — no DB, no module registry, no auth on the workers. Deploy multiple replicas behind an LB if you need horizontal scaling for image-bytes throughput. See ScaiBunker → Image workflow for when this matters; today's worker is a direct S3 client by default, so bunker_proxy is opt-in infrastructure.
Minimal docker-compose
A minimal compose file defines only the ScaiGrid containers; you supply MariaDB, Redis, S3, and ScaiKey alongside them. For a complete stack, see docker-compose.yml in the ScaiGrid source tree.
Essential env vars
For local development, SCAIGRID_AUTH_DISABLED=true bypasses auth entirely — never set this in production.
The full env var reference is in app/config.py in the ScaiGrid source.
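As a guard against shipping a development setting, a deploy pipeline could run a small preflight check before starting containers. The sketch below is illustrative and not part of ScaiGrid: the `preflight` function, the `production` flag, and the set of truthy spellings are assumptions, while `SCAIGRID_MODE`, `SCAIGRID_AUTH_DISABLED`, and the five mode names come from this page.

```python
# Hypothetical preflight check; not part of ScaiGrid itself.

# The five modes listed in the runtime-modes table.
VALID_MODES = {"migrate", "http", "grpc", "worker", "bunker_proxy"}

# Assumption: these spellings are treated as "true". Check app/config.py
# for how ScaiGrid actually parses boolean env vars.
TRUTHY = {"1", "true", "yes", "on"}


def preflight(env, production=True):
    """Return a list of problems found in an environment mapping."""
    problems = []

    mode = env.get("SCAIGRID_MODE", "")
    if mode not in VALID_MODES:
        problems.append(
            f"SCAIGRID_MODE={mode!r} is not one of {sorted(VALID_MODES)}"
        )

    auth_disabled = env.get("SCAIGRID_AUTH_DISABLED", "").strip().lower() in TRUTHY
    if production and auth_disabled:
        problems.append("SCAIGRID_AUTH_DISABLED must not be set in production")

    return problems
```

Calling `preflight(os.environ)` in a CI step and failing the job when the returned list is non-empty would be one way to use it.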
Nginx in front
Nginx terminates TLS and routes requests to the ScaiGrid backends.
Setting proxy_read_timeout 660s is critical: SSE streams can outlast nginx's default of 60 seconds, and cutting them short produces client hangs that are hard to diagnose. The value must be greater than ScaiGrid's DISPATCH_STREAM_TIMEOUT_S (default 600).
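The timeout relationship is easy to break during a config edit, so it may be worth checking automatically. The following is a minimal sketch, assuming the nginx config is available as text and uses single-unit time values such as `660s` or `11m`; the function names are hypothetical, and combined values like `1h 30m` are deliberately not handled.

```python
import re

# Seconds per nginx time unit (a bare number means seconds).
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def nginx_time_to_seconds(value):
    """Convert a simple nginx time value ('660s', '11m', '660') to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)?", value.strip())
    if not match:
        raise ValueError(f"unsupported nginx time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or "s"]


def read_timeouts(nginx_conf):
    """Return every proxy_read_timeout in the config text, in seconds."""
    pattern = re.compile(r"^\s*proxy_read_timeout\s+([^;\s]+)\s*;", re.MULTILINE)
    return [nginx_time_to_seconds(v) for v in pattern.findall(nginx_conf)]


def timeouts_ok(nginx_conf, dispatch_stream_timeout_s=600):
    """True only if every proxy_read_timeout exceeds the stream timeout.

    A config with no directive at all fails, because nginx would then
    fall back to its 60-second default.
    """
    timeouts = read_timeouts(nginx_conf)
    return bool(timeouts) and all(t > dispatch_stream_timeout_s for t in timeouts)
```

The check is intentionally strict: it flags any location with a short read timeout, including ones that never carry SSE traffic, so you may want to narrow it to the streaming locations in your own config.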
HTTPS / TLS
All production traffic should be HTTPS. Use Let's Encrypt via certbot, or your corporate PKI. ScaiGrid itself speaks HTTP — TLS termination is nginx's job.
Scaling
- HTTP replicas. Horizontally scale scaigrid-http behind a load balancer. Stateless; no sticky sessions needed.
- Worker replicas. ARQ workers coordinate via Redis; scale out for throughput.
- gRPC replicas. Scale in proportion to the HTTP replicas if you run gRPC-heavy internal integrations.
- Database. MariaDB Galera cluster for HA. Size based on query load; most workloads are Redis-bound, not DB-bound.
- Redis. A single well-provisioned Redis handles surprisingly high throughput. Move to Cluster mode only if you saturate a single instance (typically > 100k ops/sec).
Upgrading
- Build or pull the new image.
- Run the scaigrid-migrate container with the new image; it applies any new migrations.
- Rolling-replace the http, grpc, and worker replicas.
- Watch logs for module initialization errors.
ScaiGrid is designed so old HTTP replicas can serve traffic while new replicas are rolling. Migrations are designed to be backwards-compatible within a minor version.
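The upgrade order can be scripted so that a failed migration stops the rollout before any replica is replaced. This is a sketch for a single-host Docker Compose setup, not an official tool: the service names `scaigrid-grpc` and `scaigrid-worker` are assumed by analogy with `scaigrid-http` and `scaigrid-migrate`, and the ten-minute log window is arbitrary.

```python
import subprocess

# Assumed Compose service names; adjust to match your docker-compose.yml.
ROLLING_SERVICES = ["scaigrid-http", "scaigrid-grpc", "scaigrid-worker"]


def upgrade_steps():
    """Return the upgrade commands in the order they should run."""
    steps = [
        ["docker", "compose", "pull"],                             # fetch the new image
        ["docker", "compose", "run", "--rm", "scaigrid-migrate"],  # apply migrations
    ]
    # Recreate each long-running service on the new image, one at a time.
    steps += [
        ["docker", "compose", "up", "-d", "--no-deps", service]
        for service in ROLLING_SERVICES
    ]
    # Finish by dumping recent logs to inspect module initialization.
    steps.append(["docker", "compose", "logs", "--since", "10m", *ROLLING_SERVICES])
    return steps


def run_upgrade(run=subprocess.run):
    """Run each step in order; check=True aborts at the first failing step."""
    for command in upgrade_steps():
        run(command, check=True)
```

Compose recreates a container rather than rolling it, so on a real cluster the replacement steps would be handled by your orchestrator's rolling-update mechanism; the ordering (migrate first, replicas second, logs last) is the part that carries over.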
Module migration
Each module has its own Alembic migration stream (table alembic_version_mod_{module_id}). The migrate mode runs core migrations first, then each enabled module's migrations. Failures in one module don't block other modules from migrating.
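The ordering and failure isolation described above can be pictured with a short sketch. It is not ScaiGrid's actual migration runner: the function names and the callable-based interface are hypothetical, and letting a core failure abort the whole run is an assumption, since this page only states that module failures are isolated from each other.

```python
def version_table(module_id):
    """Name of the per-module Alembic version table."""
    return f"alembic_version_mod_{module_id}"


def run_migrations(migrate_core, module_migrations):
    """Run core migrations, then each module's, isolating module failures.

    migrate_core: callable that applies the core migrations.
    module_migrations: dict mapping module_id to a callable.
    Returns a dict of module_id -> exception for modules that failed.
    """
    # Assumption: a core failure propagates and stops the run,
    # because module schemas are likely to depend on core tables.
    migrate_core()

    failed = {}
    for module_id, migrate in module_migrations.items():
        try:
            migrate()
        except Exception as exc:  # one module failing must not block the rest
            failed[module_id] = exc
    return failed
```

A caller would typically log the returned failures and exit non-zero if the dict is non-empty, so the rollout can decide whether to continue with the affected modules disabled.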