Deployment

How to deploy ScaiGrid — managed, self-hosted, or hybrid. For the architecture overview, see Architecture.

Options

Managed. ScaiLabs hosts ScaiGrid for you at scaigrid.scailabs.ai. You sign up, get a tenant, start making API calls. No operational burden.
Self-hosted. Run your own ScaiGrid instance on your infrastructure. Full control, more responsibility.
Hybrid. Use the managed gateway for most inference but deploy your own ScaiInfer / ScaiBunker / ScaiMind workers to keep sensitive workloads in your infrastructure.

Components to deploy

A complete ScaiGrid deployment needs:

ScaiGrid process (runtime modes: http, grpc, worker, migrate, and the optional bunker_proxy)
MariaDB Galera — cluster of 3+ nodes recommended for HA
Redis — single node is fine; HA via Sentinel or Redis Cluster for production
S3-compatible storage — Garage (self-hosted), MinIO, or AWS S3
ScaiKey — identity provider for authentication
Nginx — TLS termination and reverse proxy

Optional depending on modules:

Weaviate — for ScaiMatrix vector search
Neo4j — for ScaiMatrix graph knowledge
ScaiInfer cluster — for self-hosted LLM inference
ScaiBunker workers — for sandboxed code execution
ScaiMind cluster — for training workloads

Runtime modes

The ScaiGrid container image supports five modes via the SCAIGRID_MODE env var:

| Mode | Purpose |
|---|---|
| `migrate` | Runs Alembic migrations (core + all enabled modules), then exits |
| `http` | FastAPI HTTP server on port 8000 (main API) |
| `grpc` | gRPC server on port 50051 (internal integrations) |
| `worker` | ARQ background worker (cron tasks from core and modules) |
| `bunker_proxy` | Stripped-down ScaiBunker storage proxy on port 8001; DB-free, horizontally scalable behind a load balancer |

Production deployments run http, grpc, and worker as separate container replicas. migrate is a one-shot init container or manual step before rolling out a new version.

bunker_proxy is a specialized mode for deployments that put ScaiBunker workers behind a centralized storage proxy instead of letting the workers talk directly to S3. It loads only the storage routes: no DB, no module registry, no auth on the workers. Deploy multiple replicas behind a load balancer if you need horizontal scaling for image-bytes throughput. See ScaiBunker → Image workflow for when this matters. Today's worker is a direct S3 client by default, so bunker_proxy is opt-in infrastructure.
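Deployment tooling can catch a mistyped mode before a container is ever started. The following is a minimal sketch of such a pre-flight check, not ScaiGrid's own startup code: `MODE_PORTS` and `resolve_mode` are hypothetical names, the ports are copied from the mode table, and exact (case-sensitive) matching of the mode name is an assumption.

```python
import os

# Ports copied from the runtime mode table; modes without a listener map to None.
MODE_PORTS = {
    "migrate": None,
    "http": 8000,
    "grpc": 50051,
    "worker": None,
    "bunker_proxy": 8001,
}


def resolve_mode(env=None):
    """Return (mode, port) for SCAIGRID_MODE, or raise ValueError if it is unknown.

    Hypothetical helper for deployment scripts; exact-match comparison is an assumption.
    """
    env = os.environ if env is None else env
    mode = env.get("SCAIGRID_MODE", "").strip()
    if mode not in MODE_PORTS:
        raise ValueError(
            f"unknown SCAIGRID_MODE {mode!r}; expected one of {sorted(MODE_PORTS)}"
        )
    return mode, MODE_PORTS[mode]
```

Calling `resolve_mode()` with no argument reads the process environment; passing a dict makes the check easy to run against a parsed `.env` file.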

Minimal docker-compose

yaml
services:
  scaigrid-migrate:
    image: registry.scailabs.ai/scaigrid:latest
    environment:
      SCAIGRID_MODE: migrate
    env_file: .env
    depends_on:
      mariadb: {condition: service_healthy}

  scaigrid-http:
    image: registry.scailabs.ai/scaigrid:latest
    environment:
      SCAIGRID_MODE: http
    env_file: .env
    ports: ["8000:8000"]
    depends_on:
      scaigrid-migrate: {condition: service_completed_successfully}

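  scaigrid-grpc:
    image: registry.scailabs.ai/scaigrid:latest
    environment:
      SCAIGRID_MODE: grpc
    env_file: .env
    ports: ["50051:50051"]
    depends_on:
      # Wait for migrations to finish, same as scaigrid-http
      scaigrid-migrate: {condition: service_completed_successfully}

  scaigrid-worker:
    image: registry.scailabs.ai/scaigrid:latest
    environment:
      SCAIGRID_MODE: worker
    env_file: .env
    depends_on:
      scaigrid-migrate: {condition: service_completed_successfully}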

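Add your own MariaDB, Redis, S3, and ScaiKey services alongside these. The mariadb service must define a healthcheck, otherwise the service_healthy condition above can never be satisfied. For a complete stack, see docker-compose.yml in the ScaiGrid source tree.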

Essential env vars

dotenv
# Database
DATABASE_URL=mysql+asyncmy://scaigrid:CHANGE_ME@mariadb:3306/scaigrid

# Redis
REDIS_URL=redis://redis:6379/0

# S3 / object storage
S3_ENDPOINT_URL=http://garage:3900
S3_ACCESS_KEY_ID=...
S3_SECRET_ACCESS_KEY=...
S3_BUCKET=scaigrid

# ScaiKey identity provider
SCAIKEY_BASE_URL=https://scaikey.scailabs.ai
SCAIKEY_JWKS_URL=https://scaikey.scailabs.ai/api/v1/platform/.well-known/jwks.json
SCAIKEY_AUDIENCE=user-portal
SCAIKEY_CLIENT_ID=...
SCAIKEY_CLIENT_SECRET=...
SCAIKEY_WEBHOOK_SECRET=...

# JWT (for admin UI sessions)
JWT_SIGNING_SECRET=<random 64 chars>

# Defaults
APP_ENV=production
APP_LOG_LEVEL=info

For local development, SCAIGRID_AUTH_DISABLED=true bypasses auth entirely — never set this in production.

The full env var reference is in app/config.py in the ScaiGrid source.
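Before rolling out, it can help to check the environment for missing values, leftover placeholders, and the auth bypass. The sketch below is illustrative only: `REQUIRED_VARS` and `check_env` are hypothetical names, and treating every variable from the listing as required is an assumption (app/config.py is the authority on what is actually mandatory).

```python
# Variable names copied from the env listing. Treating all of them as
# required is an assumption, not something the ScaiGrid source confirms here.
REQUIRED_VARS = [
    "DATABASE_URL", "REDIS_URL",
    "S3_ENDPOINT_URL", "S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY", "S3_BUCKET",
    "SCAIKEY_BASE_URL", "SCAIKEY_JWKS_URL", "SCAIKEY_AUDIENCE",
    "SCAIKEY_CLIENT_ID", "SCAIKEY_CLIENT_SECRET", "SCAIKEY_WEBHOOK_SECRET",
    "JWT_SIGNING_SECRET",
]


def check_env(env):
    """Return a list of human-readable problems found in the env mapping."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif value == "..." or "CHANGE_ME" in value:
            # Placeholders as they appear in the example listing.
            problems.append(f"{name} still contains a placeholder")
    auth_disabled = env.get("SCAIGRID_AUTH_DISABLED", "").lower() == "true"
    if env.get("APP_ENV") == "production" and auth_disabled:
        problems.append("SCAIGRID_AUTH_DISABLED=true must not be set in production")
    return problems
```

Running `check_env(dict(os.environ))` inside the container, or against a parsed `.env` file, returns an empty list when nothing looks wrong.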

Nginx in front

Terminate TLS and route. Core locations:

nginx
upstream scaigrid_backend {
    server scaigrid:8000;
    keepalive 32;
}

server {
    listen 443 ssl http2;
    server_name scaigrid.scailabs.ai;

    ssl_certificate /etc/letsencrypt/live/scaigrid.scailabs.ai/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/scaigrid.scailabs.ai/privkey.pem;

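    location /v1/ {
        proxy_pass http://scaigrid_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # required for upstream keepalive to take effect
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_buffering off;
        proxy_read_timeout 660s;  # must exceed dispatch_stream_timeout_s (600s)
    }

    location /oai/ {
        proxy_pass http://scaigrid_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_buffering off;
        proxy_read_timeout 660s;  # must exceed dispatch_stream_timeout_s (600s)
    }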

    location /health {
        proxy_pass http://scaigrid_backend;
    }
}

proxy_read_timeout 660s is critical. nginx's default of 60 seconds applies to the gap between two successive reads from the backend, and a long-running SSE stream can exceed it; streams that are cut short show up as mysterious client hangs. The value must be greater than ScaiGrid's DISPATCH_STREAM_TIMEOUT_S (default 600).
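The timeout relationship is easy to break when someone edits the nginx config later, so it can be worth checking mechanically. The following sketch scans a config file's text for `proxy_read_timeout` directives and reports any that do not exceed the stream timeout. The function names are hypothetical, and the parser is deliberately simple: it handles single-unit values such as `660s` or `11m`, not combined forms like `1h 30m`.

```python
import re

# Seconds per nginx time unit; a bare number means seconds.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_nginx_time(value):
    """Convert a single-unit nginx time such as '660s', '11m' or '90' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)?", value)
    if not match:
        raise ValueError(f"unsupported nginx time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or "s"]


def short_read_timeouts(nginx_conf, stream_timeout_s=600):
    """Return each proxy_read_timeout (in seconds) that does not exceed stream_timeout_s.

    stream_timeout_s defaults to 600, the documented default of
    DISPATCH_STREAM_TIMEOUT_S. Commented-out directives are ignored.
    """
    values = re.findall(
        r"^\s*proxy_read_timeout\s+(\S+?)\s*;", nginx_conf, flags=re.MULTILINE
    )
    return [t for t in map(parse_nginx_time, values) if t <= stream_timeout_s]
```

An empty result means every `proxy_read_timeout` in the file is safely above the stream timeout. Locations that omit the directive fall back to nginx's 60 second default and are not detected by this sketch.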

HTTPS / TLS

All production traffic should be HTTPS. Use Let's Encrypt via certbot, or your corporate PKI. ScaiGrid itself speaks HTTP — TLS termination is nginx's job.

Scaling

HTTP replicas. Horizontally scale scaigrid-http behind a load balancer. Stateless; no sticky sessions needed.
Worker replicas. ARQ workers coordinate via Redis; scale out for throughput.
gRPC replicas. Scale with HTTP proportionally if you use internal gRPC-heavy integrations.
Database. MariaDB Galera cluster for HA. Size it based on query load; most workloads are Redis-bound, not DB-bound.
Redis. A single well-provisioned Redis instance is usually enough. Move to Cluster mode only if you saturate a single instance (typically > 100k ops/sec).

Upgrading

Build or pull the new image.
Run the scaigrid-migrate container with the new image to apply any new migrations.
Replace the http, grpc, and worker replicas with a rolling update.
Watch the logs for module initialization errors.

ScaiGrid is designed so that old HTTP replicas can keep serving traffic while new replicas roll out. Migrations are designed to be backwards-compatible within a minor version.
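For a Docker Compose deployment, the upgrade steps can be sketched as a short script. This assumes the service names from the minimal docker-compose example (scaigrid-migrate, scaigrid-http, scaigrid-grpc, scaigrid-worker); `upgrade_plan` and `run_plan` are hypothetical helpers. Note that Compose recreates each service in place, so a true rolling replace with no gap needs multiple replicas behind a load balancer or an orchestrator.

```python
import subprocess


def upgrade_plan(
    services=("scaigrid-http", "scaigrid-grpc", "scaigrid-worker"),
    migrate_service="scaigrid-migrate",
):
    """Return the docker compose commands for one upgrade, in execution order."""
    plan = [
        # Fetch the new image for every ScaiGrid service.
        ["docker", "compose", "pull", migrate_service, *services],
        # One-shot migration with the new image, before any replica is replaced.
        ["docker", "compose", "run", "--rm", migrate_service],
    ]
    for service in services:
        # --no-deps recreates only this service and leaves the others running.
        plan.append(["docker", "compose", "up", "-d", "--no-deps", service])
    return plan


def run_plan(plan):
    """Execute the commands in order; check=True stops at the first failure."""
    for command in plan:
        subprocess.run(command, check=True)
```

Because `run_plan` stops on the first non-zero exit code, a failed migration prevents any replica from being replaced. Afterwards, `docker compose logs -f` on the replaced services covers the final step of watching for module initialization errors.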

Module migration

Each module has its own Alembic migration stream (table alembic_version_mod_{module_id}). The migrate mode runs core migrations first, then each enabled module's migrations. Failures in one module don't block other modules from migrating.
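The ordering and failure isolation described above can be illustrated with a small sketch. This is not ScaiGrid's migration code: `run_migrations` and `version_table` are hypothetical, the migrators are stand-in callables, and letting a core failure abort the whole run is an assumption (the text above only specifies that module failures are isolated from each other).

```python
def version_table(module_id):
    """Name of the per-module Alembic version table, following the documented pattern."""
    return f"alembic_version_mod_{module_id}"


def run_migrations(migrate_core, module_migrators):
    """Run core migrations first, then each module's, isolating module failures.

    migrate_core:     callable that applies core migrations.
    module_migrators: dict mapping module_id to a callable for that module.
    Returns a dict of module_id -> exception for modules that failed.
    """
    # Assumption: a core failure propagates and stops everything, since
    # module schemas are likely to depend on core tables.
    migrate_core()

    failures = {}
    for module_id, migrate in module_migrators.items():
        try:
            migrate()
        except Exception as exc:
            # Record the failure and continue with the remaining modules.
            failures[module_id] = exc
    return failures
```

A caller would typically log each entry in the returned dict and exit non-zero if it is non-empty, so the deployment pipeline notices the partial migration.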