Architecture
ScaiGrid is a FastAPI application with a pluggable module system. A request enters through middleware, gets routed to a handler, authenticates against ScaiKey, consults the routing policy, dispatches to an upstream provider, records accounting, and returns a normalized response.
High-level view
Request flow
For a chat completion:
1. Middleware assigns a request ID, checks the drain flag (refusing new requests during graceful shutdown), applies rate limits (per-key, per-user, per-tenant, per-partner), and logs the start time.
2. Authentication resolves the caller from one of three credentials: a ScaiKey-issued JWT (validated via JWKS from ScaiKey), an API key (`sgk_` prefix, hashed in the DB), or a service token (for internal components). The resolved user carries their tenant, partner, roles, and permissions.
3. Permission guard checks that the user can use this endpoint: for inference that's `models:use`, for admin endpoints that's `admin:access` or higher.
4. Routing service resolves the frontend model slug the caller asked for, picks a backend model via the model's routing policy (weighted, priority-based with failover), and returns a (model, backend) tuple.
5. Accounting pre-check consults the budget enforcer. If the caller is over budget, the request fails fast with `BUDGET_EXCEEDED` before we spend any upstream tokens.
6. System prompt rewriting applies persona templates or safety prefixes if the model is configured with one.
7. Dispatcher sends the request to the upstream using the right API shape for that provider. Each provider has a dispatcher implementation in `app/dispatch/`.
8. Response normalization converts the upstream response into ScaiGrid's provider-agnostic shape. Content blocks are flattened and tool-call arguments coerced, so everything ends up as the plain strings or typed objects the client expects.
9. Accounting commit records token counts, latency, and cost in Redis (fast, buffered) and later flushes to MariaDB.
10. Response goes back to the client with the request ID in the header and a structured JSON envelope.
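The routing step in the list above can be sketched as follows. This is a minimal illustration, not ScaiGrid's actual code: the `Backend` dataclass, the `pick_backend` function, and the rule "lowest priority number wins, weighted random choice within that tier, unhealthy backends skipped" are all assumptions about how a weighted, priority-based policy with failover might behave.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Backend:
    """Hypothetical backend record; field names are illustrative."""
    name: str
    priority: int          # lower number = preferred tier (assumption)
    weight: int            # relative traffic share within a tier
    healthy: bool = True


def pick_backend(backends: List[Backend],
                 rng: Optional[random.Random] = None) -> Backend:
    """Choose from the best healthy tier, weighted by `weight`."""
    rng = rng or random.Random()
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backend available")
    # Failover: if the preferred tier is entirely unhealthy,
    # the next-best priority becomes the active tier.
    best = min(b.priority for b in candidates)
    tier = [b for b in candidates if b.priority == best]
    return rng.choices(tier, weights=[b.weight for b in tier], k=1)[0]
```

With two healthy priority-1 backends weighted 3 and 1, roughly three quarters of calls would land on the first; a priority-2 backend is only chosen once every priority-1 backend is marked unhealthy.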
For streaming completions, steps 7–9 are interleaved: the dispatcher yields chunks as they arrive, accounting counts them incrementally, and the gateway forwards them to the client as SSE.
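The interleaving described for streaming can be sketched with an async generator that counts chunks while forwarding them. The names here (`fake_upstream`, `stream_as_sse`, the `usage` dict) are hypothetical, the sketch counts characters where a real gateway would count tokens, and the closing `[DONE]` frame is an assumed convention rather than something the source specifies.

```python
import asyncio
import json
from typing import AsyncIterator, Dict, List, Tuple


async def fake_upstream() -> AsyncIterator[str]:
    """Stand-in for a provider dispatcher that yields text chunks."""
    for piece in ["Hel", "lo", "!"]:
        await asyncio.sleep(0)  # simulate waiting on the network
        yield piece


async def stream_as_sse(chunks: AsyncIterator[str],
                        usage: Dict[str, int]) -> AsyncIterator[str]:
    """Forward each chunk as an SSE frame, updating usage as it goes."""
    async for piece in chunks:
        usage["chunks"] += 1
        usage["chars"] += len(piece)
        payload = json.dumps({"delta": piece})
        yield f"data: {payload}\n\n"
    yield "data: [DONE]\n\n"  # assumed end-of-stream marker


async def collect() -> Tuple[List[str], Dict[str, int]]:
    """Drain the stream and return the frames plus the final usage."""
    usage = {"chunks": 0, "chars": 0}
    frames = [f async for f in stream_as_sse(fake_upstream(), usage)]
    return frames, usage
```

Because the counters are updated per chunk, usage is already accurate for the delivered portion if the client disconnects midway.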
Core services
| Service | Responsibility |
|---|---|
| RoutingService | Resolves frontend model → backend, applies routing policy |
| InferenceService | Orchestrates system prompt rewriting, dispatch, streaming, accounting |
| DispatchService | Wraps provider dispatchers with circuit breaker and error normalization |
| AccountingEngine | Budget enforcement, usage counters, streaming reservations, settlement |
| AuditService | Records mutating requests to the audit log |
| ModuleRegistry | Discovers, initializes, and mounts modules at startup |
| EventBus | Redis Streams-backed pub/sub for internal events |
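To illustrate the circuit-breaker behavior the table attributes to DispatchService, here is a minimal sketch. The class name, thresholds, and half-open rule are assumptions chosen for illustration; the source does not describe how ScaiGrid's breaker is configured.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("upstream temporarily disabled")
            # Cool-down elapsed: allow one trial call (half-open).
            # A single failure now reopens the breaker immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the breaker fully
        return result
```

Failing fast while the breaker is open keeps a struggling provider from tying up request handlers, which pairs naturally with a routing policy that can fail over to another backend.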
Storage
| Store | Used for |
|---|---|
| MariaDB (Galera) | Persistent state: users, models, tenants, budgets, usage records, audit log, module state |
| Redis | Real-time state: rate-limit counters, streaming reservations, session caches, event bus, module runtime state |
| S3 (Garage) | Blobs: avatars, uploaded documents, inference artifacts, exported reports, snapshots |
All three are external dependencies ScaiGrid connects to — it doesn't embed a database.
Module system
Modules live under modules/{module_id}/ in the ScaiGrid codebase (or ship as separate packages). At startup, ScaiGrid scans the module path, instantiates each module, resolves dependencies in topological order, calls initialize(), and mounts any routes, admin pages, background tasks, or MCP tools the module declares.
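Resolving dependencies in topological order can be sketched with the standard library's `graphlib`. The function name and the module IDs below are hypothetical, and how ScaiGrid actually reads each module's dependency list is not covered here.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def initialization_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return module IDs so each module comes after its dependencies.

    `dependencies` maps a module ID to the IDs it depends on.
    """
    try:
        # static_order() is lazy, so list() must run inside the try.
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes forming the detected cycle.
        raise ValueError(f"circular module dependency: {exc.args[1]}") from exc


# Hypothetical module IDs, for illustration only.
deps = {
    "mod_b": {"mod_a"},
    "mod_c": {"mod_a", "mod_b"},
    "mod_a": set(),
}
order = initialization_order(deps)
```

Calling `initialize()` in this order means a module can rely on its dependencies being ready, and a dependency cycle surfaces as a startup error instead of a partially initialized system.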
Each module gets:
- A URL namespace at `/v1/modules/{module_id}/`
- A set of permissions registered in ScaiKey (e.g., `scaibot:manage`)
- Its own Alembic migration stream (version table `alembic_version_mod_{module_id}`)
- A slot in the admin UI sidebar
- Access to the shared `ModuleContext` (DB engine, Redis, config, event bus, HTTP client, logger)
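The per-module naming conventions above follow directly from the module ID. A small sketch makes the pattern concrete; the `ModuleNames` helper is hypothetical and exists only to show the two templates from the list, not any class in the ScaiGrid codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleNames:
    """Hypothetical helper deriving per-module names from the module ID."""
    module_id: str

    @property
    def url_prefix(self) -> str:
        # URL namespace under which the module's routes are mounted.
        return f"/v1/modules/{self.module_id}/"

    @property
    def version_table(self) -> str:
        # Alembic version table tracking this module's migration stream.
        return f"alembic_version_mod_{self.module_id}"


names = ModuleNames("scaibot")
```

Keeping a separate version table per module lets each migration stream advance independently of core and of other modules.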
See Modules for how to enable and configure them.
Runtime modes
ScaiGrid can run in four modes, selected by the SCAIGRID_MODE environment variable:
- `http`: FastAPI HTTP server (default). Serves `/v1/`, `/oai/v1/`, `/metrics`, `/health`.
- `grpc`: gRPC server for internal components (ScaiInfer heartbeats, ScaiMind training, etc.).
- `worker`: ARQ background worker. Runs cron tasks declared by modules and core.
- `migrate`: Runs Alembic migrations (core + all modules), then exits.
Production deployments typically run all four as separate containers off the same image.
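One image serving four roles implies an entry point that branches on `SCAIGRID_MODE`. The sketch below shows that pattern under stated assumptions: the `run_*` functions are placeholders and `select_mode` is a hypothetical name, not ScaiGrid's actual entry point. Only the variable name, the four mode values, and the `http` default come from the text above.

```python
import os
from typing import Callable, Dict, Mapping


# Placeholder entry points; real ones would start a server or worker.
def run_http() -> str:
    return "http server"


def run_grpc() -> str:
    return "grpc server"


def run_worker() -> str:
    return "background worker"


def run_migrate() -> str:
    return "migrations"


MODES: Dict[str, Callable[[], str]] = {
    "http": run_http,
    "grpc": run_grpc,
    "worker": run_worker,
    "migrate": run_migrate,
}


def select_mode(env: Mapping[str, str] = os.environ) -> Callable[[], str]:
    """Map SCAIGRID_MODE to an entry point, defaulting to http."""
    mode = env.get("SCAIGRID_MODE", "http")
    if mode not in MODES:
        raise ValueError(
            f"unknown SCAIGRID_MODE {mode!r}; expected one of {sorted(MODES)}"
        )
    return MODES[mode]
```

Rejecting unknown values at startup is a deliberate choice in this sketch: a typo in a container's environment fails loudly instead of silently starting the default HTTP server.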
What's next
- Quickstart — first API call.
- Models and Routing — the frontend/backend separation in detail.
- Modules — what modules are and how to use them.