Architecture

ScaiGrid is a FastAPI application with a pluggable module system. A request enters through middleware, gets routed to a handler, authenticates against ScaiKey, consults the routing policy, dispatches to an upstream provider, records accounting, and returns a normalized response.

High-level view

flowchart TB
    App[Your app]
    subgraph ScaiGrid[ScaiGrid]
        MW["Middleware: request-id, rate-limit, drain-check, audit, CORS"]
        Auth["Auth: JWT / API key"]
        Routers["Routers: /v1/*, /oai/v1/*, /v1/modules/{id}/*"]
        Services["Services: routing, inference, accounting, dispatch"]
        Dispatchers["Dispatchers: OpenAI / Anthropic / Azure / Mistral / Qwen / Google / ScaiInfer / Custom"]
        MW --> Auth --> Routers --> Services --> Dispatchers
    end
    Upstream[Upstream Providers]
    App -- HTTP --> MW
    Dispatchers --> Upstream

Request flow

For a chat completion:

1. Middleware assigns a request ID, checks the drain flag (new requests are refused during graceful shutdown), applies rate limits (per-key, per-user, per-tenant, per-partner), and logs the start time.
2. Authentication resolves the caller from one of three credentials: a ScaiKey-issued JWT (validated via JWKS from ScaiKey), an API key (sgk_ prefix, hashed in the DB), or a service token (for internal components). The resolved user carries their tenant, partner, roles, and permissions.
3. Permission guard checks that the user may use this endpoint: models:use for inference, admin:access or higher for admin endpoints.
4. Routing service resolves the frontend model slug the caller asked for, picks a backend model via the model's routing policy (weighted, priority-based with failover), and returns a (model, backend) tuple.
5. Accounting pre-check consults the budget enforcer. If the caller is over budget, the request fails fast with BUDGET_EXCEEDED before any upstream tokens are spent.
6. System prompt rewriting applies persona templates or safety prefixes if the model is configured with them.
7. Dispatcher sends the request upstream using the API shape that provider expects. Each provider has a dispatcher implementation in app/dispatch/.
8. Response normalization converts the upstream response into ScaiGrid's provider-agnostic shape: content blocks are flattened, tool-call arguments are coerced, and everything ends up as the plain strings or typed objects the client expects.
9. Accounting commit records token counts, latency, and cost in Redis (fast, buffered) and later flushes them to MariaDB.
10. The response goes back to the client with the request ID in a response header and a structured JSON envelope.
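
To make the routing step above more concrete, here is a minimal sketch of "weighted, priority-based with failover" selection. It is an illustration under assumptions, not ScaiGrid's actual implementation: the `Backend` type, the `pick_backend` function, the `healthy` flag, and the convention that a lower priority number is tried first are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Backend:
    """Hypothetical backend entry in a routing policy."""
    name: str
    priority: int         # assumption: lower number = preferred tier
    weight: float         # relative traffic share within a tier (must be > 0)
    healthy: bool = True  # assumption: set by health checks or a circuit breaker


def pick_backend(backends: List[Backend], rng: Optional[random.Random] = None) -> Backend:
    """Choose from the best healthy priority tier, weighted within that tier."""
    rng = rng or random.Random()
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise LookupError("no healthy backend available")
    # Failover: if every backend in the preferred tier is unhealthy,
    # the next tier becomes the best remaining one.
    best = min(b.priority for b in healthy)
    tier = [b for b in healthy if b.priority == best]
    # Weighted choice among the backends that share the winning priority.
    return rng.choices(tier, weights=[b.weight for b in tier], k=1)[0]
```

In this sketch, failover is implicit: marking a backend unhealthy removes it from consideration, so traffic falls through to the next priority tier without a separate code path.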

For streaming completions, steps 7–9 are interleaved: the dispatcher yields chunks as they arrive, accounting counts them incrementally, and the gateway forwards them to the client as SSE.
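
The interleaving can be sketched as an async generator that counts usage while it forwards each chunk as a server-sent event. This is a sketch under assumptions: `UsageCounter` and `forward_as_sse` are hypothetical names, the `{"delta": ...}` payload and the `[DONE]` terminator are illustrative conventions rather than ScaiGrid's documented wire format, and characters are counted here only as a stand-in for real token accounting.

```python
import asyncio
import json
from typing import AsyncIterator, List, Tuple


class UsageCounter:
    """Hypothetical stand-in for incremental accounting."""

    def __init__(self) -> None:
        self.chunks = 0
        self.chars = 0

    def add(self, text: str) -> None:
        self.chunks += 1
        self.chars += len(text)


async def forward_as_sse(upstream: AsyncIterator[str], usage: UsageCounter) -> AsyncIterator[str]:
    """Yield one SSE frame per upstream chunk, counting usage as chunks arrive."""
    async for text in upstream:
        usage.add(text)  # accounting happens before the chunk is forwarded
        yield f"data: {json.dumps({'delta': text})}\n\n"
    yield "data: [DONE]\n\n"  # assumed end-of-stream marker


async def fake_upstream() -> AsyncIterator[str]:
    """Simulated dispatcher output for the demo."""
    for piece in ("Hel", "lo"):
        yield piece


async def collect() -> Tuple[List[str], UsageCounter]:
    usage = UsageCounter()
    frames = [frame async for frame in forward_as_sse(fake_upstream(), usage)]
    return frames, usage
```

Because usage is updated per chunk, a stream that the client abandons halfway still leaves an accurate partial count behind.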

Core services

Service	Responsibility
RoutingService	Resolves frontend model → backend, applies routing policy
InferenceService	Orchestrates system prompt rewriting, dispatch, streaming, accounting
DispatchService	Wraps provider dispatchers with circuit breaker and error normalization
AccountingEngine	Budget enforcement, usage counters, streaming reservations, settlement
AuditService	Records mutating requests to the audit log
ModuleRegistry	Discovers, initializes, and mounts modules at startup
EventBus	Redis Streams-backed pub/sub for internal events
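
As an illustration of what "wraps provider dispatchers with circuit breaker" can mean, here is a minimal, synchronous circuit-breaker sketch. It is not DispatchService's actual code: `CircuitBreaker`, `CircuitOpenError`, and the `max_failures` and `reset_after` parameters are hypothetical, and a real gateway would likely use an async variant with per-backend state.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when calls are rejected without contacting the upstream."""


class CircuitBreaker:
    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("upstream temporarily disabled")
            # Cool-down elapsed: allow a trial call (half-open state).
            # One more failure re-opens the circuit immediately.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the circuit fully
        return result
```

The injectable `clock` is a design choice for testability: it lets the cool-down be exercised without sleeping.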

Storage

Store	Used for
MariaDB (Galera)	Persistent state: users, models, tenants, budgets, usage records, audit log, module state
Redis	Real-time state: rate-limit counters, streaming reservations, session caches, event bus, module runtime state
S3 (Garage)	Blobs: avatars, uploaded documents, inference artifacts, exported reports, snapshots

All three are external dependencies ScaiGrid connects to — it doesn't embed a database.

Module system

Modules live under modules/{module_id}/ in the ScaiGrid codebase (or ship as separate packages). At startup, ScaiGrid scans the module path, instantiates each module, resolves dependencies in topological order, calls initialize(), and mounts any routes, admin pages, background tasks, or MCP tools the module declares.
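
Resolving dependencies in topological order can be sketched with Python's standard-library `graphlib`. The `initialization_order` helper and the module names in the usage example are hypothetical; how ScaiGrid modules actually declare their dependencies is not shown here.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def initialization_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return module IDs so that every module comes after its dependencies.

    `dependencies` maps a module ID to the IDs it depends on.
    Raises graphlib.CycleError if the dependency graph contains a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For example, with hypothetical modules where `reports` depends on `scaibot` and `core_kb`, and `scaibot` depends on `core_kb`, the helper yields `core_kb`, then `scaibot`, then `reports`. A circular dependency surfaces as a `CycleError` before any module's initialize() would run.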

Each module gets:

- A URL namespace at /v1/modules/{module_id}/
- A set of permissions registered in ScaiKey (e.g., scaibot:manage)
- Its own Alembic migration stream (version table alembic_version_mod_{module_id})
- A slot in the admin UI sidebar
- Access to the shared ModuleContext (DB engine, Redis, config, event bus, HTTP client, logger)
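
The per-module naming conventions above can be expressed as two small helpers. This is only a sketch: the function names are hypothetical, and the ID validation pattern is an assumption about what a safe module ID looks like, not a documented ScaiGrid rule.

```python
import re

# Assumption: module IDs are lowercase identifiers; the real rule may differ.
_MODULE_ID = re.compile(r"^[a-z][a-z0-9_]*$")


def _check(module_id: str) -> str:
    if not _MODULE_ID.match(module_id):
        raise ValueError(f"invalid module id: {module_id!r}")
    return module_id


def module_url_prefix(module_id: str) -> str:
    """URL namespace under which a module's routes are mounted."""
    return f"/v1/modules/{_check(module_id)}/"


def module_version_table(module_id: str) -> str:
    """Name of the Alembic version table for a module's migration stream."""
    return f"alembic_version_mod_{_check(module_id)}"
```

A separate version table per module is what lets each migration stream advance independently. Alembic supports this through the `version_table` option of `context.configure()`, which is presumably how a name like the one above would be applied.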

See Modules for how to enable and configure them.

Runtime modes

ScaiGrid can run in four modes, selected by the SCAIGRID_MODE environment variable:

- http: FastAPI HTTP server (default). Serves /v1/, /oai/v1/, /metrics, /health.
- grpc: gRPC server for internal components (ScaiInfer heartbeats, ScaiMind training, etc.).
- worker: ARQ background worker. Runs cron tasks declared by modules and core.
- migrate: Runs Alembic migrations (core + all modules), then exits.

Production deployments typically run all four as separate containers off the same image.
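
Mode selection from the environment can be sketched as follows. The `resolve_mode` helper and its strict validation are assumptions for illustration; only the SCAIGRID_MODE variable, the four mode names, and the http default come from the description above.

```python
import os
from typing import Mapping

VALID_MODES = ("http", "grpc", "worker", "migrate")


def resolve_mode(env: Mapping[str, str] = os.environ) -> str:
    """Read SCAIGRID_MODE, defaulting to "http", and reject unknown values."""
    mode = env.get("SCAIGRID_MODE", "http")
    if mode not in VALID_MODES:
        raise ValueError(
            f"unknown SCAIGRID_MODE {mode!r}; expected one of {', '.join(VALID_MODES)}"
        )
    return mode
```

Failing loudly on an unknown value, rather than silently falling back to http, makes a mistyped mode in a container definition visible at startup.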

What's next

- Quickstart: first API call.
- Models and Routing: the frontend/backend separation in detail.
- Modules: what modules are and how to use them.