---
title: Architecture
path: introduction/architecture
status: published
---

# Architecture

ScaiGrid is a FastAPI application with a pluggable module system. A request enters through middleware, is routed to a handler, and is authenticated against ScaiKey. The handler then consults the routing policy, dispatches to an upstream provider, records accounting, and returns a normalized response.

## High-level view

```mermaid
flowchart TB
    App[Your app]

    subgraph ScaiGrid[ScaiGrid]
        MW[Middleware<br/>request-id<br/>rate-limit<br/>drain-check<br/>audit<br/>CORS]
        Auth[Auth<br/>JWT / API key]
        Routers[Routers<br/>/v1/*<br/>/oai/v1/*<br/>/v1/modules/&#123;id&#125;/*]
        Services[Services<br/>routing<br/>inference<br/>accounting<br/>dispatch]
        Dispatchers[Dispatchers<br/>OpenAI / Anthropic /<br/>Azure / Mistral /<br/>Qwen / Google /<br/>ScaiInfer / Custom]
        MW --> Auth --> Routers --> Services --> Dispatchers
    end

    Upstream[Upstream Providers]

    App -- HTTP --> MW
    Dispatchers --> Upstream
```

## Request flow

For a chat completion:

1. **Middleware** assigns a request ID, checks the drain flag (refuses new requests during graceful shutdown), applies rate limits (per-key, per-user, per-tenant, per-partner), and logs the start time.
2. **Authentication** resolves the caller from one of three credentials: a ScaiKey-issued JWT (validated via JWKS from ScaiKey), an API key (`sgk_` prefix, stored hashed in the DB), or a service token (for internal components). The resolved user carries their tenant, partner, roles, and permissions.
3. **Permission guard** checks the user can use this endpoint — for inference that's `models:use`, for admin endpoints that's `admin:access` or higher.
4. **Routing service** resolves the frontend model slug the caller asked for, picks a backend model via the model's routing policy (weighted, priority-based with failover), and returns a (model, backend) tuple.
5. **Accounting pre-check** consults the budget enforcer. If the caller is over budget, the request fails fast with `BUDGET_EXCEEDED` before we spend any upstream tokens.
6. **System prompt rewriting** applies persona templates or safety prefixes if the model is configured with one.
7. **Dispatcher** sends the request to the upstream using the right API shape for that provider. Each provider has a dispatcher implementation in `app/dispatch/`.
8. **Response normalization** converts the upstream response into ScaiGrid's provider-agnostic shape. Content blocks are flattened and tool-call arguments are coerced, so everything reaches the client as the plain strings or typed objects it expects.
9. **Accounting commit** records token counts, latency, and cost in Redis (fast, buffered) and later flushes to MariaDB.
10. **Response** goes back to the client with the request ID in the header and a structured JSON envelope.
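
The routing policy in step 4 ("weighted, priority-based with failover") can be illustrated with a minimal sketch. This is not ScaiGrid's implementation: the `Backend` fields, the `pick_backend` helper, and the rule that a lower `priority` number wins are all assumptions made for the example.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Hypothetical backend record; field names are assumptions."""
    name: str
    priority: int          # lower number = preferred tier (assumed convention)
    weight: int            # relative traffic share within a tier
    healthy: bool = True


def pick_backend(backends: list[Backend], rng: random.Random | None = None) -> Backend:
    """Choose from the best healthy tier, weighted-random within that tier.

    Failover falls out naturally: if every backend in the preferred tier is
    unhealthy, the next tier becomes the best healthy one.
    """
    rng = rng or random.Random()
    candidates = [b for b in backends if b.healthy and b.weight > 0]
    if not candidates:
        raise LookupError("no healthy backend available")
    best_priority = min(b.priority for b in candidates)
    tier = [b for b in candidates if b.priority == best_priority]
    return rng.choices(tier, weights=[b.weight for b in tier], k=1)[0]
```

With two backends at priority 1 and one at priority 2, traffic is split between the first two by weight, and the third only receives requests once both preferred backends are marked unhealthy.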

For streaming completions, steps 7–9 are interleaved: the dispatcher yields chunks as they arrive, accounting counts them incrementally, and the gateway forwards them to the client as SSE.
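
A rough sketch of that interleaving, using an async generator that counts usage while forwarding SSE frames. The names (`UsageCounter`, `stream_as_sse`, `fake_upstream`), the `delta` payload key, the character count as a stand-in for tokens, and the `[DONE]` terminator are assumptions for illustration, not ScaiGrid's actual wire format.

```python
from __future__ import annotations

import asyncio
import json
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class UsageCounter:
    """Hypothetical running totals updated while the stream is in flight."""
    chunks: int = 0
    chars: int = 0  # stand-in for token counts


async def fake_upstream() -> AsyncIterator[str]:
    """Pretend provider stream; a real dispatcher would read from the network."""
    for piece in ["Hel", "lo", "!"]:
        await asyncio.sleep(0)
        yield piece


async def stream_as_sse(chunks: AsyncIterator[str], usage: UsageCounter) -> AsyncIterator[str]:
    """Forward each chunk as an SSE frame, updating usage before yielding."""
    async for piece in chunks:
        usage.chunks += 1
        usage.chars += len(piece)
        payload = json.dumps({"delta": piece})
        yield f"data: {payload}\n\n"
    # Assumed end-of-stream marker, in the style of OpenAI-compatible APIs.
    yield "data: [DONE]\n\n"


async def collect() -> tuple[list[str], UsageCounter]:
    usage = UsageCounter()
    frames = [frame async for frame in stream_as_sse(fake_upstream(), usage)]
    return frames, usage
```

Because the counter is updated per chunk, a client disconnect partway through still leaves an accurate partial total to settle.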

## Core services

| Service | Responsibility |
|---------|----------------|
| **RoutingService** | Resolves frontend model → backend, applies routing policy |
| **InferenceService** | Orchestrates system prompt rewriting, dispatch, streaming, accounting |
| **DispatchService** | Wraps provider dispatchers with circuit breaker and error normalization |
| **AccountingEngine** | Budget enforcement, usage counters, streaming reservations, settlement |
| **AuditService** | Records mutating requests to the audit log |
| **ModuleRegistry** | Discovers, initializes, and mounts modules at startup |
| **EventBus** | Redis Streams-backed pub/sub for internal events |
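
To make the circuit-breaker role of **DispatchService** concrete, here is a minimal generic sketch of the pattern. The class name, thresholds, and half-open behaviour are assumptions; the source does not describe how ScaiGrid's breaker is configured.

```python
from __future__ import annotations

import time
from typing import Any, Callable


class CircuitOpenError(RuntimeError):
    """Raised when calls are rejected without contacting the upstream."""


class CircuitBreaker:
    """Opens after N consecutive failures, allows a trial call after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("upstream circuit is open")
            # Cooldown elapsed: fall through and let one trial call proceed.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and clears the failure streak.
        self._failures = 0
        self._opened_at = None
        return result
```

Injecting the clock keeps the breaker testable without sleeping; a per-provider instance would let one failing upstream trip without affecting the others.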

## Storage

| Store | Used for |
|-------|----------|
| **MariaDB (Galera)** | Persistent state: users, models, tenants, budgets, usage records, audit log, module state |
| **Redis** | Real-time state: rate-limit counters, streaming reservations, session caches, event bus, module runtime state |
| **S3 (Garage)** | Blobs: avatars, uploaded documents, inference artifacts, exported reports, snapshots |

All three are external dependencies ScaiGrid connects to — it doesn't embed a database.

## Module system

Modules live under `modules/{module_id}/` in the ScaiGrid codebase (or ship as separate packages). At startup, ScaiGrid scans the module path, instantiates each module, resolves dependencies in topological order, calls `initialize()`, and mounts any routes, admin pages, background tasks, or MCP tools the module declares.
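
Resolving dependencies in topological order can be sketched with the standard library's `graphlib`. The `init_order` helper and the module names below are hypothetical, and how ScaiGrid modules actually declare their dependencies is not shown here.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter


def init_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return module IDs so that each module comes after everything it depends on.

    `dependencies` maps a module ID to the set of module IDs it requires.
    Raises ValueError for a dependency on an unknown module and
    graphlib.CycleError if the dependencies form a cycle.
    """
    known = set(dependencies)
    required = {dep for deps in dependencies.values() for dep in deps}
    missing = required - known
    if missing:
        raise ValueError(f"unknown module dependencies: {sorted(missing)}")
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical module graph: "reports" needs both others, "scaibot" needs "storage".
modules = {
    "reports": {"scaibot", "storage"},
    "scaibot": {"storage"},
    "storage": set(),
}
order = init_order(modules)  # "storage" first, "reports" last
```

Calling `initialize()` on each module in this order means a module can rely on its dependencies being ready, and a cycle surfaces at startup rather than at request time.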

Each module gets:

- A URL namespace at `/v1/modules/{module_id}/`
- A set of permissions registered in ScaiKey (e.g., `scaibot:manage`)
- Its own Alembic migration stream (version table `alembic_version_mod_{module_id}`)
- A slot in the admin UI sidebar
- Access to the shared `ModuleContext` (DB engine, Redis, config, event bus, HTTP client, logger)

See [Modules](/docs/scaigrid/modules) for how to enable and configure them.

## Runtime modes

ScaiGrid can run in four modes, selected by the `SCAIGRID_MODE` environment variable:

- **`http`** — FastAPI HTTP server (default). Serves `/v1/`, `/oai/v1/`, `/metrics`, `/health`.
- **`grpc`** — gRPC server for internal components (ScaiInfer heartbeats, ScaiMind training, etc.).
- **`worker`** — ARQ background worker. Runs cron tasks declared by modules and core.
- **`migrate`** — Runs Alembic migrations (core + all modules), then exits.

Production deployments typically run all four as separate containers off the same image.
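
Mode selection from `SCAIGRID_MODE` might look roughly like the sketch below. Only the variable name, the four values, and the `http` default come from this page; the `Mode` enum, the `resolve_mode` helper, and the case-insensitive parsing are assumptions.

```python
from __future__ import annotations

import os
from enum import Enum
from typing import Mapping


class Mode(str, Enum):
    HTTP = "http"
    GRPC = "grpc"
    WORKER = "worker"
    MIGRATE = "migrate"


def resolve_mode(env: Mapping[str, str] | None = None) -> Mode:
    """Read SCAIGRID_MODE, defaulting to `http` when it is unset."""
    env = os.environ if env is None else env
    raw = env.get("SCAIGRID_MODE", "http").strip().lower()
    try:
        return Mode(raw)
    except ValueError:
        valid = ", ".join(m.value for m in Mode)
        raise ValueError(f"SCAIGRID_MODE must be one of: {valid} (got {raw!r})") from None
```

Because every container starts from the same image, a single entrypoint that branches on the resolved mode is enough; failing loudly on an unknown value avoids a container silently starting in the wrong role.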

## What's next

- [Quickstart](../02-getting-started/01-quickstart.md) — first API call.
- [Models and Routing](../03-core-concepts/03-models-and-routing.md) — the frontend/backend separation in detail.
- [Modules](/docs/scaigrid/modules) — what modules are and how to use them.