Deploy ScaiWave

json
{
  "title": "Deploy ScaiWave",
  "audience": "developer",
  "summary": "Production deployment topology, environment, and operational checklist.",
  "sort_order": 7
}

ScaiWave is built to deploy as a small set of services backed by standard infrastructure. The architecture is intentionally boring: nothing exotic, and every process except the singleton scheduler scales horizontally.

What you run

Process	What it does	Scale
`scaiwave-api`	FastAPI app on uvicorn; serves HTTP and WebSocket	Horizontal; behind a load balancer.
`scaiwave-worker`	ARQ worker for background jobs	Horizontal; size the worker count to available CPU.
`scaiwave-scheduler`	ARQ cron scheduler	Singleton; run exactly one instance.

The client (SolidJS + Tauri) ships separately — desktop installers, mobile app stores, or a static web bundle behind a CDN.

What you depend on

Dependency	Purpose	Notes
MariaDB 11+	Primary store	Or any MySQL-compatible engine; tested against MariaDB.
Redis 7+	Cache, ARQ queue, presence	Enable persistence (AOF or RDB).
NATS JetStream 2.10+	Event fan-out	Use a persistent stream for room events.
MinIO / S3	Media storage	One bucket per tenant, or one shared bucket with a `tenant_id` key prefix.
Weaviate 1.27+	Search index	No server-side vectorizer needed (we vectorize via ScaiGrid).
ClamAV 1.4+	Media virus-scan	Optional; recommended in prod.
LiveKit	Calls	Self-hosted or LiveKit Cloud.
ScaiKey	OIDC identity	Per-tenant config.
ScaiGrid	Inference	For models, embeddings, transcription.
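For the shared-bucket option, the `tenant_id` prefix can be pictured as a simple key-building rule. The sketch below is illustrative only: the `media_key` helper and the `<tenant_id>/<object_id>` layout are assumptions, not ScaiWave's documented key scheme.

```python
def media_key(tenant_id: str, object_id: str) -> str:
    """Build an object key for a shared bucket, namespaced by tenant.

    Hypothetical layout: "<tenant_id>/<object_id>".
    """
    # Reject values that would let one tenant's keys land under another prefix.
    if not tenant_id or "/" in tenant_id:
        raise ValueError("tenant_id must be non-empty and must not contain '/'")
    if not object_id:
        raise ValueError("object_id must be non-empty")
    return f"{tenant_id}/{object_id}"


print(media_key("acme", "room-42/avatar.png"))  # acme/room-42/avatar.png
```

Keeping the tenant as the first path segment means bucket policies and lifecycle rules can be scoped per tenant by prefix.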

Environment

All configuration is set through `SCAIWAVE_<NAME>` environment variables. The essential set:

bash
SCAIWAVE_DB_URL=mysql+aiomysql://user:pass@db-host/scaiwave
SCAIWAVE_REDIS_URL=redis://redis-host:6379/0
SCAIWAVE_NATS_URL=nats://nats-host:4222
SCAIWAVE_S3_ENDPOINT=https://minio.example.com
SCAIWAVE_S3_BUCKET=scaiwave-media
SCAIWAVE_S3_ACCESS_KEY=...
SCAIWAVE_S3_SECRET_KEY=...
SCAIWAVE_WEAVIATE_URL=http://weaviate-host:8080
SCAIWAVE_CLAMAV_URL=tcp://clamav-host:3310
SCAIWAVE_LIVEKIT_URL=wss://livekit.example.com
SCAIWAVE_LIVEKIT_API_KEY=...
SCAIWAVE_LIVEKIT_API_SECRET=...

SCAIWAVE_AUTH_MODE=scaikey
SCAIWAVE_SCAIKEY_URL=https://scaikey.example.com
SCAIWAVE_SCAIKEY_CLIENT_ID=scaiwave-prod
SCAIWAVE_SCAIKEY_CLIENT_SECRET=...

SCAIWAVE_SCAIGRID_URL=https://scaigrid.example.com
SCAIWAVE_SCAIGRID_API_KEY=...   # fallback only — prefer per-user exchange
SCAIWAVE_AI_EMBEDDING_MODEL=scailar-4

SCAIWAVE_SERVER_NAME=scaiwave.example.com
SCAIWAVE_DEBUG=false

The full list is in Reference: Configuration.
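A pre-flight check can catch a missing variable before any pod starts. The following is a minimal sketch, assuming the variables in `REQUIRED` are mandatory; which settings ScaiWave actually requires is defined in Reference: Configuration, and the `missing_settings` helper is hypothetical.

```python
import os

# Assumed-mandatory subset of the variables listed above (an illustration,
# not the authoritative list).
REQUIRED = (
    "SCAIWAVE_DB_URL",
    "SCAIWAVE_REDIS_URL",
    "SCAIWAVE_NATS_URL",
    "SCAIWAVE_S3_ENDPOINT",
    "SCAIWAVE_S3_BUCKET",
    "SCAIWAVE_SERVER_NAME",
)


def missing_settings(env=None):
    """Return the names in REQUIRED that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name, "").strip()]


# Example: only the DB URL is set, so the other five names are reported.
print(missing_settings({"SCAIWAVE_DB_URL": "mysql+aiomysql://user:pass@db-host/scaiwave"}))
```

Called with no argument, `missing_settings()` inspects the real process environment, which makes it usable as a container entrypoint guard.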

Topology

mermaid
flowchart TD
    Browser[Browser]
    LB[Load balancer<br/>TLS termination<br/>sticky WS]
    API1[API pod]
    APIN[API pod]
    W1[Worker pod]
    WN[Worker pod]
    Sched[Scheduler singleton]
    subgraph Backend [Backend services]
        direction LR
        DB[(MariaDB)]
        Redis[(Redis)]
        NATS[(NATS JetStream)]
        Weaviate[(Weaviate)]
        MinIO[(MinIO / S3)]
        ClamAV[ClamAV]
        LiveKit[LiveKit]
    end
    Browser -- HTTPS / WSS --> LB
    LB --> API1
    LB --> APIN
    API1 --> Backend
    APIN --> Backend
    W1 --> Backend
    WN --> Backend
    Sched --> Backend

The API pods serve both HTTP and WebSocket. The load balancer should provide:

TLS termination.
Sticky sessions for WS (or accept that occasional disconnects drive reconnects via /v1/sync).
Long-poll-friendly timeouts: the sync endpoint blocks for up to 30 s by default, so idle timeouts must be longer than that.
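The timeout requirement reduces to a simple inequality: the load balancer's idle timeout has to exceed the longest time the sync endpoint may hold a request open. A small sketch, where the 5-second safety margin and the `idle_timeout_ok` helper are assumptions for illustration:

```python
def idle_timeout_ok(lb_idle_timeout_s: float,
                    sync_block_s: float = 30.0,
                    margin_s: float = 5.0) -> bool:
    """True if the load balancer will not cut off a blocked sync request.

    sync_block_s defaults to the 30 s sync block time mentioned above;
    margin_s is an assumed allowance for network and response latency.
    """
    return lb_idle_timeout_s >= sync_block_s + margin_s


print(idle_timeout_ok(60))  # True: 60 s leaves room for a 30 s long poll
print(idle_timeout_ok(30))  # False: the proxy may time out first
```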

Migration on deploy

bash
scaiwave-api .venv/bin/alembic upgrade head

Migrations are forward-only and additive. Roll out the new API pods after migrations succeed; old pods continue to work against the new schema until they restart.
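The ordering above (migrate first, roll out only on success) can be enforced in a deploy script. This is a sketch under stated assumptions: `migrate_then_rollout` is a hypothetical helper, the migration command is whatever your pipeline uses to invoke Alembic, and `rollout` stands in for your orchestrator's rollout step.

```python
import subprocess
from typing import Callable, Sequence


def migrate_then_rollout(migrate_cmd: Sequence[str],
                         rollout: Callable[[], None],
                         run=subprocess.run) -> bool:
    """Run the migration command; start the rollout only if it exits with 0.

    `run` is injectable so the ordering can be exercised without a database.
    """
    result = run(list(migrate_cmd))
    if result.returncode != 0:
        # Old pods keep serving against the unchanged schema; nothing rolls out.
        return False
    rollout()
    return True


# Usage (command path as shown above, rollout left to your tooling):
# migrate_then_rollout([".venv/bin/alembic", "upgrade", "head"], start_rollout)
```

Because migrations are additive, a failed migration leaves the running pods untouched, so returning early is safe.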

Health and observability

GET /health — readiness probe. Returns 200 when DB+Redis+NATS are reachable.
GET /metrics — Prometheus scrape endpoint. Counters and histograms for request volume, latency, error rate, AI token usage, ARQ queue depth.

Logs are structured and emitted via structlog. Ship them to your usual sink (ELK, Loki, Cloud Logging, …).
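To make the readiness contract concrete, here is a framework-free sketch of how a probe might aggregate dependency checks. It is not ScaiWave's implementation: the `readiness` function, the per-check timeout, and the 503 status for failures are assumptions (the source only states that 200 means DB, Redis, and NATS are reachable).

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Check = Callable[[], Awaitable[None]]


async def readiness(checks: Dict[str, Check],
                    timeout_s: float = 2.0) -> Tuple[int, Dict[str, str]]:
    """Run all checks concurrently; 200 only if every one succeeds in time."""

    async def run_one(check: Check) -> str:
        try:
            await asyncio.wait_for(check(), timeout=timeout_s)
            return "ok"
        except Exception as exc:  # includes timeouts and connection errors
            return f"error: {type(exc).__name__}"

    names = list(checks)
    results = await asyncio.gather(*(run_one(checks[n]) for n in names))
    report = dict(zip(names, results))
    status = 200 if all(v == "ok" for v in report.values()) else 503
    return status, report


async def _reachable() -> None:
    """Stand-in for a real ping against DB, Redis, or NATS."""
    return None


print(asyncio.run(readiness({"db": _reachable, "redis": _reachable, "nats": _reachable})))
```

Returning the per-dependency report alongside the status code makes a failing probe easier to diagnose from orchestrator events.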

Capacity rules of thumb

Component	Sizing	Rough capacity
API pod	1 vCPU, 1 GB RAM	~200 concurrent WebSocket connections, ~300 req/s.
Worker pod	1 vCPU, 1 GB RAM	~10 concurrent index jobs.
MariaDB	Small instance	Handles thousands of rooms; add read replicas if needed.
Redis	2 GB RAM	Enough for a tenant with low thousands of active users.
Weaviate	4 vCPU, 8 GB RAM, SSD	Sized for a tenant with 100K indexed messages.
MinIO	Spinning disk	Match expected media volume; bandwidth often matters more than IOPS.
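The API pod figures translate into a quick sizing calculation: take whichever of WebSocket load or request rate demands more pods, then round up. The sketch below uses the rule-of-thumb numbers from the table; the `api_pods_needed` helper and the optional `headroom` factor are assumptions, and real capacity should be confirmed with load tests.

```python
import math

WS_PER_POD = 200   # ~200 concurrent WebSocket connections per API pod
RPS_PER_POD = 300  # ~300 req/s per API pod


def api_pods_needed(concurrent_ws: int, peak_rps: float,
                    headroom: float = 0.0) -> int:
    """Estimate API pod count from the tighter of the two per-pod limits."""
    by_ws = concurrent_ws / WS_PER_POD
    by_rps = peak_rps / RPS_PER_POD
    return max(1, math.ceil(max(by_ws, by_rps) * (1 + headroom)))


# 1,500 sockets need 7.5 pods, 600 req/s need 2, so WebSockets dominate.
print(api_pods_needed(1500, 600))                 # 8
print(api_pods_needed(1500, 600, headroom=0.25))  # 10
```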

Hardening checklist

HTTPS everywhere; HSTS on the load balancer.
WSS, not WS.
Bearer tokens go in the Authorization header, not in query strings. The WS endpoint is the only exception; document it so that proxy logs are handled accordingly.
CSP on the served HTML (no inline scripts allowed by default).
Trusted-proxy IP allowlist on the API.
Per-tenant rate limits configured (tenant.features.rate_limits).
Backups: DB nightly, Weaviate weekly, MinIO replicated.
ClamAV reachable from the worker pods.
LiveKit egress / recording bucket configured.
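As an illustration of the trusted-proxy item in the checklist above, the check amounts to testing the connecting peer's address against a list of CIDR ranges before honoring forwarded-client headers. This sketch uses only the standard library; `is_trusted_proxy` and the example ranges are hypothetical and do not reflect how ScaiWave configures its allowlist.

```python
import ipaddress
from typing import Iterable


def is_trusted_proxy(peer_ip: str, allowlist: Iterable[str]) -> bool:
    """True if peer_ip falls inside any CIDR range in the allowlist."""
    addr = ipaddress.ip_address(peer_ip)
    for cidr in allowlist:
        net = ipaddress.ip_network(cidr, strict=False)
        # An IPv4 address never matches an IPv6 range, and vice versa.
        if addr.version == net.version and addr in net:
            return True
    return False


ALLOWLIST = ["10.0.0.0/16", "192.168.1.0/24"]  # example ranges only
print(is_trusted_proxy("10.0.3.7", ALLOWLIST))     # True
print(is_trusted_proxy("203.0.113.9", ALLOWLIST))  # False
```

Only when the peer is trusted should headers such as X-Forwarded-For be used to determine the client address for rate limiting and logging.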

Operational tooling

CLI commands the operator runs:

bash
scaiwave sync                 # Identity sync against ScaiKey.
scaiwave admin list-tenants   # All tenants and their limits.
scaiwave room export <id> ... # Bundle a room for compliance.
scaiwave room import <ws> ... # Restore from bundle.

See CLI reference for the full surface.

Where to go next

Reference: Configuration — every env var.
Reference: CLI.
Enable federation with a peer.