Deployment
This page covers running ScaiDrive in production. The deployment is container-based and relies on external services for persistence.
What you need
Compute:
- Three container roles off the same image: `api`, `worker`, `migrate`. Run `migrate` as a one-shot before startup; run `api` and `worker` long-lived.
- 2 CPU + 4 GB RAM per API replica as a starting point. Scale horizontally behind a load balancer. Workers are lighter: 1 CPU + 2 GB RAM each.
External dependencies:
| Dep | Role | Notes |
|---|---|---|
| MariaDB 10.11+ / MySQL 8.0+ | Primary datastore | Galera cluster for HA |
| Redis 7+ | Cache, queue, WebSocket pub/sub | Single instance OK for small deployments; Sentinel or Cluster for HA |
| S3-compatible store | File chunks, blobs | Garage, MinIO, AWS S3, GCS via S3 compat |
| ScaiKey | Identity provider | Separate deployment |
| ScaiSend | Transactional email | For invitations and notifications |
| Weaviate | Vector store (optional) | Required for semantic search |
Docker Compose (dev / small prod)
The repository ships a `docker-compose.yml` suitable for small deployments; refer to that file for the service definitions.
Run migrations once on each deploy, using the `migrate` role, before starting `api` and `worker`.
Kubernetes
The k8s/ directory in the repo has Helm charts and raw manifests. Key patterns:
- API Deployment: `replicas: 3` behind a Service + Ingress. Stateless; any replica can serve any request. Probes on `/api/v1/health` and `/api/v1/ready`.
- Worker Deployment: `replicas: 2`. Stateless; workers pull from Redis-backed ARQ queues.
- Migration Job: runs `migrate` once per deploy and blocks until success.
- PodDisruptionBudget: `minAvailable: 1` on API and worker for rolling updates.
- Redis: bitnami/redis or Redis Cluster for HA.
- MariaDB: bitnami/mariadb-galera or an external managed DB.
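The readiness probe listed above can also be exercised from a deploy script, for example to hold a rollout step until the API answers. The following is a minimal sketch, not part of ScaiDrive: it assumes that a 2xx response from `/api/v1/ready` means the replica is ready, and the function names `http_ok` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status.

    Assumption: the probe endpoints signal success with a 2xx status code.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_until_ready(
    base_url: str,
    check: Callable[[str], bool] = http_ok,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Poll the readiness endpoint until it passes or attempts run out."""
    ready_url = base_url.rstrip("/") + "/api/v1/ready"
    for attempt in range(attempts):
        if check(ready_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready("http://scaidrive-api:8000")` polls once per second for up to 30 attempts; the `check` parameter exists so the polling logic can be tested without a network.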
HorizontalPodAutoscaler
Scale the API on request rate or CPU utilization with a HorizontalPodAutoscaler targeting the API Deployment.
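To reason about how an autoscaler reacts to a given target, it can help to reproduce the replica calculation by hand. The sketch below follows the formula documented for the Kubernetes HorizontalPodAutoscaler, `ceil(currentReplicas * currentMetric / targetMetric)`, with the default 10% tolerance. The floor of 3 mirrors the API replica count mentioned earlier; the ceiling of 10 and the function name are illustrative assumptions, not values from the ScaiDrive charts.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 3,    # matches the API Deployment's replicas: 3
    max_replicas: int = 10,   # hypothetical ceiling, tune per cluster
    tolerance: float = 0.1,   # Kubernetes default: ignore deviations within 10%
) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: no scaling action.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, three replicas averaging 90% CPU against a 60% target give `desired_replicas(3, 90, 60) == 5`.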
Workers scale on queue depth. The worker image exposes a Prometheus counter; use a KEDA scaler to drive replica count from it.
Configuration
All runtime configuration is environment-based. Required variables:
| Variable | Required | Notes |
|---|---|---|
| `SCAIDRIVE_DATABASE_URL` | Yes | Full async URL (`mysql+asyncmy://...`) |
| `SCAIDRIVE_REDIS_URL` | Yes | `redis://...` or `rediss://...` |
| `SCAIDRIVE_S3_ENDPOINT` | Yes | S3 / Garage / MinIO URL |
| `SCAIDRIVE_S3_BUCKET` | Yes | |
| `SCAIDRIVE_S3_ACCESS_KEY` | Yes | |
| `SCAIDRIVE_S3_SECRET_KEY` | Yes | |
| `SCAIDRIVE_SCAIKEY_URL` | Yes | Base URL of ScaiKey |
| `SCAIDRIVE_SCAIKEY_CLIENT_ID` | Yes | Registered OAuth client |
| `SCAIDRIVE_JWT_ISSUER` | Yes | Must match `iss` in ScaiKey tokens |
| `SCAIDRIVE_WEAVIATE_URL` | No | Required for semantic search |
| `SCAIDRIVE_SCAISEND_URL` | No | Required for email notifications |
| `SCAIDRIVE_SECRET_KEY` | Yes | 32+ bytes; used for internal encryption |
| `SCAIDRIVE_ENCRYPTION_KEY` | Yes | 32 bytes; encrypts stored connector credentials |
| `SCAIDRIVE_CORS_ORIGINS` | No | Comma-separated; defaults to `*` in dev, unset in prod |
| `SCAIDRIVE_LOG_LEVEL` | No | `DEBUG`, `INFO`, `WARNING`, `ERROR` |
Full list: server/scaidrive/config/settings.py.
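Because a missing variable typically surfaces only at container startup, a pre-flight check in the deploy pipeline can catch it earlier. The sketch below is an illustration, not ScaiDrive's own validation: `check_config` is a hypothetical helper, and it only checks that the variables marked required above are present and that the log level, if set, is one of the four listed values. It makes no assumption about how the key variables are encoded, so it does not verify their lengths.

```python
from typing import Mapping

# Variables marked "Yes" in the configuration table.
REQUIRED = (
    "SCAIDRIVE_DATABASE_URL",
    "SCAIDRIVE_REDIS_URL",
    "SCAIDRIVE_S3_ENDPOINT",
    "SCAIDRIVE_S3_BUCKET",
    "SCAIDRIVE_S3_ACCESS_KEY",
    "SCAIDRIVE_S3_SECRET_KEY",
    "SCAIDRIVE_SCAIKEY_URL",
    "SCAIDRIVE_SCAIKEY_CLIENT_ID",
    "SCAIDRIVE_JWT_ISSUER",
    "SCAIDRIVE_SECRET_KEY",
    "SCAIDRIVE_ENCRYPTION_KEY",
)

LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def check_config(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the check passed."""
    problems = [
        f"missing required variable: {name}"
        for name in REQUIRED
        if not env.get(name)  # unset or empty string
    ]
    level = env.get("SCAIDRIVE_LOG_LEVEL")
    if level is not None and level not in LOG_LEVELS:
        problems.append(f"SCAIDRIVE_LOG_LEVEL has unsupported value: {level}")
    return problems
```

In a pipeline step, `check_config(os.environ)` can be called after `import os`, failing the step if the returned list is non-empty.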
TLS
ScaiDrive does not terminate TLS itself. Run behind a reverse proxy (nginx, Traefik, cloud load balancer) that terminates and forwards to http://scaidrive-api:8000.
WebSocket upgrades need the standard `Upgrade` and `Connection` headers forwarded by the proxy, over HTTP/1.1.
Raise `proxy_read_timeout` to at least 1 hour; the nginx default of 60s closes idle WebSocket connections.
For large uploads, raise `client_max_body_size`; the nginx default of 1 MB rejects larger request bodies.
File system layout on S3
ScaiDrive stores:
- `chunks/{tenant_id}/{hash[0:2]}/{hash[2:4]}/{hash}`: chunk blobs
- `uploads/{session_id}/chunks/{index}`: pending resumable-upload chunks
- `avatars/{tenant_id}/{user_id}`: user avatars
File content is never stored under user-identifying paths. A chunk used by a file owned by Alice is keyed only by tenant ID and hash; Alice's name does not appear in the path.
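The key templates above can be expressed as small helpers, which is handy when inspecting a bucket or writing tooling around it. This is a sketch of the layout as documented here, not ScaiDrive's own code: the function names are hypothetical, and the hash is taken as an already-computed string because the hash algorithm is not specified in this section.

```python
def chunk_key(tenant_id: str, chunk_hash: str) -> str:
    """Build the object key for a chunk blob.

    The first two and next two characters of the hash form two
    directory levels, which spreads objects across prefixes.
    """
    if len(chunk_hash) < 4:
        raise ValueError("chunk hash must be at least 4 characters long")
    return f"chunks/{tenant_id}/{chunk_hash[0:2]}/{chunk_hash[2:4]}/{chunk_hash}"


def upload_chunk_key(session_id: str, index: int) -> str:
    """Build the object key for a pending resumable-upload chunk."""
    return f"uploads/{session_id}/chunks/{index}"
```

With a tenant `t1` and a hash `abcdef0123`, `chunk_key` yields `chunks/t1/ab/cd/abcdef0123`; no user identifier takes part in the key.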
Lifecycle policies:
- `uploads/` prefix: set an S3 lifecycle rule to delete objects after 48 hours. ScaiDrive's own expiry cleanup is belt-and-braces, not strictly needed if the bucket has a lifecycle rule.
- `chunks/` prefix: never expire; ScaiDrive GCs these when reference counts reach zero.
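The `uploads/` rule described above could be applied with boto3 along the following lines. This is a sketch under assumptions: S3 lifecycle expiration is expressed in whole days, so 48 hours becomes 2 days; the rule ID and function names are made up for illustration; and S3-compatible stores may implement only part of the lifecycle API, so check the store's documentation first. No rule is created for `chunks/`, which leaves those objects to ScaiDrive's own garbage collection.

```python
def uploads_lifecycle_config(days: int = 2) -> dict:
    """Lifecycle configuration that expires objects under uploads/ after `days` days."""
    return {
        "Rules": [
            {
                "ID": "expire-pending-uploads",  # hypothetical rule name
                "Filter": {"Prefix": "uploads/"},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_uploads_lifecycle(bucket: str, endpoint_url: str) -> None:
    """Apply the rule to `bucket`. Requires boto3 and credentials in the environment."""
    import boto3  # imported here so the config builder works without boto3 installed

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=uploads_lifecycle_config(),
    )
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's whole lifecycle configuration, so any existing rules need to be included in the same call.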
Scaling
API layer
Stateless — scale linearly. Rate-limit counters are in Redis, shared across replicas. Sticky sessions are not required.
Worker layer
Workers consume ARQ queues from Redis. Scaling up workers drains queues faster. There are separate queues by priority:
- `high`: vectorization, inline DLP
- `medium`: connector sync
- `low`: quota recompute, retention sweeps
A small deployment can run one worker per queue; a larger deployment runs N workers with queue affinity.
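One way to picture queue affinity is to decide up front how many workers serve each queue. The sketch below is only an illustration of that idea: the round-robin policy, which hands any extra workers to the higher-priority queues first, is an assumption rather than ScaiDrive's documented behavior, and `worker_affinity` is a hypothetical helper.

```python
# Queue names as listed above, in priority order.
QUEUES = ("high", "medium", "low")


def worker_affinity(worker_count: int) -> dict[str, int]:
    """Spread `worker_count` workers over the queues, favoring higher priority.

    Every queue gets at least one worker; leftovers go to `high`, then `medium`.
    """
    if worker_count < len(QUEUES):
        raise ValueError("need at least one worker per queue")
    counts = {queue: 0 for queue in QUEUES}
    for i in range(worker_count):
        counts[QUEUES[i % len(QUEUES)]] += 1
    return counts
```

Three workers give one per queue, matching the small-deployment case; five workers give two each for `high` and `medium` and one for `low`.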
Database
The MariaDB schema is normalized with per-tenant composite indexes. For large tenants (>10M files), consider read-replicas; the ORM has a use_read_replica hint for list-heavy endpoints.
Storage
S3-compatible stores scale independently. For on-prem, Garage (tiered, distributed) is the most commonly deployed choice alongside ScaiDrive.
Backups
MariaDB: Standard mariabackup or managed-DB point-in-time recovery. ScaiDrive expects a consistent snapshot — a hot backup tool is required for zero downtime.
S3: Versioning on the bucket plus replication to a second region. Chunks are immutable once written; no special consideration needed.
Redis: Ephemeral. Don't bother backing up. On Redis loss, rate-limit counters reset and queue state is lost — workers retry jobs.