Architecture
ScaiMatrix is a ScaiGrid module — it runs in the same FastAPI process, behind the same auth, against the same MariaDB. The only external services it adds are a vector store (Weaviate) and a graph store (Neo4j), both optional in the sense that absence degrades gracefully rather than 5xx'ing.
Components
There's no separate ScaiMatrix deployment. Routes mount under the module registry; the arq worker pool runs ingestion as background tasks.
Request flow: search
- HTTP -> POST /collections/{id}/search -> auth + module permission check.
- Collection load -> CollectionService.get_for_tenant (tenant scoped).
- Access check -> CollectionAccessService.require_access(user, collection, "read"). Fails closed.
- Embed query -> InferenceService.embed with the collection's embedding_model. The call is metered to the caller.
- Vector store query -> Weaviate near_vector against the collection's class, scoped by tenant.
- ACL chokepoint -> filter_results_by_acl(session, user, candidates). Every result with document_id is evaluated by AclResolver.can(user, ref_for_document(doc), Permission.READ). Denied rows are dropped before serialization.
- Response assembled with success(...); no chunk leaks into counts or metadata if its document was denied.
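The last two steps can be illustrated with a minimal sketch. It is not the module's filter_results_by_acl: the Candidate shape, the can_read callback (standing in for AclResolver.can with Permission.READ), and the response fields are assumptions made for illustration. The point it shows is that counts are derived only from the filtered list.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Candidate:
    """One vector-store hit (hypothetical shape)."""
    chunk_id: str
    document_id: str
    score: float


def filter_candidates(
    candidates: Iterable[Candidate],
    can_read: Callable[[str], bool],
) -> list[Candidate]:
    """Keep only hits whose document the caller may read.

    Decisions are cached per document, so a document with many matching
    chunks is checked once.
    """
    decisions: dict[str, bool] = {}
    kept: list[Candidate] = []
    for hit in candidates:
        if hit.document_id not in decisions:
            decisions[hit.document_id] = can_read(hit.document_id)
        if decisions[hit.document_id]:
            kept.append(hit)
    return kept


def build_response(kept: list[Candidate]) -> dict:
    """Assemble the payload from the filtered list only (field names assumed)."""
    return {
        "results": [
            {"chunk_id": h.chunk_id, "document_id": h.document_id, "score": h.score}
            for h in kept
        ],
        # The total is computed after filtering, so denied chunks never
        # influence counts or metadata.
        "total": len(kept),
    }
```

Because build_response only ever sees the kept list, a denied document cannot be inferred from a mismatch between the result count and the results returned.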
Request flow: ingestion
- HTTP -> POST /collections/{id}/documents (multipart). Tenant + collection write check.
- Blob write -> file goes to S3 via the document-store client; a row is inserted into mod_scaimatrix_documents with status: pending.
- Enqueue -> ingest_document job pushed to arq.
- Worker picks up the job:
  - processing: extract text per content type (PDF, DOCX, HTML, Markdown, plain text, source code).
  - chunking: split per the collection's chunking_strategy (fixed, paragraph, semantic, markdown, code) at chunk_size with chunk_overlap.
  - embedding: call the embedding model in batches; write vectors to Weaviate.
  - graph_extracting: if graph_enabled, prompt graph_extraction_model to emit nodes + edges, dedupe against the existing graph, write to Neo4j.
  - indexed on success; failed with error_message otherwise.
- Counters on the collection (document_count, chunk_count, total_size_bytes, node_count, edge_count) are maintained as the worker progresses.
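To make the chunk_size / chunk_overlap interaction concrete, here is a minimal sketch of what a fixed strategy could look like. It assumes sizes are measured in characters; the source does not say whether the real implementation counts characters or tokens, and the function name chunk_fixed is hypothetical.

```python
def chunk_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters.

    Assumption: sizes are in characters. Each window starts
    (chunk_size - chunk_overlap) characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text, so the
        # tail is not emitted again as a redundant short chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

For example, chunk_fixed("abcdefghij", 4, 1) yields three windows in which the last character of each chunk is repeated as the first character of the next.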
Request flow: crawl
- Trigger: ad-hoc POST /collections/{id}/crawl, manual POST /crawls/{id}/run, webhook POST /crawls/{id}/trigger (HMAC-verified), or scheduled by the worker.
- Job row in mod_scaimatrix_crawl_jobs with status: pending, limits (max_depth, max_pages, max_total_bytes, follow_external).
- Worker fetches the seed, respects robots.txt, walks links breadth-first within limits, and posts each fetched page back through the document ingestion path.
- Live progress via GET /collections/{id}/crawl/{job_id}/stream (SSE), driven by polling the job row every two seconds.
- Terminal statuses are completed, failed, cancelled. The job row stays around for history.
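The breadth-first walk under limits can be sketched as follows. This is an illustration only: crawl_bfs and the fetch_links callback are hypothetical names, and robots.txt handling, max_total_bytes, and follow_external are deliberately left out to keep the traversal logic visible.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_bfs(
    seed: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_depth: int,
    max_pages: int,
) -> list[str]:
    """Visit pages breadth-first from seed, honouring depth and page budgets.

    fetch_links(url) stands in for "fetch the page and extract its links".
    Returns the URLs in the order they were visited.
    """
    visited: list[str] = []
    seen = {seed}                  # URLs already queued, to avoid revisits
    queue = deque([(seed, 0)])     # (url, depth from the seed)

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue               # do not expand links past the depth limit
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

In the flow described above, each visited URL would be handed to the document ingestion path rather than collected in a list.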
State
- Collections, documents, ACLs, ACEs, crawl configs, crawl jobs, graph views — MariaDB.
- Chunks + embeddings — Weaviate, one class per collection-slug, tenant tag in every object.
- Graph nodes + edges — Neo4j, labelled with tenant and collection ids.
- Document blobs — S3 via the document-store client.
- Re-chunk / re-extract state: denormalised onto the collection row (rechunk_status, graph_reextract_status plus counters).
The ACL chokepoint
The v2 correctness invariant is "search and retrieval never return data the calling user lacks READ on." That's enforced by a single function — filter_results_by_acl — that every search, list, and graph result passes through before serialization. Any new surface that returns documents or chunks must route results through that chokepoint or the property tests fail.
AclResolver.can(user, ref, Permission.X) is the underlying primitive. It walks, in order: explicit deny on the resource -> explicit allow -> inherited deny -> inherited allow, with super-admin / tenant-admin / owner bypasses. Group expansion is transitive (mirrored from ScaiKey via the mod_scaimatrix_scaikey_nested_groups table and a reconcile cron that runs every 10 minutes).
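The precedence walk and the transitive group expansion can be sketched as below. This is a simplified model, not the real AclResolver: the Ace shape, the bypass flag, and the function names are assumptions; all entries are taken to concern a single permission; and returning False when nothing matches is an assumed default.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass(frozen=True)
class Ace:
    """One access-control entry for a single permission (hypothetical shape)."""
    principal: str      # a user id or a group id
    effect: Effect
    inherited: bool     # False = set on the resource itself


def expand_groups(direct: set[str], nested: dict[str, set[str]]) -> set[str]:
    """Transitive closure of membership; nested maps a group to its parent groups."""
    result = set(direct)
    stack = list(direct)
    while stack:
        group = stack.pop()
        for parent in nested.get(group, ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def can(user_id: str, groups: set[str], aces: list[Ace], bypass: bool = False) -> bool:
    """Explicit deny -> explicit allow -> inherited deny -> inherited allow."""
    if bypass:  # stands in for the super-admin / tenant-admin / owner bypasses
        return True
    principals = {user_id} | groups
    matching = [a for a in aces if a.principal in principals]
    for inherited in (False, True):        # explicit tier first, then inherited
        tier = [a for a in matching if a.inherited is inherited]
        if any(a.effect is Effect.DENY for a in tier):
            return False
        if any(a.effect is Effect.ALLOW for a in tier):
            return True
    return False  # assumption: no matching entry means no access
```

Under this ordering an explicit allow on the resource overrides an inherited deny, while an explicit deny overrides everything except the bypasses.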
Trust boundary
The HTTP layer is the only boundary that matters. Inside the process:
- The vector store query is not ACL-aware — it returns whatever matches, and the chokepoint filters.
- The graph store query is the same: Neo4j returns whatever Cypher asks for, and filter_graph_results_by_acl gates it.
- Re-running the resolver in two places (the chokepoint + per-document fetch in GET /documents/{id}) is intentional defense in depth; the cost is negligible against a hit-only set.
That layering exists because index-side filtering would push tenant + group + ACE state into Weaviate and Neo4j, which is operationally expensive and easy to skew. One chokepoint, exhaustively tested, beats two.
Tenant isolation
Every ScaiMatrix row carries a tenant_id. Every Weaviate object is written with a tenant tag; every Cypher query against Neo4j filters by tenant_id in the MATCH clause. Cross-tenant reads are impossible at the storage layer — not just gated, structurally absent — because the queries that the route handlers issue never widen scope past the caller's tenant. Super-admin operations are the one exception, and they take an explicit tenant id parameter.
The same is true of the audit log: every ScaiMatrix-emitted entry tags tenant_id so a tenant admin querying /v1/audit/events?module=scaimatrix sees only their tenant's history.
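As an illustration of tenant scoping in the MATCH clause, the sketch below builds a parameterized Cypher query. The Node label, the property names, and the helper name are assumptions; the source only states that graph data carries tenant and collection ids and that every query filters by tenant_id.

```python
def tenant_scoped_neighbors_query(
    tenant_id: str,
    collection_id: str,
    node_id: str,
) -> tuple[str, dict[str, str]]:
    """Return a Cypher query and its parameters for one node's neighbours.

    Both ends of the pattern are pinned to the caller's tenant and collection,
    so a relationship can never lead the result set outside that scope.
    """
    query = (
        "MATCH (n:Node {tenant_id: $tenant_id, collection_id: $collection_id, id: $node_id})"
        "-[r]-"
        "(m:Node {tenant_id: $tenant_id, collection_id: $collection_id}) "
        "RETURN n, r, m"
    )
    params = {
        "tenant_id": tenant_id,
        "collection_id": collection_id,
        "node_id": node_id,
    }
    return query, params
```

Passing the ids as parameters rather than interpolating them into the string keeps the scope filter out of reach of user-controlled input.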
Background workers
Four arq jobs back the slow paths:
- ingest_document: extract / chunk / embed / (optional) graph-extract for one document.
- crawl_website: crawl a seed URL under depth + page + byte budgets, posting each fetched page through the ingestion path.
- rechunk_collection: drop + recreate every document's chunks under the collection's current chunking parameters.
- reextract_collection_graph: wipe and re-run graph extraction over every indexed document.
Each job updates counters on the collection row as it progresses so dashboards stay live without polling the workers themselves. rechunk_status / graph_reextract_status go through idle -> queued -> running -> completed | failed, with total, processed, failed numbers maintained throughout.
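The status progression can be modelled as a small state machine. The sketch below is illustrative: the class and method names are hypothetical, re-queueing after a terminal state is not modelled, and it assumes processed counts every document handled while failed counts the subset that errored, which the source does not specify.

```python
from dataclasses import dataclass

# Allowed transitions: idle -> queued -> running -> completed | failed
TRANSITIONS: dict[str, set[str]] = {
    "idle": {"queued"},
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


@dataclass
class RechunkProgress:
    """Denormalised progress fields as they might sit on the collection row."""
    status: str = "idle"
    total: int = 0
    processed: int = 0
    failed: int = 0

    def advance(self, new_status: str) -> None:
        """Move to new_status, rejecting transitions outside the lifecycle."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status

    def record(self, ok: bool) -> None:
        """Count one handled document; only valid while the job is running."""
        if self.status != "running":
            raise ValueError("counters only move while running")
        self.processed += 1
        if not ok:
            self.failed += 1
```

A dashboard reading these fields from the collection row gets a consistent picture without contacting the worker.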
Graceful degradation
ScaiMatrix is designed to keep the rest of the API alive when an external dependency is down:
- Weaviate down — search endpoints return zero results and log a warning; ingestion stays queued until Weaviate is back.
- Neo4j down: graph endpoints return zero-shaped responses with graph_available: false instead of 5xx; the rest of the module is unaffected.
- Embedding model unavailable: documents in ingestion stop progressing past embedding and surface the upstream error on error_message; the route layer still serves reads.
Health is reflected at /health/detailed so operators can see which backends are degraded before users report symptoms.
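The Neo4j fallback described above might look like the following sketch. GraphUnavailable, graph_neighbors, and the nodes / edges field names are assumptions for illustration; only the graph_available flag comes from the behaviour described here.

```python
import logging
from typing import Callable

logger = logging.getLogger("scaimatrix.sketch")


class GraphUnavailable(Exception):
    """Raised by the (hypothetical) graph client when Neo4j cannot be reached."""


def graph_neighbors(run_query: Callable[[], tuple[list, list]]) -> dict:
    """Run a graph query, degrading to a zero-shaped payload on outage."""
    try:
        nodes, edges = run_query()
    except GraphUnavailable as exc:
        # Log and return the same shape with empty collections, so clients
        # can branch on graph_available instead of handling a 5xx.
        logger.warning("graph store unavailable: %s", exc)
        return {"nodes": [], "edges": [], "graph_available": False}
    return {"nodes": nodes, "edges": edges, "graph_available": True}
```

Only the outage exception is caught, so genuine bugs in the query path still surface as errors rather than being masked as an empty graph.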