Architecture

ScaiMatrix is a ScaiGrid module: it runs in the same FastAPI process, behind the same auth, against the same MariaDB. The only external services it adds are a vector store (Weaviate) and a graph store (Neo4j). Both are optional in the sense that when either is unavailable the module degrades gracefully instead of returning 5xx errors.

Components

flowchart LR
    App["App Caller"]
    subgraph SG ["ScaiGrid"]
        Routes["/v1/modules/scaimatrix /collections/..."]
        Core["Routes Services AclResolver SearchService GraphService"]
        Maria["MariaDB collections, documents, ACLs"]
        Worker["arq worker pool ingest, crawl, reextract, rechunk"]
    end
    SD["ScaiDrive S3"]
    Wv["Weaviate vectors"]
    Neo["Neo4j optional"]
    Inf["Inference embedding model"]
    App -- "upload doc" --> Routes
    App -- search --> Routes
    Routes --> Core
    Core --> Maria
    Routes <-- blob --> SD
    Core <-- vectors --> Wv
    Core <-- graph --> Neo
    Worker <-- embed --> Inf
    Core -. "ACL-gated response" .-> App

There's no separate ScaiMatrix deployment. Routes mount under the module registry; the arq worker pool runs ingestion as background tasks.

Request flow: search

HTTP -> POST /collections/{id}/search -> auth + module permission check.
Collection load -> CollectionService.get_for_tenant (tenant scoped).
Access check -> CollectionAccessService.require_access(user, collection, "read"). Fails closed.
Embed query -> InferenceService.embed with the collection's embedding_model. The call is metered to the caller.
Vector store query -> Weaviate near_vector against the collection's class, scoped by tenant.
ACL chokepoint -> filter_results_by_acl(session, user, candidates). Every result with document_id is evaluated by AclResolver.can(user, ref_for_document(doc), Permission.READ). Denied rows are dropped before serialization.
Response assembled with success(...) — no chunk leaks into counts or metadata if its document was denied.
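
The last two steps can be sketched as follows. This is a minimal illustration, not the module's code: the `Candidate` shape, the `can_read` callback (standing in for `AclResolver.can(user, ref_for_document(doc), Permission.READ)`), the `_sketch` function name, and the response fields are all assumptions made for the example. Letting candidates without a `document_id` pass through is also an assumption, since the text only says that results with a `document_id` are evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    """One raw vector-store hit (hypothetical shape)."""
    chunk_id: str
    document_id: Optional[str]
    score: float


def filter_results_by_acl_sketch(
    candidates: list[Candidate],
    can_read: Callable[[str], bool],
) -> list[Candidate]:
    """Keep only candidates whose parent document the caller may read.

    Decisions are cached per document, so a document with many matching
    chunks is resolved once per request.
    """
    decisions: dict[str, bool] = {}
    allowed: list[Candidate] = []
    for cand in candidates:
        if cand.document_id is None:
            # Assumption: hits with no document are not ACL-evaluated.
            allowed.append(cand)
            continue
        if cand.document_id not in decisions:
            decisions[cand.document_id] = can_read(cand.document_id)
        if decisions[cand.document_id]:
            allowed.append(cand)
    return allowed


def build_response(allowed: list[Candidate]) -> dict:
    """Derive counts from the filtered list only, so denied chunks never leak."""
    return {
        "results": [{"chunk_id": c.chunk_id, "score": c.score} for c in allowed],
        "total": len(allowed),
    }
```

Computing `total` from the filtered list rather than from the raw Weaviate hit count is what keeps a denied document from showing up indirectly in counts or metadata.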

Request flow: ingestion

HTTP -> POST /collections/{id}/documents (multipart). Tenant + collection write check.
Blob write -> file goes to S3 via the document-store client; a row is inserted into mod_scaimatrix_documents with status: pending.
Enqueue -> ingest_document job pushed to arq.
Worker picks up the job:
- processing — extract text per content type (PDF, DOCX, HTML, Markdown, plain text, source code).
- chunking — split per collection's chunking_strategy (fixed, paragraph, semantic, markdown, code) at chunk_size with chunk_overlap.
- embedding — call the embedding model in batches; write vectors to Weaviate.
- graph_extracting — if graph_enabled, prompt graph_extraction_model to emit nodes + edges, dedupe against existing graph, write to Neo4j.
- indexed on success; failed with error_message otherwise.
Counters on the collection (document_count, chunk_count, total_size_bytes, node_count, edge_count) are maintained as the worker progresses.
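
To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal sketch of what a fixed strategy could look like. It measures size in characters, which is an assumption (the real unit may be tokens), and the function name `chunk_fixed` is hypothetical.

```python
def chunk_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so consecutive chunks repeat the overlap region.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a trailing
        # chunk that only contains already-covered overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `chunk_fixed("abcdefghij", 4, 1)` yields three chunks in which each one begins with the last character of the one before it.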

Request flow: crawl

Trigger — ad-hoc POST /collections/{id}/crawl, manual POST /crawls/{id}/run, webhook POST /crawls/{id}/trigger (HMAC-verified), or scheduled by the worker.
A job row is inserted into mod_scaimatrix_crawl_jobs with status: pending and the crawl limits (max_depth, max_pages, max_total_bytes, follow_external).
Worker fetches the seed, respects robots.txt, walks links breadth-first within limits, and posts each fetched page back through the document ingestion path.
Live progress via GET /collections/{id}/crawl/{job_id}/stream (SSE), driven by polling the job row every two seconds.
Terminal statuses are completed, failed, cancelled. The job row stays around for history.
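
The breadth-first walk under limits can be sketched as below. This is an illustration under stated assumptions: `CrawlLimits`, `crawl_bfs`, and the injected `fetch` callback are hypothetical names, robots.txt handling is left out for brevity, and "external" is taken to mean "different host than the seed".

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse


@dataclass
class CrawlLimits:
    max_depth: int
    max_pages: int
    max_total_bytes: int
    follow_external: bool = False


def crawl_bfs(
    seed: str,
    limits: CrawlLimits,
    fetch: Callable[[str], tuple[bytes, list[str]]],
) -> list[str]:
    """Return the URLs fetched, in breadth-first order, within the limits.

    fetch(url) returns (body, outgoing_links); it is injected so the walk
    can be exercised without network access.
    """
    seed_host = urlparse(seed).netloc
    queue = deque([(seed, 0)])
    seen = {seed}
    fetched: list[str] = []
    total_bytes = 0

    while queue and len(fetched) < limits.max_pages:
        url, depth = queue.popleft()
        body, links = fetch(url)
        if total_bytes + len(body) > limits.max_total_bytes:
            break  # byte budget exhausted
        total_bytes += len(body)
        fetched.append(url)  # the real worker would hand the page to ingestion here

        if depth >= limits.max_depth:
            continue  # do not expand links past the depth limit
        for link in links:
            if link in seen:
                continue
            if not limits.follow_external and urlparse(link).netloc != seed_host:
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return fetched
```

Using a FIFO queue with a per-entry depth is what makes the walk breadth-first: every page at depth n is fetched before any page at depth n + 1.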

State

Collections, documents, ACLs, ACEs, crawl configs, crawl jobs, graph views — MariaDB.
Chunks + embeddings — Weaviate, one class per collection-slug, tenant tag in every object.
Graph nodes + edges — Neo4j, labelled with tenant and collection ids.
Document blobs — S3 via the document-store client.
Re-chunk / re-extract state — denormalised onto the collection row (rechunk_status, graph_reextract_status plus counters).

The ACL chokepoint

The v2 correctness invariant is "search and retrieval never return data the calling user lacks READ on." That's enforced by a single function — filter_results_by_acl — that every search, list, and graph result passes through before serialization. Any new surface that returns documents or chunks must route results through that chokepoint or the property tests fail.

AclResolver.can(user, ref, Permission.X) is the underlying primitive. It checks, in order: explicit deny on the resource, explicit allow, inherited deny, inherited allow. Super-admins, tenant admins, and owners bypass the walk. Group expansion is transitive: nested groups are mirrored from ScaiKey into the mod_scaimatrix_scaikey_nested_groups table and reconciled by a cron job every 10 minutes.
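
The precedence walk and transitive group expansion can be illustrated with a small in-memory sketch. The `Ace`, `User`, `expand_groups`, and `can` definitions below are hypothetical and much simpler than the real resolver; in particular, denying when no entry matches is an assumption, chosen to match the "fails closed" behaviour described for the access check.

```python
from dataclasses import dataclass, field
from enum import Enum


class Effect(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass(frozen=True)
class Ace:
    principal: str   # a user id or a group id
    effect: Effect
    inherited: bool  # True if the entry comes from a parent resource


@dataclass
class User:
    id: str
    groups: set[str] = field(default_factory=set)  # direct memberships only
    is_admin: bool = False  # stands in for super-admin / tenant-admin


def expand_groups(direct: set[str], parents: dict[str, set[str]]) -> set[str]:
    """Transitive closure of membership over a group -> parent-groups map."""
    seen: set[str] = set()
    stack = list(direct)
    while stack:
        group = stack.pop()
        if group in seen:
            continue  # also protects against cycles in the nesting data
        seen.add(group)
        stack.extend(parents.get(group, ()))
    return seen


def can(user: User, owner_id: str, aces: list[Ace],
        parents: dict[str, set[str]]) -> bool:
    """Evaluate one permission in the documented precedence order."""
    if user.is_admin or user.id == owner_id:
        return True  # bypasses
    principals = {user.id} | expand_groups(user.groups, parents)
    order = (
        (False, Effect.DENY),   # explicit deny
        (False, Effect.ALLOW),  # explicit allow
        (True, Effect.DENY),    # inherited deny
        (True, Effect.ALLOW),   # inherited allow
    )
    for inherited, effect in order:
        if any(a.principal in principals and a.inherited == inherited
               and a.effect == effect for a in aces):
            return effect is Effect.ALLOW
    return False  # assumption: no matching entry means deny
```

Under this ordering an explicit allow on the resource outranks a deny inherited from a parent, while an explicit deny outranks everything except the bypasses.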

Trust boundary

The HTTP layer is the only boundary that matters. Inside the process:

The vector store query is not ACL-aware — it returns whatever matches, and the chokepoint filters.
The graph store query is the same — Neo4j returns whatever Cypher asks for, and filter_graph_results_by_acl gates it.
Re-running the resolver in two places (the chokepoint + per-document fetch in GET /documents/{id}) is intentional defense in depth; the cost is negligible because the resolver only runs over the returned hits, not the whole corpus.

That layering exists because index-side filtering would push tenant + group + ACE state into Weaviate and Neo4j, which is operationally expensive and prone to drifting out of sync with the ACLs held in MariaDB. One exhaustively tested chokepoint beats two index-side filters.

Tenant isolation

Every ScaiMatrix row carries a tenant_id. Every Weaviate object is written with a tenant tag; every Cypher query against Neo4j filters by tenant_id in the MATCH clause. Cross-tenant reads are ruled out by construction rather than merely gated: the queries that the route handlers issue never widen scope past the caller's tenant. Super-admin operations are the one exception, and they take an explicit tenant id parameter.
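
As an illustration of tenant scoping in the MATCH clause, the sketch below builds a parameterized Cypher query. The `Entity` label, the property names, and the `neighbours_query` helper are assumptions made for the example; only the idea that tenant_id appears in every MATCH pattern comes from the text.

```python
def neighbours_query(
    tenant_id: str, collection_id: str, node_id: str
) -> tuple[str, dict[str, str]]:
    """Build a Cypher query for a node's neighbours, scoped to one tenant.

    The tenant and collection filters are part of both node patterns, so
    a traversal cannot step onto another tenant's nodes. Values are passed
    as parameters, never interpolated into the query string.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required for every graph query")
    query = (
        "MATCH (n:Entity {tenant_id: $tenant_id, collection_id: $collection_id, id: $node_id})"
        "-[r]-"
        "(m:Entity {tenant_id: $tenant_id, collection_id: $collection_id}) "
        "RETURN n, r, m"
    )
    params = {
        "tenant_id": tenant_id,
        "collection_id": collection_id,
        "node_id": node_id,
    }
    return query, params
```

Refusing to build a query without a tenant id turns a missing scope into an immediate error instead of an unscoped read.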

The same is true of the audit log: every ScaiMatrix-emitted entry is tagged with tenant_id, so a tenant admin querying /v1/audit/events?module=scaimatrix sees only their tenant's history.

Background workers

Four arq jobs back the slow paths:

ingest_document — extract / chunk / embed / (optional) graph-extract for one document.
crawl_website — crawl a seed URL under depth + page + byte budgets, posting each fetched page through the ingestion path.
rechunk_collection — drop + recreate every document's chunks under the collection's current chunking parameters.
reextract_collection_graph — wipe and re-run graph extraction over every indexed document.

Each job updates counters on the collection row as it progresses so dashboards stay live without polling the workers themselves. rechunk_status / graph_reextract_status go through idle -> queued -> running -> completed | failed, with total, processed, failed numbers maintained throughout.
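
The status lifecycle and its counters can be modelled as a small state machine. This is a sketch under assumptions: the `JobProgress` class is hypothetical, re-queueing from a terminal state is assumed rather than documented, and `processed` is taken to count successes only.

```python
from dataclasses import dataclass

# Allowed transitions, following idle -> queued -> running -> completed | failed.
# Assumption: a finished run (completed or failed) can be queued again.
TRANSITIONS: dict[str, set[str]] = {
    "idle": {"queued"},
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": {"queued"},
    "failed": {"queued"},
}


@dataclass
class JobProgress:
    """In-memory stand-in for the status + counter columns on the collection row."""
    status: str = "idle"
    total: int = 0
    processed: int = 0
    failed: int = 0

    def advance(self, new_status: str) -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status

    def start(self, total: int) -> None:
        """Move to running and reset the counters for this run."""
        self.advance("running")
        self.total, self.processed, self.failed = total, 0, 0

    def record(self, ok: bool) -> None:
        """Count one document; only valid while the job is running."""
        if self.status != "running":
            raise ValueError("counters only move while running")
        if ok:
            self.processed += 1
        else:
            self.failed += 1

    def finish(self, success: bool) -> None:
        self.advance("completed" if success else "failed")
```

Because the counters live next to the status, a dashboard can read a single row to render progress such as "2 of 3 processed, 1 failed".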

Graceful degradation

ScaiMatrix is designed to keep the rest of the API alive when an external dependency is down:

Weaviate down: search endpoints return zero results and log a warning; ingestion stays queued until Weaviate is back.
Neo4j down: graph endpoints return zero-shaped responses with graph_available: false instead of 5xx; the rest of the module is unaffected.
Embedding model unavailable: documents being ingested stop at the embedding stage and surface the upstream error in error_message; the route layer still serves reads.
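
The Neo4j case can be sketched as a thin wrapper around the graph query. The `GraphUnavailable` exception, the `graph_response` helper, and the `nodes` / `edges` field names are assumptions for illustration; only the graph_available flag and the zero-shaped response come from the description above.

```python
import logging
from typing import Callable

logger = logging.getLogger("scaimatrix.sketch")


class GraphUnavailable(Exception):
    """Hypothetical error raised when the graph store cannot be reached."""


def graph_response(run_query: Callable[[], dict]) -> dict:
    """Run a graph query, degrading to an empty result if the store is down.

    The response has the same shape in both cases, so callers only need
    to check graph_available instead of handling a 5xx.
    """
    try:
        data = run_query()
    except GraphUnavailable:
        logger.warning("graph store unavailable; returning empty graph")
        return {"nodes": [], "edges": [], "graph_available": False}
    return {
        "nodes": data["nodes"],
        "edges": data["edges"],
        "graph_available": True,
    }
```

Only the availability error is caught here; any other exception still propagates, so genuine bugs are not silently reported as an empty graph.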

Health is reflected at /health/detailed so operators can see which backends are degraded before users report symptoms.