API reference
All endpoints are mounted at /v1/modules/scaimatrix/ and authenticate with the standard ScaiGrid bearer token (Authorization: Bearer sgk_... or a JWT issued by ScaiKey). Responses use ScaiGrid's standard envelope ({ "data": ... } for success, { "error": ... } for failures, { "data": [...], "meta": { "next_cursor": ..., "has_more": ... } } for paginated lists).
All endpoints below require the appropriate module permission (see Permissions) and the appropriate per-resource ACE (see ACLs).
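The envelope convention above lends itself to a small client-side helper. The following is a minimal sketch, not part of any official SDK: the names `ScaiGridError`, `auth_headers`, and `unwrap` are hypothetical, and the inner shape of the `error` object is not assumed.

```python
from typing import Any


class ScaiGridError(Exception):
    """Raised when a response carries the { "error": ... } envelope."""

    def __init__(self, error: Any) -> None:
        super().__init__(str(error))
        self.error = error  # kept as-is; its inner shape is not assumed here


def auth_headers(token: str) -> dict[str, str]:
    # The same header form is used for an sgk_... key or a ScaiKey-issued JWT.
    return {"Authorization": f"Bearer {token}"}


def unwrap(envelope: dict[str, Any]) -> tuple[Any, dict[str, Any]]:
    """Return (data, meta). meta is empty for non-paginated responses."""
    if "error" in envelope:
        raise ScaiGridError(envelope["error"])
    return envelope["data"], envelope.get("meta", {})
```

Passing every decoded response body through `unwrap` keeps the success and failure paths separate in calling code.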
Collections#
POST /collections#
Create a collection.
| Field | Required | Notes |
|---|---|---|
| name | yes | Human-readable name. |
| description | no | Free text. |
| embedding_model | yes | Frontend model id available to your tenant. |
| chunking_strategy | no | fixed, paragraph (default), semantic, markdown, code. |
| chunk_size | no | Default 512. |
| chunk_overlap | no | Default 50. |
| graph_enabled | no | Default false. |
| graph_extraction_model | no | Required if graph_enabled. A chat model. |
| graph_max_nodes | no | Default 250,000. NULL = unlimited. |
| graph_max_edges | no | Default 500,000. NULL = unlimited. |
| storage_quota_bytes | no | NULL = unlimited. |
| default_access | no | tenant (default) or restricted. |
| metadata | no | Arbitrary JSON. |
Slug is auto-generated from the name and must be unique within the tenant. Returns 201 with the created collection.
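As an illustration of the field rules for creating a collection, here is a minimal sketch of a client-side payload builder. The function name `build_collection_payload` is hypothetical, and the checks only mirror the constraints listed in the field table; the server remains the authority on validation.

```python
from typing import Any, Optional

CHUNKING_STRATEGIES = {"fixed", "paragraph", "semantic", "markdown", "code"}
DEFAULT_ACCESS = {"tenant", "restricted"}


def build_collection_payload(
    name: str,
    embedding_model: str,
    *,
    chunking_strategy: str = "paragraph",
    chunk_size: int = 512,
    chunk_overlap: int = 50,
    graph_enabled: bool = False,
    graph_extraction_model: Optional[str] = None,
    default_access: str = "tenant",
) -> dict[str, Any]:
    """Build a POST /collections body using the documented defaults."""
    if chunking_strategy not in CHUNKING_STRATEGIES:
        raise ValueError(f"unknown chunking_strategy: {chunking_strategy!r}")
    if default_access not in DEFAULT_ACCESS:
        raise ValueError(f"unknown default_access: {default_access!r}")
    if graph_enabled and not graph_extraction_model:
        # graph_extraction_model is required once graph_enabled is true.
        raise ValueError("graph_extraction_model is required if graph_enabled")
    payload: dict[str, Any] = {
        "name": name,
        "embedding_model": embedding_model,
        "chunking_strategy": chunking_strategy,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
        "graph_enabled": graph_enabled,
        "default_access": default_access,
    }
    if graph_extraction_model:
        payload["graph_extraction_model"] = graph_extraction_model
    return payload
```

Catching the `graph_extraction_model` omission locally saves a round trip, but the model ids themselves can only be checked by the server.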
GET /collections#
List collections accessible to the caller. Cursor-paginated (limit, cursor query params). Returns each collection with a user_permission field (read, write, manage, or null).
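Cursor pagination can be wrapped in a generator. This is a minimal sketch assuming only the envelope described at the top of this page; `fetch_page` is a hypothetical callable that performs the HTTP request for a given cursor and returns the decoded envelope.

```python
from typing import Any, Callable, Iterator, Optional

Page = dict[str, Any]


def iter_items(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every item across pages, following meta.next_cursor."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        meta = page.get("meta", {})
        if not meta.get("has_more"):
            return
        cursor = meta.get("next_cursor")
        if cursor is None:
            # Defensive stop: has_more without a cursor would loop forever.
            return
```

The first call passes `None` as the cursor, which the caller's `fetch_page` should translate into omitting the `cursor` query parameter.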
GET /collections/{collection_id}#
Fetch one collection's full config and counters.
PUT /collections/{collection_id}#
Update mutable fields. Setting graph_enabled false -> true queues a re-extract over existing documents (signalled by follow_ups.graph_reextract: "queued" in the response). Changing chunking parameters sets follow_ups.rechunk_recommended: true without auto-queuing — call /re-chunk when you're ready.
DELETE /collections/{collection_id}#
Hard-delete. Cascades to documents, chunks (in Weaviate), graph contents (in Neo4j), ACLs, and crawl configs.
POST /collections/{collection_id}/reindex#
Re-queue ingestion for every document in the collection. Useful after a backend storage migration.
POST /collections/{collection_id}/fork#
Clone the collection's metadata + ACLs + config into a new collection. Documents are not copied. Use this when the embedding model needs to change.
POST /collections/{collection_id}/re-chunk#
Queue an in-place re-chunk over every indexed document using the collection's current chunking parameters and embedding model. Idempotent: a second call while one is running returns 409 with the existing job's status.
GET /collections/{collection_id}/re-chunk#
Status of the current or most recent re-chunk job. Fields: total, processed, failed, started_at, finished_at.
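A caller that wants to block until a re-chunk completes can poll the status endpoint. The sketch below assumes, without confirmation from this reference, that `finished_at` stays null while the job is running; `fetch_status` is a hypothetical callable that issues the GET and returns the decoded status object.

```python
import time
from typing import Any, Callable


def wait_for_job(
    fetch_status: Callable[[], dict[str, Any]],
    *,
    interval_s: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reports a finished_at value, then return the status."""
    for _ in range(max_polls):
        status = fetch_status()
        # Assumption: a non-null finished_at marks the job as complete.
        if status.get("finished_at") is not None:
            return status
        sleep(interval_s)
    raise TimeoutError("job did not finish within the polling budget")
```

Injecting `sleep` makes the loop easy to exercise in tests without real delays.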
Documents#
POST /collections/{collection_id}/documents#
Upload a document. multipart/form-data with file (binary) and optional metadata (JSON string). Returns the new document with status: pending; a background job indexes it.
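For clients without an HTTP library that handles multipart encoding, the upload body can be assembled by hand. This is a minimal standard-library sketch; `encode_document_upload` is a hypothetical helper, and it does not escape quotes in file names.

```python
import json
import uuid
from typing import Any, Optional


def encode_document_upload(
    filename: str,
    content: bytes,
    metadata: Optional[dict[str, Any]] = None,
    content_type: str = "application/octet-stream",
) -> tuple[bytes, str]:
    """Return (body, Content-Type header value) for a document upload."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    if metadata is not None:
        # metadata travels as a JSON string in its own form field.
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="metadata"\r\n\r\n'
                f"{json.dumps(metadata)}\r\n"
            ).encode("utf-8")
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned string goes into the request's `Content-Type` header alongside the bearer token.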
POST /collections/{collection_id}/documents/bulk#
Upload many documents in one request by repeating the files field, one file per entry. Each document gets its own ingestion job. Use this for warming a collection from a directory.
POST /collections/{collection_id}/documents/from-url#
Ingest from an external URL. Body: { "url": "...", "name": "...", "metadata": {...} }.
POST /collections/{collection_id}/documents/from-scaidrive#
Ingest from a ScaiDrive file id. Body: { "file_id": "...", "name": "...", "metadata": {...} }.
GET /collections/{collection_id}/documents#
Cursor-paginated list. Each row passes through the ACL chokepoint — documents the caller can't read are dropped silently.
GET /collections/{collection_id}/documents/{document_id}#
Fetch one document's metadata and status. ACL-gated: if the caller can't read the document, the response is the standard 403 COLLECTION_ACCESS_DENIED rather than a 404. The list endpoint won't surface the document either.
DELETE /collections/{collection_id}/documents/{document_id}#
Remove a document. Drops chunks from the vector store and contributions from the graph store. Blob in S3 is cleaned up asynchronously.
Search#
POST /search#
Global search across every collection the caller can read.
| Field | Notes |
|---|---|
| query | The query text. |
| collections | Optional list of collection ids or slugs. Omit to span everything. |
| top_k | Default 10. |
| min_score | Default 0.0. |
| search_type | vector (default), hybrid, or keyword. |
| filters | Optional metadata filter object. |
| include_content | Default true. |
| include_metadata | Default true. |
Returns { "results": [...], "total": N }. Each result has chunk_id, document_id, document_name, collection_id, content, score, metadata.
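Because results are returned per chunk, several hits may come from the same document. The following sketch shows one possible client-side post-processing step that keeps only the best-scoring chunk per document. The helper name `best_chunk_per_document` is hypothetical; it relies only on the `document_id` and `score` result fields listed above.

```python
from typing import Any


def best_chunk_per_document(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the highest-scoring hit for each document_id, best first."""
    best: dict[str, dict[str, Any]] = {}
    for hit in results:
        current = best.get(hit["document_id"])
        if current is None or hit["score"] > current["score"]:
            best[hit["document_id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```

This is useful when rendering a document-level result list from chunk-level hits.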
POST /collections/{collection_id}/search#
Same body as global search, scoped to one collection. The collection's embedding_model is used regardless of any model passed.
POST /collections/{collection_id}/search/combined#
Vector hits + graph traversal seeded from those hits.
| Field | Notes |
|---|---|
| query | The query text. |
| vector_top_k | Default 5. |
| graph_depth | Default 2. |
| graph_expand_labels | Optional list of node labels to expand. |
| include_content | Default true. |
| min_score | Default 0.0. |
Returns both the vector hits and the connected sub-graph, ACL-gated end-to-end.
Graph#
All graph endpoints are scoped to a collection that has graph_enabled: true. Calls against graph-disabled collections return empty results instead of a 422, which keeps the UX quieter.
GET /collections/{collection_id}/graph/nodes#
List nodes. Query params: label, skip, limit (max 100). Returns items, total, and visible_count (post-ACL).
GET /collections/{collection_id}/graph/nodes/{node_id}#
One node. ACL-gated through filter_graph_results_by_acl.
POST /collections/{collection_id}/graph/nodes#
Create a node manually. Body: { "label": "Product", "name": "WidgetPro", "properties": {...} }. Subject to graph_max_nodes quota.
PUT /collections/{collection_id}/graph/nodes/{node_id}#
Update name and/or properties.
DELETE /collections/{collection_id}/graph/nodes/{node_id}#
Delete a node. Connected edges are removed.
GET /collections/{collection_id}/graph/edges#
List edges. Query params: skip, limit.
POST /collections/{collection_id}/graph/edges#
Create an edge. Body: { "source_node_id": "...", "target_node_id": "...", "relationship_type": "compatible_with", "properties": {...} }. Subject to graph_max_edges quota.
DELETE /collections/{collection_id}/graph/edges/{edge_id}#
Remove an edge.
POST /collections/{collection_id}/graph/query#
Run a parameterised read-only Cypher-ish query against the collection's subgraph.
Returns nodes, edges, query_time_ms.
POST /collections/{collection_id}/graph/traverse#
BFS from a node. Body: { "start_node_id": "...", "depth": 2, "labels": ["Product", "Feature"] }.
POST /collections/{collection_id}/graph/path#
Shortest path between two nodes. Body: { "source_node_id": "...", "target_node_id": "...", "max_depth": 6 }. Returns the path nodes / edges, path_length, and found: bool. If any intermediate node is ACL-denied, the entire path is dropped (no partial reveals).
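The all-or-nothing ACL rule for paths can be illustrated on an in-memory graph. This is a sketch of the described semantics, not the server's implementation: `shortest_path`, the adjacency-dict representation, and the `denied` set are all hypothetical, and `path_length` is assumed to count edges.

```python
from collections import deque
from typing import Optional


def shortest_path(
    adjacency: dict[str, list[str]],
    source: str,
    target: str,
    max_depth: int,
    denied: frozenset = frozenset(),
) -> dict:
    """Breadth-first shortest path, dropped whole if an intermediate node is denied."""
    not_found = {"found": False, "path": [], "path_length": 0}
    parents: dict[str, Optional[str]] = {source: None}
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth budget
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    if target not in parents:
        return not_found
    path: list[str] = []
    cursor: Optional[str] = target
    while cursor is not None:
        path.append(cursor)
        cursor = parents[cursor]
    path.reverse()
    # No partial reveals: one denied intermediate node hides the whole path.
    if any(n in denied for n in path[1:-1]):
        return not_found
    return {"found": True, "path": path, "path_length": len(path) - 1}
```

Note that the sketch does not search for an alternative route around a denied node; it mirrors the rule that the path is simply dropped.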
POST /collections/{collection_id}/graph/ask#
Natural-language graph question. The configured chat model generates Cypher; mutation keywords are rejected; the read-only Cypher runs through the standard chokepoint.
Returns the generated cypher (for transparency), nodes, edges, query_time_ms, model_used.
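Since the generated Cypher is returned, a client can apply its own sanity check before displaying or logging it. The sketch below shows one way a mutation-keyword guard could look. The keyword list is an assumption made for illustration; this reference does not state which keywords the server rejects.

```python
import re

# Hypothetical keyword list; the server's actual list is not documented here.
_MUTATION_RE = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b",
    re.IGNORECASE,
)


def is_read_only(cypher: str) -> bool:
    """Return True when no mutation keyword appears as a whole word."""
    return _MUTATION_RE.search(cypher) is None
```

A whole-word match of this kind is deliberately conservative: it also rejects queries that merely contain a keyword inside a string literal.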
GET /collections/{collection_id}/graph/search#
Substring search over node names. Query params: q, limit.
GET /collections/{collection_id}/graph/context#
Curated subgraph formatted as Markdown for LLM prompts. Query params: label, max_nodes, max_chars (default 8000). Returns nodes, edges, formatted_text, truncated.
GET /collections/{collection_id}/graph/stats#
Aggregate counts: nodes, edges, label distribution, relationship-type distribution, isolated nodes, per-document extraction status (total / extracted / failed / pending).
GET /collections/{collection_id}/graph/clusters#
One cluster per node label with counts and sample most-connected node ids. Designed for the virtualised large-graph admin view. Query param: sample_per_cluster (default 5).
GET /collections/{collection_id}/graph/changes#
Adds-only diff since a cutoff timestamp. Query params: since (ISO-8601, required), limit (default 500). Returns nodes, edges, truncated.
GET /collections/{collection_id}/graph/events#
Server-Sent Events stream of live graph mutations for the collection. Events: scaimatrix.graph.node_created, node_updated, node_deleted, edge_created, edge_deleted. Delivery is best-effort, so clients should reload graph state each time they connect.
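A client consuming this stream needs to split it into events. Below is a minimal sketch of a parser for the standard Server-Sent Events wire format; `parse_sse` is a hypothetical helper, and the data payload is returned as a raw string because its shape is not specified in this reference.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event accumulated so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

Any iterable of decoded lines works as input, for example the line iterator of a streaming HTTP response.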
GET /collections/{collection_id}/graph/export#
NDJSON dump of every visible node and edge for backup or migration. One JSON object per line: {"type":"node","data":{...}}, {"type":"edge","data":{...}}, terminating {"type":"summary","data":{"nodes":N,"edges":M}}. ACL-gated end-to-end. Streams so large graphs don't materialise in memory.
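The export format can be consumed line by line. The following is a minimal sketch of a reader that also uses the terminating summary record as an integrity check; `read_export` is a hypothetical helper name.

```python
import json
from typing import Any, Iterable


def read_export(lines: Iterable[str]) -> tuple[list[Any], list[Any]]:
    """Parse an NDJSON graph export into (nodes, edges), verifying the summary."""
    nodes: list[Any] = []
    edges: list[Any] = []
    summary = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record["type"] == "node":
            nodes.append(record["data"])
        elif record["type"] == "edge":
            edges.append(record["data"])
        elif record["type"] == "summary":
            summary = record["data"]
    if summary is None:
        raise ValueError("export is truncated: no summary record")
    if summary["nodes"] != len(nodes) or summary["edges"] != len(edges):
        raise ValueError("export counts do not match the summary record")
    return nodes, edges
```

A missing summary record is a reasonable signal that the stream was cut off before the dump finished.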
POST /collections/{collection_id}/graph/import#
Bulk-import a JSON dump. Body: { "nodes": [...], "edges": [...] }.
Capped at 5,000 nodes + 5,000 edges per call. Duplicates by id are skipped (idempotent). Returns nodes_created, nodes_skipped, edges_created, edges_skipped.
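Larger dumps need to be split client-side to stay under the per-call cap. This sketch yields request bodies in a conservative order, all node batches first and edge batches afterwards, on the assumption (not stated in this reference) that an edge should only be sent once its endpoint nodes exist. `import_batches` is a hypothetical helper.

```python
from typing import Any, Iterator


def import_batches(
    nodes: list[Any], edges: list[Any], cap: int = 5000
) -> Iterator[dict[str, list[Any]]]:
    """Yield import bodies that each stay within the per-call cap."""
    # Nodes first, so later edge batches can reference them (assumption).
    for start in range(0, len(nodes), cap):
        yield {"nodes": nodes[start:start + cap], "edges": []}
    for start in range(0, len(edges), cap):
        yield {"nodes": [], "edges": edges[start:start + cap]}
```

Because duplicates by id are skipped, re-sending a batch after a network failure is safe.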
POST /collections/{collection_id}/graph/re-extract#
Queue a graph re-extract over the collection's indexed documents. Idempotent: a second call while running returns 409 with status.
GET /collections/{collection_id}/graph/re-extract#
Current / last re-extract job status.
Crawl#
POST /collections/{collection_id}/crawl#
Start an ad-hoc crawl. Body: { "url": "...", "max_depth": 3, "max_pages": 100, "max_total_bytes": 52428800, "follow_external": false }. Returns a CrawlJobRead.
GET /collections/{collection_id}/crawl/{crawl_id}#
Fetch one crawl job.
DELETE /collections/{collection_id}/crawl/{crawl_id}#
Cancel a running crawl (sets status: cancelled).
GET /collections/{collection_id}/crawl/{crawl_id}/stream#
SSE stream of progress events. Emits a progress event every two seconds and a terminal done event when the job ends.
Crawl configs#
POST /collections/{collection_id}/crawls#
Create a recurring crawl config. Body fields: name, seed_url, max_depth, max_pages, max_total_bytes, follow_external, optional schedule ({type: "daily", time: "03:00"}), optional webhook ({enabled: true}). If webhook.enabled is set, the response includes a one-time webhook_secret.
GET /collections/{collection_id}/crawls#
List crawl configs for a collection (paginated).
GET /collections/{collection_id}/crawls/{config_id}#
Fetch one config.
PUT /collections/{collection_id}/crawls/{config_id}#
Update fields. Pass clear_schedule: true or clear_webhook: true to drop them entirely.
DELETE /collections/{collection_id}/crawls/{config_id}#
Remove a config. Run history is retained.
POST /collections/{collection_id}/crawls/{config_id}/run#
Manual one-off run. Returns the new CrawlJobRead.
GET /collections/{collection_id}/crawls/{config_id}/jobs#
Paginated history of runs for one config.
POST /collections/{collection_id}/crawls/{config_id}/trigger#
Inbound HMAC-verified webhook. Headers: X-Crawl-Signature (hex HMAC-SHA256 of timestamp + "." + body), X-Crawl-Timestamp (unix seconds). No JWT — the secret is the auth. 401 on bad / missing signature; 404 if webhook isn't enabled.
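The signature scheme can be sketched from the caller's side as follows. The helper name `sign_trigger` is hypothetical, and the sketch assumes the webhook secret is used as UTF-8 bytes for the HMAC key; this reference does not specify the key encoding.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_trigger(
    secret: str, body: bytes, timestamp: Optional[int] = None
) -> dict[str, str]:
    """Build the X-Crawl-* headers for a webhook trigger request."""
    ts = str(int(time.time()) if timestamp is None else timestamp)
    # Signed message is timestamp + "." + raw request body.
    message = ts.encode("ascii") + b"." + body
    # Assumption: the secret string is the key, encoded as UTF-8.
    signature = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return {"X-Crawl-Signature": signature, "X-Crawl-Timestamp": ts}
```

The body passed to `sign_trigger` must be byte-for-byte identical to what is sent, so serialise the JSON once and reuse those bytes for both signing and the request.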
GET /crawl-jobs/{job_id}#
Fetch a crawl job without knowing the collection (useful when wiring up status views).
GET /crawl-jobs/{job_id}/documents#
Paginated list of documents produced by a specific crawl job.
Per-resource ACLs (v2)#
Resource type is collection or document. All ACL endpoints require CHANGE_PERMISSIONS on the target unless otherwise noted.
GET /permissions/{resource_type}/{resource_id}/acl#
Return the ACL, including all ACEs and the resource's owner_user_id. Caller needs READ_PERMISSIONS.
PATCH /permissions/{resource_type}/{resource_id}/acl#
Toggle inherit_from_parent. Body: { "inherit_from_parent": false }.
POST /permissions/{resource_type}/{resource_id}/acl/entries#
Add an ACE. Body:
| Field | Notes |
|---|---|
| principal_type | user or group. |
| principal_id | ScaiKey id. |
| ace_type | allow (default) or deny. |
| permissions | Integer bitmask (see ACLs). |
| inherit_to_children | Default true. |
Returns the created ACE. INGEST on a document resource is rejected (INVALID_ACE).
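To show how an integer bitmask ACE body might be assembled, here is a sketch using `IntFlag`. The bit positions below are placeholders invented for illustration only; the real values are defined in the ACLs page and must be taken from there. `Perm` and `build_ace` are hypothetical names.

```python
from enum import IntFlag
from typing import Any


class Perm(IntFlag):
    # HYPOTHETICAL bit positions, for illustration only. See ACLs for real values.
    INGEST = 1 << 0
    READ_PERMISSIONS = 1 << 1
    CHANGE_PERMISSIONS = 1 << 2
    TAKE_OWNERSHIP = 1 << 3


def build_ace(
    resource_type: str,
    principal_type: str,
    principal_id: str,
    permissions: Perm,
    ace_type: str = "allow",
    inherit_to_children: bool = True,
) -> dict[str, Any]:
    """Build the body for POST .../acl/entries with basic local checks."""
    if resource_type not in {"collection", "document"}:
        raise ValueError("resource_type must be collection or document")
    if principal_type not in {"user", "group"}:
        raise ValueError("principal_type must be user or group")
    if ace_type not in {"allow", "deny"}:
        raise ValueError("ace_type must be allow or deny")
    if resource_type == "document" and permissions & Perm.INGEST:
        # Mirrors the documented rejection: INGEST is not valid on documents.
        raise ValueError("INVALID_ACE: INGEST cannot be granted on a document")
    return {
        "principal_type": principal_type,
        "principal_id": principal_id,
        "ace_type": ace_type,
        "permissions": int(permissions),
        "inherit_to_children": inherit_to_children,
    }
```

Combining verbs with `|` and converting to `int` at the last moment keeps calling code readable while still sending the integer the endpoint expects.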
DELETE /permissions/{resource_type}/{resource_id}/acl/entries/{ace_id}#
Remove an ACE. Returns 204.
GET /permissions/{resource_type}/{resource_id}/effective#
Compute the calling user's effective permission bitmask. Returns permissions, permission_names, and a can object with named booleans for each verb.
POST /permissions/{resource_type}/{resource_id}/take-ownership#
Reassign the owner. Body: { "new_owner_user_id": "usr_..." }. Caller needs TAKE_OWNERSHIP. Audit-logged.
Legacy collection access (v1)#
These pre-date v2 ACLs. New integrations should use the per-resource endpoints above.
- GET /collections/{collection_id}/access
- POST /collections/{collection_id}/access with body { "grantee_type": "user|group", "grantee_id": "...", "permission": "read|write|manage" }
- PUT /collections/{collection_id}/access/{access_id} with body { "permission": "..." }
- DELETE /collections/{collection_id}/access/{access_id}
Saved graph views#
Each view is a (filter + tool + tint) snapshot the viewer can round-trip. private views are visible only to the owner; tenant views are visible to everyone with read on the collection.
GET /collections/{collection_id}/graph/views#
List views the caller can see (their own private + every tenant-scoped one).
POST /collections/{collection_id}/graph/views#
Create. Body: { "name": "...", "description": "...", "scope": "private|tenant", "config": {...} }. tenant scope requires collection write.
GET /collections/{collection_id}/graph/views/{view_id}#
Fetch one view.
PATCH /collections/{collection_id}/graph/views/{view_id}#
Update. Promoting private -> tenant requires collection write.
DELETE /collections/{collection_id}/graph/views/{view_id}#
Remove a view. Requires being the view's owner or having collection write.
Errors#
All endpoints return ScaiGrid's standard error envelope, { "error": ... }.
ScaiMatrix-specific codes:
| Code | Meaning |
|---|---|
| COLLECTION_ACCESS_DENIED | The user can't access this collection at the required permission level. |
| COLLECTION_ACCESS_NOT_FOUND | Legacy access grant id not found. |
| INVALID_ACCESS_GRANT | Legacy grant body was malformed. |
| STORAGE_QUOTA_EXCEEDED | Adding this document would push the collection past its storage_quota_bytes. |
| GRAPH_QUOTA_EXCEEDED | Adding these nodes / edges would breach graph_max_nodes / graph_max_edges. |
| GRAPH_NOT_ENABLED | The collection has graph_enabled: false. |
| GRAPH_REEXTRACT_ALREADY_RUNNING | A re-extract is already in flight; check the status endpoint. |
| GRAPH_IMPORT_TOO_LARGE | Payload exceeded the 5,000 + 5,000 cap; chunk client-side. |
| GRAPH_IMPORT_INVALID_SHAPE | Body wasn't { "nodes": [...], "edges": [...] }. |
| GRAPH_VIEW_INVALID_SCOPE | Scope wasn't private or tenant. |
| GRAPH_VIEW_NOT_FOUND | View id missing or not visible to the caller. |
| RECHUNK_ALREADY_RUNNING | A re-chunk is already in flight. |
| CRAWL_JOB_NOT_FOUND | Crawl job id missing or wrong tenant. |
| CRAWL_CONFIG_NOT_FOUND | Crawl config missing, disabled, or webhook misconfigured. |
| CRAWL_ALREADY_RUNNING | A crawl is already in flight for this collection. |
| INVALID_SCHEDULE_CONFIG | Schedule fields didn't validate. |
| INVALID_CRON_EXPRESSION | Cron string didn't parse. |
| ACL_NOT_FOUND | Target resource or ACL not visible to caller. |
| ACE_NOT_FOUND | ACE id missing or doesn't belong to this resource. |
| INVALID_ACE | Bad principal_type / ace_type / permissions combination. |
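Client code often wants to map these codes to a small set of reactions. The sketch below is one possible grouping. It assumes the error object exposes the code under a `code` key, which this reference does not confirm, and the action labels returned by the hypothetical `classify` function are made up for illustration.

```python
from typing import Any

ALREADY_RUNNING = {
    "GRAPH_REEXTRACT_ALREADY_RUNNING",
    "RECHUNK_ALREADY_RUNNING",
    "CRAWL_ALREADY_RUNNING",
}
QUOTA = {"STORAGE_QUOTA_EXCEEDED", "GRAPH_QUOTA_EXCEEDED"}


def classify(error: dict[str, Any]) -> str:
    """Map a ScaiMatrix error object to a suggested client reaction."""
    code = error.get("code")  # assumption: the code lives under "code"
    if code in ALREADY_RUNNING:
        return "poll_status"  # a job is in flight; watch the status endpoint
    if code in QUOTA:
        return "quota"  # raise the quota or free space before retrying
    if code == "GRAPH_IMPORT_TOO_LARGE":
        return "split_payload"  # chunk the import client-side and resend
    return "fail"
```

Treating the three "already running" codes as a prompt to poll rather than a hard failure matches the idempotent behaviour of the re-chunk and re-extract endpoints.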