API reference

All endpoints are mounted at /v1/modules/scaimatrix/ and authenticate with the standard ScaiGrid bearer token (Authorization: Bearer sgk_... or a JWT issued by ScaiKey). Responses use ScaiGrid's standard envelope ({ "data": ... } for success, { "error": ... } for failures, { "data": [...], "meta": { "next_cursor": ..., "has_more": ... } } for paginated lists).

All endpoints below require the appropriate module permission (see Permissions) and the appropriate per-resource ACE (see ACLs).
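
The conventions above can be sketched with the Python standard library alone. This is a minimal illustration, not an official client: the host `scaigrid.example.com`, the helper names `build_request` and `unwrap`, and the `ScaiMatrixError` class are assumptions. Only the path prefix, the bearer header, and the envelope shape come from this reference.

```python
import json
from urllib import request

# Hypothetical host; only the /v1/modules/scaimatrix prefix comes from this reference.
BASE_URL = "https://scaigrid.example.com/v1/modules/scaimatrix"


class ScaiMatrixError(Exception):
    """Raised when a response carries the { "error": ... } envelope."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def build_request(method, path, token, body=None):
    """Build (but do not send) an authenticated request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def unwrap(envelope):
    """Return the payload of a success envelope, or raise on an error envelope."""
    if "error" in envelope:
        err = envelope["error"]
        raise ScaiMatrixError(err.get("code", "UNKNOWN"), err.get("message", ""))
    return envelope["data"]
```

Sending the request is then a matter of `request.urlopen(req)`. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so the error envelope has to be read from the exception's body before it is passed to `unwrap`.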

Collections#

`POST /collections`#

Create a collection.

| Field | Required | Notes |
| --- | --- | --- |
| `name` | yes | Human-readable name. |
| `description` | no | Free text. |
| `embedding_model` | yes | Frontend model id available to your tenant. |
| `chunking_strategy` | no | `fixed`, `paragraph` (default), `semantic`, `markdown`, `code`. |
| `chunk_size` | no | Default 512. |
| `chunk_overlap` | no | Default 50. |
| `graph_enabled` | no | Default false. |
| `graph_extraction_model` | no | Required if `graph_enabled`. A chat model. |
| `graph_max_nodes` | no | Default 250,000. NULL = unlimited. |
| `graph_max_edges` | no | Default 500,000. NULL = unlimited. |
| `storage_quota_bytes` | no | NULL = unlimited. |
| `default_access` | no | `tenant` (default) or `restricted`. |
| `metadata` | no | Arbitrary JSON. |

Slug is auto-generated from the name and must be unique within the tenant. Returns 201 with the created collection.
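
As an illustration of the field rules in the table, the following sketch assembles a request body and checks the documented constraints client-side before anything is sent. The helper name `collection_payload` is an assumption; the server remains the authority on validation.

```python
ALLOWED_CHUNKING = {"fixed", "paragraph", "semantic", "markdown", "code"}


def collection_payload(name, embedding_model, **options):
    """Assemble a POST /collections body, checking the documented constraints."""
    strategy = options.get("chunking_strategy", "paragraph")
    if strategy not in ALLOWED_CHUNKING:
        raise ValueError(f"unknown chunking_strategy: {strategy!r}")
    if options.get("graph_enabled") and not options.get("graph_extraction_model"):
        raise ValueError("graph_extraction_model is required when graph_enabled is true")
    if options.get("default_access", "tenant") not in {"tenant", "restricted"}:
        raise ValueError("default_access must be 'tenant' or 'restricted'")
    # Optional fields that are left out fall back to the server-side defaults.
    return {"name": name, "embedding_model": embedding_model, **options}


body = collection_payload(
    "Product docs",
    "openai/text-embedding-3-large",
    graph_enabled=True,
    graph_extraction_model="scailabs/poolnoodle-omni",
)
```

Leaving optional fields out of the body, rather than repeating the defaults, keeps the request in step with the server if those defaults ever change.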

`GET /collections`#

List collections accessible to the caller. Cursor-paginated (limit, cursor query params). Returns each collection with a user_permission field (read, write, manage, or null).
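
Cursor pagination can be walked with a small generator. The sketch below assumes a caller-supplied `fetch_page(cursor)` function (hypothetical) that performs the HTTP call with the `cursor` query param and returns the parsed paginated envelope; it relies only on the `data`, `meta.next_cursor`, and `meta.has_more` fields described at the top of this page.

```python
def iter_items(fetch_page):
    """Yield every item from a cursor-paginated list endpoint.

    fetch_page(cursor) must return the parsed paginated envelope for that
    cursor; it is called with None for the first page.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        meta = page.get("meta", {})
        if not meta.get("has_more"):
            return
        cursor = meta["next_cursor"]
```

Because it is a generator, a caller that stops iterating early never requests the remaining pages.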

`GET /collections/{collection_id}`#

Fetch one collection's full config and counters.

`PUT /collections/{collection_id}`#

Update mutable fields. Setting graph_enabled false -> true queues a re-extract over existing documents (signalled by follow_ups.graph_reextract: "queued" in the response). Changing chunking parameters sets follow_ups.rechunk_recommended: true without auto-queuing — call /re-chunk when you're ready.
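
A caller might translate the `follow_ups` hints into next actions roughly as follows. This is a sketch under one assumption: that `follow_ups` is a key of the unwrapped `data` object. The helper name `next_steps` is hypothetical.

```python
def next_steps(updated_collection):
    """Translate the follow_ups hints of a PUT response into caller actions."""
    follow_ups = updated_collection.get("follow_ups") or {}
    steps = []
    if follow_ups.get("graph_reextract") == "queued":
        # Already queued by the server; the caller only needs to poll.
        steps.append("poll GET /graph/re-extract")
    if follow_ups.get("rechunk_recommended"):
        # Not auto-queued; the caller decides when to start it.
        steps.append("call POST /re-chunk when ready")
    return steps
```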

`DELETE /collections/{collection_id}`#

Hard-delete. Cascades to documents, chunks (in Weaviate), graph contents (in Neo4j), ACLs, and crawl configs.

`POST /collections/{collection_id}/reindex`#

Re-queue ingestion for every document in the collection. Useful after a backend storage migration.

`POST /collections/{collection_id}/fork`#

Clone the collection's metadata + ACLs + config into a new collection. Documents are not copied. Use this when the embedding model needs to change.

```json
{
  "name": "...",
  "embedding_model": "openai/text-embedding-3-large",
  "copy_acls": true,
  "copy_metadata": true
}
```

`POST /collections/{collection_id}/re-chunk`#

Queue an in-place re-chunk over every indexed document using the collection's current chunking parameters and embedding model. Idempotent: a second call while one is running returns 409 with the existing job's status.

`GET /collections/{collection_id}/re-chunk`#

Returns the status of the current or most recent re-chunk job: counters (total, processed, failed) and timestamps (started_at, finished_at).
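
A status payload can be reduced to a progress figure for display. The sketch below makes two assumptions that this reference does not confirm: that failed documents are not also counted in `processed`, and that `finished_at` is null while the job is still running. The helper name `rechunk_progress` is hypothetical.

```python
def rechunk_progress(job):
    """Summarise a re-chunk status payload as (fraction_done, finished)."""
    total = job.get("total") or 0
    # Assumption: failed documents are counted separately from processed ones.
    handled = (job.get("processed") or 0) + (job.get("failed") or 0)
    fraction = handled / total if total else 0.0
    # Assumption: finished_at stays null until the job ends.
    finished = job.get("finished_at") is not None
    return min(fraction, 1.0), finished
```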

Documents#

`POST /collections/{collection_id}/documents`#

Upload a document. multipart/form-data with file (binary) and optional metadata (JSON string). Returns the new document with status: pending; a background job indexes it.
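
For callers without an HTTP library that handles multipart encoding, the body can be built by hand. The following is a minimal sketch for a single file: the helper name `encode_multipart` is an assumption, and it expects a filename without quotes or line breaks. The `file` and `metadata` field names are the ones given above.

```python
import json
import uuid


def encode_multipart(filename, content, metadata=None,
                     content_type="application/octet-stream"):
    """Encode a single-file upload as multipart/form-data.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    if metadata is not None:
        # The metadata field travels as a JSON string.
        parts.append(
            f'--{boundary}\r\n'
            'Content-Disposition: form-data; name="metadata"\r\n\r\n'
            f'{json.dumps(metadata)}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode("utf-8")
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned header value must be sent as the request's `Content-Type`, since it carries the boundary the server needs to split the parts.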

`POST /collections/{collection_id}/documents/bulk`#

Upload many documents in one request — files field repeated, each entry one file. Each document gets its own ingestion job. Use this for warming a collection from a directory.

`POST /collections/{collection_id}/documents/from-url`#

Ingest from an external URL. Body: { "url": "...", "name": "...", "metadata": {...} }.

`POST /collections/{collection_id}/documents/from-scaidrive`#

Ingest from a ScaiDrive file id. Body: { "file_id": "...", "name": "...", "metadata": {...} }.

`GET /collections/{collection_id}/documents`#

Cursor-paginated list. Each row passes through the ACL chokepoint — documents the caller can't read are dropped silently.

`GET /collections/{collection_id}/documents/{document_id}`#

Fetch one document's metadata and status. ACL-gated: if the caller can't read the document, the response is the standard 403 COLLECTION_ACCESS_DENIED rather than a 404. The list endpoint applies the same gate, so the document won't surface there either.

`DELETE /collections/{collection_id}/documents/{document_id}`#

Remove a document. Drops chunks from the vector store and contributions from the graph store. Blob in S3 is cleaned up asynchronously.

Search#

`POST /search`#

Global search across every collection the caller can read.

| Field | Notes |
| --- | --- |
| `query` | The query text. |
| `collections` | Optional list of collection ids or slugs. Omit to span everything. |
| `top_k` | Default 10. |
| `min_score` | Default 0.0. |
| `search_type` | `vector` (default), `hybrid`, or `keyword`. |
| `filters` | Optional metadata filter object. |
| `include_content` | Default true. |
| `include_metadata` | Default true. |

Returns { "results": [...], "total": N }. Each result has chunk_id, document_id, document_name, collection_id, content, score, metadata.
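
A request body for this endpoint, and one way to consume the results, might look like the sketch below. The helper names `search_body` and `hits_by_document` are assumptions; the field names and defaults are taken from the table above.

```python
SEARCH_TYPES = {"vector", "hybrid", "keyword"}


def search_body(query, collections=None, top_k=10, min_score=0.0,
                search_type="vector", filters=None, **extra):
    """Assemble a POST /search body using the documented defaults."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {sorted(SEARCH_TYPES)}")
    body = {"query": query, "top_k": top_k, "min_score": min_score,
            "search_type": search_type}
    if collections:
        # Omitting the field spans every collection the caller can read.
        body["collections"] = list(collections)
    if filters:
        body["filters"] = filters
    body.update(extra)  # e.g. include_content=False
    return body


def hits_by_document(search_data):
    """Group the results of a search response by document_id, best score first."""
    grouped = {}
    ranked = sorted(search_data["results"], key=lambda h: h["score"], reverse=True)
    for hit in ranked:
        grouped.setdefault(hit["document_id"], []).append(hit)
    return grouped
```

Grouping by `document_id` is useful when several chunks of the same document match and the caller wants to show one entry per document.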

`POST /collections/{collection_id}/search`#

Same body as global search, scoped to one collection. The collection's embedding_model is used regardless of any model passed.

`POST /collections/{collection_id}/search/combined`#

Vector hits + graph traversal seeded from those hits.

| Field | Notes |
| --- | --- |
| `query` | The query text. |
| `vector_top_k` | Default 5. |
| `graph_depth` | Default 2. |
| `graph_expand_labels` | Optional list of node labels to expand. |
| `include_content` | Default true. |
| `min_score` | Default 0.0. |

Returns both the vector hits and the connected sub-graph, ACL-gated end-to-end.

Graph#

All graph endpoints are scoped to a collection that has graph_enabled: true. Calls against graph-disabled collections return empty results rather than a 422, which keeps the UX quieter.

`GET /collections/{collection_id}/graph/nodes`#

List nodes. Query params: label, skip, limit (max 100). Returns items, total, and visible_count (post-ACL).

`GET /collections/{collection_id}/graph/nodes/{node_id}`#

One node. ACL-gated through filter_graph_results_by_acl.

`POST /collections/{collection_id}/graph/nodes`#

Create a node manually. Body: { "label": "Product", "name": "WidgetPro", "properties": {...} }. Subject to graph_max_nodes quota.

`PUT /collections/{collection_id}/graph/nodes/{node_id}`#

Update name and/or properties.

`DELETE /collections/{collection_id}/graph/nodes/{node_id}`#

Delete a node. Connected edges are removed.

`GET /collections/{collection_id}/graph/edges`#

List edges. Query params: skip, limit.

`POST /collections/{collection_id}/graph/edges`#

Create an edge. Body: { "source_node_id": "...", "target_node_id": "...", "relationship_type": "compatible_with", "properties": {...} }. Subject to graph_max_edges quota.

`DELETE /collections/{collection_id}/graph/edges/{edge_id}`#

Remove an edge.

`POST /collections/{collection_id}/graph/query`#

Run a parameterised read-only Cypher-ish query against the collection's subgraph.

```json
{ "query": "MATCH (p:Product)-[:USES]->(t) WHERE t.name = $tech RETURN p", "parameters": {"tech": "Postgres"} }
```

Returns nodes, edges, query_time_ms.

`POST /collections/{collection_id}/graph/traverse`#

BFS from a node. Body: { "start_node_id": "...", "depth": 2, "labels": ["Product", "Feature"] }.
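
To make the `depth` and `labels` semantics concrete, here is a depth-limited breadth-first search over an in-memory adjacency map. It illustrates the general technique, not the server's implementation. In particular, treating `labels` as a filter on which nodes are visited is an assumption, and the function and parameter names are hypothetical.

```python
from collections import deque


def bfs(adjacency, node_labels, start, depth, labels=None):
    """Return {node_id: distance} for nodes within `depth` hops of `start`.

    adjacency maps a node id to its neighbour ids; node_labels maps a node
    id to its label.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand past the depth limit
        for neighbour in adjacency.get(node, ()):
            if neighbour in seen:
                continue
            if labels is not None and node_labels.get(neighbour) not in labels:
                continue  # assumption: labels restricts which nodes are visited
            seen[neighbour] = seen[node] + 1
            queue.append(neighbour)
    return seen
```

Because nodes are dequeued in order of distance, the first time a node is seen is also its shortest hop count from the start node.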

`POST /collections/{collection_id}/graph/path`#

Shortest path between two nodes. Body: { "source_node_id": "...", "target_node_id": "...", "max_depth": 6 }. Returns the path nodes / edges, path_length, and found: bool. If any intermediate node is ACL-denied, the entire path is dropped (no partial reveals).

`POST /collections/{collection_id}/graph/ask`#

Natural-language graph question. The configured chat model generates Cypher; mutation keywords are rejected; the read-only Cypher runs through the standard chokepoint.

```json
{ "question": "Which products are compatible with WidgetPro?", "model": "scailabs/poolnoodle-omni" }
```

Returns the generated cypher (for transparency), nodes, edges, query_time_ms, model_used.
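
Since the generated Cypher is returned, a client that logs or re-runs it may want its own conservative check. The sketch below shows what a keyword-based rejection could look like. The keyword list is hypothetical, as this reference does not give the server's actual list, and the helper name `looks_read_only` is an assumption.

```python
import re

# Hypothetical keyword list; the server's actual list is not given in this reference.
_MUTATING = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.IGNORECASE
)


def looks_read_only(cypher):
    """Return True if the query contains none of the mutating keywords above."""
    return _MUTATING.search(cypher) is None
```

The word-boundary match means a property such as `dataset` does not trip the `SET` keyword, but a keyword inside a string literal is still flagged, so the check errs on the side of rejecting.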

`GET /collections/{collection_id}/graph/search`#

Substring search over node names. Query params: q, limit.

`GET /collections/{collection_id}/graph/context`#

Curated subgraph formatted as Markdown for LLM prompts. Query params: label, max_nodes, max_chars (default 8000). Returns nodes, edges, formatted_text, truncated.

`GET /collections/{collection_id}/graph/stats`#

Aggregate counts: nodes, edges, label distribution, relationship-type distribution, isolated nodes, per-document extraction status (total / extracted / failed / pending).

`GET /collections/{collection_id}/graph/clusters`#

One cluster per node label with counts and sample most-connected node ids. Designed for the virtualised large-graph admin view. Query param: sample_per_cluster (default 5).

`GET /collections/{collection_id}/graph/changes`#

Adds-only diff since a cutoff timestamp. Query params: since (ISO-8601, required), limit (default 500). Returns nodes, edges, truncated.

`GET /collections/{collection_id}/graph/events`#

Server-Sent Events stream of live graph mutations for the collection. Events: scaimatrix.graph.node_created, node_updated, node_deleted, edge_created, edge_deleted. Delivery is best-effort, so clients should reload the graph each time they connect.
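
A consumer has to split the stream into events before it can act on them. The following sketch parses the standard Server-Sent Events wire format from an iterable of decoded lines. The helper name `parse_sse` is an assumption, and the JSON payload in the usage example is invented for illustration, since this reference does not specify the shape of the `data` field.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


sample = [
    "event: scaimatrix.graph.node_created\n",
    'data: {"id": "n1"}\n',  # payload shape is illustrative only
    "\n",
]
events = list(parse_sse(sample))
```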

`GET /collections/{collection_id}/graph/export`#

NDJSON dump of every visible node and edge for backup or migration. One JSON object per line: {"type":"node","data":{...}}, {"type":"edge","data":{...}}, terminating {"type":"summary","data":{"nodes":N,"edges":M}}. ACL-gated end-to-end. Streams so large graphs don't materialise in memory.
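
A reader for this format can use the terminating summary line to detect a truncated download. The sketch below collects nodes and edges from an iterable of lines and checks the counts; the helper name `read_export` is an assumption. It holds everything in memory for simplicity, so a very large export would be better processed line by line.

```python
import json


def read_export(lines):
    """Split an NDJSON graph export into nodes and edges, checking the summary."""
    nodes, edges, summary = [], [], None
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record["type"] == "node":
            nodes.append(record["data"])
        elif record["type"] == "edge":
            edges.append(record["data"])
        elif record["type"] == "summary":
            summary = record["data"]
    if summary is None:
        raise ValueError("export ended without a summary line (truncated stream?)")
    if (summary["nodes"], summary["edges"]) != (len(nodes), len(edges)):
        raise ValueError("summary counts do not match the records received")
    return nodes, edges
```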

`POST /collections/{collection_id}/graph/import`#

Bulk-import a JSON dump.

```json
{ "nodes": [...], "edges": [...] }
```

Capped at 5,000 nodes + 5,000 edges per call. Duplicates by id are skipped (idempotent). Returns nodes_created, nodes_skipped, edges_created, edges_skipped.
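
A dump larger than the cap has to be split client-side. The sketch below yields one request body per batch, sending nodes before edges. It assumes that an edge may reference nodes created by an earlier call and that the server accepts an empty list for either key; neither is confirmed by this reference. The helper name `import_batches` is hypothetical.

```python
IMPORT_CAP = 5000  # documented per-call cap for nodes and for edges


def import_batches(nodes, edges, cap=IMPORT_CAP):
    """Yield request bodies that each respect the per-call cap."""
    # Nodes go first so that edge endpoints exist by the time edges arrive.
    for i in range(0, len(nodes), cap):
        yield {"nodes": nodes[i:i + cap], "edges": []}
    for i in range(0, len(edges), cap):
        yield {"nodes": [], "edges": edges[i:i + cap]}
```

Because duplicates by id are skipped, a batch that fails midway can simply be sent again.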

`POST /collections/{collection_id}/graph/re-extract`#

Queue a graph re-extract over the collection's indexed documents. Idempotent: a second call while one is running returns 409 with the existing job's status.

`GET /collections/{collection_id}/graph/re-extract`#

Current / last re-extract job status.

Crawl#

`POST /collections/{collection_id}/crawl`#

Start an ad-hoc crawl. Body: { "url": "...", "max_depth": 3, "max_pages": 100, "max_total_bytes": 52428800, "follow_external": false }. The max_total_bytes value shown, 52428800, is 50 MiB. Returns a CrawlJobRead.

`GET /collections/{collection_id}/crawl/{crawl_id}`#

Fetch one crawl job.

`DELETE /collections/{collection_id}/crawl/{crawl_id}`#

Cancel a running crawl (sets status: cancelled).

`GET /collections/{collection_id}/crawl/{crawl_id}/stream`#

SSE stream of progress events: a progress event every two seconds, then a terminal done event when the job ends.

Crawl configs#

`POST /collections/{collection_id}/crawls`#

Create a recurring crawl config. Body fields: name, seed_url, max_depth, max_pages, max_total_bytes, follow_external, optional schedule ({type: "daily", time: "03:00"}), optional webhook ({enabled: true}). If webhook.enabled is set, the response includes a one-time webhook_secret.

`GET /collections/{collection_id}/crawls`#

List crawl configs for a collection (paginated).

`GET /collections/{collection_id}/crawls/{config_id}`#

Fetch one config.

`PUT /collections/{collection_id}/crawls/{config_id}`#

Update fields. Pass clear_schedule: true or clear_webhook: true to drop them entirely.

`DELETE /collections/{collection_id}/crawls/{config_id}`#

Remove a config. Run history is retained.

`POST /collections/{collection_id}/crawls/{config_id}/run`#

Manual one-off run. Returns the new CrawlJobRead.

`GET /collections/{collection_id}/crawls/{config_id}/jobs`#

Paginated history of runs for one config.

`POST /collections/{collection_id}/crawls/{config_id}/trigger`#

Inbound HMAC-verified webhook. Headers: X-Crawl-Signature (hex HMAC-SHA256 of timestamp + "." + body), X-Crawl-Timestamp (unix seconds). No JWT — the secret is the auth. 401 on bad / missing signature; 404 if webhook isn't enabled.
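
The signing scheme can be sketched with Python's `hmac` module. The helper name `sign_trigger` is an assumption, as is the choice to use the UTF-8 bytes of the `webhook_secret` as the HMAC key. The message layout (timestamp, a dot, then the body) and the header names come from the description above.

```python
import hashlib
import hmac
import time


def sign_trigger(secret, body, timestamp=None):
    """Return the two signature headers for the trigger endpoint.

    body must be the exact bytes that will be sent as the request body.
    """
    ts = str(int(time.time()) if timestamp is None else timestamp)
    message = ts.encode("ascii") + b"." + body
    # Assumption: the webhook secret is used as the key in its UTF-8 form.
    signature = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return {"X-Crawl-Signature": signature, "X-Crawl-Timestamp": ts}
```

The signature covers the raw bytes, so the body must be serialised once and the same bytes both signed and sent; re-serialising JSON after signing can change whitespace and invalidate the signature.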

`GET /crawl-jobs/{job_id}`#

Fetch a crawl job without knowing the collection (useful when wiring up status views).

`GET /crawl-jobs/{job_id}/documents`#

Paginated list of documents produced by a specific crawl job.

Per-resource ACLs (v2)#

Resource type is collection or document. All ACL endpoints require CHANGE_PERMISSIONS on the target unless otherwise noted.

`GET /permissions/{resource_type}/{resource_id}/acl`#

Return the ACL, including all ACEs and the resource's owner_user_id. Caller needs READ_PERMISSIONS.

`PATCH /permissions/{resource_type}/{resource_id}/acl`#

Toggle inherit_from_parent. Body: { "inherit_from_parent": false }.

`POST /permissions/{resource_type}/{resource_id}/acl/entries`#

Add an ACE. Body:

| Field | Notes |
| --- | --- |
| `principal_type` | `user` or `group`. |
| `principal_id` | ScaiKey id. |
| `ace_type` | `allow` (default) or `deny`. |
| `permissions` | Integer bitmask (see ACLs). |
| `inherit_to_children` | Default true. |

Returns the created ACE. INGEST on a document resource is rejected (INVALID_ACE).
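
Working with an integer bitmask is easier through a flag type. In the sketch below the bit positions are hypothetical placeholders, since the real values are defined on the ACLs page and not here; `READ` is likewise an assumed verb name. Only `INGEST`, `READ_PERMISSIONS`, `CHANGE_PERMISSIONS`, and `TAKE_OWNERSHIP`, along with the INGEST-on-document rule, come from this reference.

```python
from enum import IntFlag


class Perm(IntFlag):
    # Hypothetical bit positions; the real values are defined on the ACLs page.
    READ = 1
    INGEST = 2
    READ_PERMISSIONS = 4
    CHANGE_PERMISSIONS = 8
    TAKE_OWNERSHIP = 16


def ace_body(principal_type, principal_id, permissions, resource_type,
             ace_type="allow", inherit_to_children=True):
    """Assemble an ACE body, mirroring the documented INGEST-on-document rule."""
    if principal_type not in {"user", "group"}:
        raise ValueError("principal_type must be 'user' or 'group'")
    if ace_type not in {"allow", "deny"}:
        raise ValueError("ace_type must be 'allow' or 'deny'")
    if resource_type == "document" and permissions & Perm.INGEST:
        raise ValueError("INGEST cannot be granted on a document (INVALID_ACE)")
    return {
        "principal_type": principal_type,
        "principal_id": principal_id,
        "ace_type": ace_type,
        "permissions": int(permissions),
        "inherit_to_children": inherit_to_children,
    }
```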

`DELETE /permissions/{resource_type}/{resource_id}/acl/entries/{ace_id}`#

Remove an ACE. Returns 204.

`GET /permissions/{resource_type}/{resource_id}/effective`#

Compute the calling user's effective permission bitmask. Returns permissions, permission_names, and a can object with named booleans for each verb.

`POST /permissions/{resource_type}/{resource_id}/take-ownership`#

Reassign the owner. Body: { "new_owner_user_id": "usr_..." }. Caller needs TAKE_OWNERSHIP. Audit-logged.

Legacy collection access (v1)#

These pre-date v2 ACLs. New integrations should use the per-resource endpoints above.

GET /collections/{collection_id}/access
POST /collections/{collection_id}/access — { "grantee_type": "user|group", "grantee_id": "...", "permission": "read|write|manage" }
PUT /collections/{collection_id}/access/{access_id} — { "permission": "..." }
DELETE /collections/{collection_id}/access/{access_id}

Saved graph views#

Each view is a (filter + tool + tint) snapshot the viewer can round-trip. private views are visible only to the owner; tenant views are visible to everyone with read on the collection.

`GET /collections/{collection_id}/graph/views`#

List views the caller can see (their own private + every tenant-scoped one).

`POST /collections/{collection_id}/graph/views`#

Create. Body: { "name": "...", "description": "...", "scope": "private|tenant", "config": {...} }. tenant scope requires collection write.

`GET /collections/{collection_id}/graph/views/{view_id}`#

Fetch one view.

`PATCH /collections/{collection_id}/graph/views/{view_id}`#

Update. Promoting private -> tenant requires collection write.

`DELETE /collections/{collection_id}/graph/views/{view_id}`#

Remove a view. The caller must be the view's owner or have collection write.

Errors#

All endpoints return ScaiGrid's standard error envelope:

```json
{
  "error": {
    "code": "COLLECTION_ACCESS_DENIED",
    "message": "You do not have access to this collection"
  },
  "meta": { "request_id": "req_..." }
}
```

ScaiMatrix-specific codes:

| Code | Meaning |
| --- | --- |
| `COLLECTION_ACCESS_DENIED` | The user can't access this collection at the required permission level. |
| `COLLECTION_ACCESS_NOT_FOUND` | Legacy access grant id not found. |
| `INVALID_ACCESS_GRANT` | Legacy grant body was malformed. |
| `STORAGE_QUOTA_EXCEEDED` | Adding this document would push the collection past its `storage_quota_bytes`. |
| `GRAPH_QUOTA_EXCEEDED` | Adding these nodes / edges would breach `graph_max_nodes` / `graph_max_edges`. |
| `GRAPH_NOT_ENABLED` | The collection has `graph_enabled: false`. |
| `GRAPH_REEXTRACT_ALREADY_RUNNING` | A re-extract is already in flight; check the status endpoint. |
| `GRAPH_IMPORT_TOO_LARGE` | Payload exceeded the 5,000 + 5,000 cap; chunk client-side. |
| `GRAPH_IMPORT_INVALID_SHAPE` | Body wasn't `{ "nodes": [...], "edges": [...] }`. |
| `GRAPH_VIEW_INVALID_SCOPE` | Scope wasn't `private` or `tenant`. |
| `GRAPH_VIEW_NOT_FOUND` | View id missing or not visible to the caller. |
| `RECHUNK_ALREADY_RUNNING` | A re-chunk is already in flight. |
| `CRAWL_JOB_NOT_FOUND` | Crawl job id missing or wrong tenant. |
| `CRAWL_CONFIG_NOT_FOUND` | Crawl config missing, disabled, or webhook misconfigured. |
| `CRAWL_ALREADY_RUNNING` | A crawl is already in flight for this collection. |
| `INVALID_SCHEDULE_CONFIG` | Schedule fields didn't validate. |
| `INVALID_CRON_EXPRESSION` | Cron string didn't parse. |
| `ACL_NOT_FOUND` | Target resource or ACL not visible to caller. |
| `ACE_NOT_FOUND` | ACE id missing or doesn't belong to this resource. |
| `INVALID_ACE` | Bad principal_type / ace_type / permissions combination. |
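
Client code often needs to decide what to do with an error rather than just report it. The sketch below groups a few of the codes above into coarse actions. The action names and the grouping are an illustrative design choice, not part of the API, and the helper name `classify_error` is an assumption.

```python
ALREADY_RUNNING = {
    "GRAPH_REEXTRACT_ALREADY_RUNNING",
    "RECHUNK_ALREADY_RUNNING",
    "CRAWL_ALREADY_RUNNING",
}
QUOTA = {"STORAGE_QUOTA_EXCEEDED", "GRAPH_QUOTA_EXCEEDED"}


def classify_error(envelope):
    """Map an error envelope to (action, request_id)."""
    code = envelope["error"]["code"]
    request_id = envelope.get("meta", {}).get("request_id")
    if code in ALREADY_RUNNING:
        action = "attach"   # a job exists already; poll its status endpoint
    elif code in QUOTA:
        action = "quota"    # raise the limit or free space before retrying
    elif code == "GRAPH_IMPORT_TOO_LARGE":
        action = "split"    # send the payload in smaller batches
    else:
        action = "fail"
    return action, request_id
```

Returning the `request_id` alongside the action keeps it available for logs and support tickets.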
