API reference

All endpoints are mounted at /v1/modules/scaimatrix/ and authenticate with the standard ScaiGrid bearer token (Authorization: Bearer sgk_... or a JWT issued by ScaiKey). Responses use ScaiGrid's standard envelope ({ "data": ... } for success, { "error": ... } for failures, { "data": [...], "meta": { "next_cursor": ..., "has_more": ... } } for paginated lists).

All endpoints below require the appropriate module permission (see Permissions) and the appropriate per-resource ACE (see ACLs).
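
The bearer header and the paginated envelope described above can be handled with a small helper. This is a minimal sketch, not an official client: `auth_headers`, `iter_pages`, and the caller-supplied `fetch_page` callable are hypothetical names, and the sketch assumes that a page with `has_more: false` (or no `next_cursor`) is the last one.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def auth_headers(token: str) -> Dict[str, str]:
    """Build the bearer header; token is an sgk_... key or a ScaiKey JWT."""
    return {"Authorization": f"Bearer {token}"}


def iter_pages(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]]
) -> Iterator[Any]:
    """Yield every item of a cursor-paginated list.

    fetch_page(cursor) is supplied by the caller (hypothetical): it performs
    the HTTP GET with the given cursor and returns the decoded JSON envelope.
    """
    cursor: Optional[str] = None
    while True:
        envelope = fetch_page(cursor)
        if "error" in envelope:
            raise RuntimeError(f"request failed: {envelope['error']}")
        yield from envelope["data"]
        meta = envelope.get("meta", {})
        cursor = meta.get("next_cursor")
        # Assumption: stop when the server reports no further pages.
        if not meta.get("has_more") or cursor is None:
            return
```

Keeping the HTTP call behind `fetch_page` lets the same loop serve any paginated list endpoint, and makes it easy to exercise with canned envelopes.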

Collections#

POST /collections#

Create a collection.

| Field | Required | Notes |
| --- | --- | --- |
| name | yes | Human-readable name. |
| description | no | Free text. |
| embedding_model | yes | Frontend model id available to your tenant. |
| chunking_strategy | no | fixed, paragraph (default), semantic, markdown, code. |
| chunk_size | no | Default 512. |
| chunk_overlap | no | Default 50. |
| graph_enabled | no | Default false. |
| graph_extraction_model | no | Required if graph_enabled. A chat model. |
| graph_max_nodes | no | Default 250,000. NULL = unlimited. |
| graph_max_edges | no | Default 500,000. NULL = unlimited. |
| storage_quota_bytes | no | NULL = unlimited. |
| default_access | no | tenant (default) or restricted. |
| metadata | no | Arbitrary JSON. |

Slug is auto-generated from the name and must be unique within the tenant. Returns 201 with the created collection.
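
As an illustration of the field rules above, the following sketch builds a request body for POST /collections and checks the two constraints the table states (allowed chunking strategies, and graph_extraction_model being required when graph_enabled is set). `build_collection_payload` is a hypothetical helper, and the client-side validation is a convenience assumption; the server remains the authority.

```python
from typing import Any, Dict, Optional

CHUNKING_STRATEGIES = {"fixed", "paragraph", "semantic", "markdown", "code"}


def build_collection_payload(
    name: str,
    embedding_model: str,
    chunking_strategy: str = "paragraph",
    graph_enabled: bool = False,
    graph_extraction_model: Optional[str] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Assemble a POST /collections body from the documented fields."""
    if chunking_strategy not in CHUNKING_STRATEGIES:
        raise ValueError(f"unknown chunking_strategy: {chunking_strategy!r}")
    if graph_enabled and not graph_extraction_model:
        raise ValueError("graph_extraction_model is required if graph_enabled")
    payload: Dict[str, Any] = {
        "name": name,
        "embedding_model": embedding_model,
        "chunking_strategy": chunking_strategy,
        "graph_enabled": graph_enabled,
    }
    if graph_extraction_model:
        payload["graph_extraction_model"] = graph_extraction_model
    # Optional fields such as chunk_size or metadata pass through unchanged.
    payload.update(extra)
    return payload


# Example: a graph-enabled collection with a non-default chunk size.
example = build_collection_payload(
    "Product docs",
    "openai/text-embedding-3-large",
    graph_enabled=True,
    graph_extraction_model="scailabs/poolnoodle-omni",
    chunk_size=256,
)
```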

GET /collections#

List collections accessible to the caller. Cursor-paginated (limit, cursor query params). Returns each collection with a user_permission field (read, write, manage, or null).

GET /collections/{collection_id}#

Fetch one collection's full config and counters.

PUT /collections/{collection_id}#

Update mutable fields. Setting graph_enabled false -> true queues a re-extract over existing documents (signalled by follow_ups.graph_reextract: "queued" in the response). Changing chunking parameters sets follow_ups.rechunk_recommended: true without auto-queuing — call /re-chunk when you're ready.

DELETE /collections/{collection_id}#

Hard-delete. Cascades to documents, chunks (in Weaviate), graph contents (in Neo4j), ACLs, and crawl configs.

POST /collections/{collection_id}/reindex#

Re-queue ingestion for every document in the collection. Useful after a backend storage migration.

POST /collections/{collection_id}/fork#

Clone the collection's metadata + ACLs + config into a new collection. Documents are not copied. Use this when the embedding model needs to change.

json
{
  "name": "...",
  "embedding_model": "openai/text-embedding-3-large",
  "copy_acls": true,
  "copy_metadata": true
}

POST /collections/{collection_id}/re-chunk#

Queue an in-place re-chunk over every indexed document using the collection's current chunking parameters and embedding model. Idempotent: a second call while one is running returns 409 with the existing job's status.

GET /collections/{collection_id}/re-chunk#

Status of the current or most recent re-chunk job, with counters (total, processed, failed) and timestamps (started_at, finished_at).
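
A client that queues a re-chunk will usually poll the status endpoint until the job ends. The sketch below shows one way to do that. It assumes a job is finished once `finished_at` is non-null, which the reference does not state explicitly; `wait_for_rechunk` and the caller-supplied `get_status` callable are hypothetical.

```python
import time
from typing import Any, Callable, Dict


def wait_for_rechunk(
    get_status: Callable[[], Dict[str, Any]],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the re-chunk status until the job reports a finished_at value.

    get_status() is supplied by the caller (hypothetical): it performs
    GET /collections/{collection_id}/re-chunk and returns the "data" object.
    """
    for _ in range(max_polls):
        status = get_status()
        # Assumption: a non-null finished_at marks the end of the job.
        if status.get("finished_at") is not None:
            return status
        sleep(poll_seconds)
    raise TimeoutError("re-chunk job did not finish within the polling budget")
```

After the call returns, comparing `failed` against `total` indicates whether any documents need attention.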

Documents#

POST /collections/{collection_id}/documents#

Upload a document. multipart/form-data with file (binary) and optional metadata (JSON string). Returns the new document with status: pending; a background job indexes it.

POST /collections/{collection_id}/documents/bulk#

Upload many documents in one request — files field repeated, each entry one file. Each document gets its own ingestion job. Use this for warming a collection from a directory.

POST /collections/{collection_id}/documents/from-url#

Ingest from an external URL. Body: { "url": "...", "name": "...", "metadata": {...} }.

POST /collections/{collection_id}/documents/from-scaidrive#

Ingest from a ScaiDrive file id. Body: { "file_id": "...", "name": "...", "metadata": {...} }.

GET /collections/{collection_id}/documents#

Cursor-paginated list. Each row passes through the ACL chokepoint — documents the caller can't read are dropped silently.

GET /collections/{collection_id}/documents/{document_id}#

Fetch one document's metadata and status. ACL-gated: if the caller can't read the document, the response is the standard 403 COLLECTION_ACCESS_DENIED rather than a 404. The list endpoint won't surface the document either.

DELETE /collections/{collection_id}/documents/{document_id}#

Remove a document. Drops chunks from the vector store and contributions from the graph store. Blob in S3 is cleaned up asynchronously.

Search#

POST /search#

Global search across every collection the caller can read.

| Field | Notes |
| --- | --- |
| query | The query text. |
| collections | Optional list of collection ids or slugs. Omit to span everything. |
| top_k | Default 10. |
| min_score | Default 0.0. |
| search_type | vector (default), hybrid, or keyword. |
| filters | Optional metadata filter object. |
| include_content | Default true. |
| include_metadata | Default true. |

Returns { "results": [...], "total": N }. Each result has chunk_id, document_id, document_name, collection_id, content, score, metadata.
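
To make the request and response shapes above concrete, here is a minimal sketch that builds a search body and reduces the results to the best-scoring chunk per document. `build_search_body` and `best_hit_per_document` are hypothetical helpers, and the sketch assumes that a higher `score` means a better match.

```python
from typing import Any, Dict, Iterable, List, Optional

SEARCH_TYPES = {"vector", "hybrid", "keyword"}


def build_search_body(
    query: str,
    collections: Optional[List[str]] = None,
    top_k: int = 10,
    min_score: float = 0.0,
    search_type: str = "vector",
    filters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a POST /search body from the documented fields."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search_type: {search_type!r}")
    body: Dict[str, Any] = {
        "query": query,
        "top_k": top_k,
        "min_score": min_score,
        "search_type": search_type,
    }
    if collections:  # omit the field to span every readable collection
        body["collections"] = list(collections)
    if filters:
        body["filters"] = filters
    return body


def best_hit_per_document(results: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the highest-scoring chunk per document_id, best first."""
    best: Dict[str, Dict[str, Any]] = {}
    for hit in results:
        doc_id = hit["document_id"]
        # Assumption: a larger score is a stronger match.
        if doc_id not in best or hit["score"] > best[doc_id]["score"]:
            best[doc_id] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```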

POST /collections/{collection_id}/search#

Same body as global search, scoped to one collection. The collection's embedding_model is used regardless of any model passed.

POST /collections/{collection_id}/search/combined#

Vector hits + graph traversal seeded from those hits.

| Field | Notes |
| --- | --- |
| query | The query text. |
| vector_top_k | Default 5. |
| graph_depth | Default 2. |
| graph_expand_labels | Optional list of node labels to expand. |
| include_content | Default true. |
| min_score | Default 0.0. |

Returns both the vector hits and the connected sub-graph, ACL-gated end-to-end.

Graph#

All graph endpoints are scoped to a collection that has graph_enabled: true. Calls against graph-disabled collections return empty results (no 422 — quieter UX).

GET /collections/{collection_id}/graph/nodes#

List nodes. Query params: label, skip, limit (max 100). Returns items, total, and visible_count (post-ACL).

GET /collections/{collection_id}/graph/nodes/{node_id}#

One node. ACL-gated through filter_graph_results_by_acl.

POST /collections/{collection_id}/graph/nodes#

Create a node manually. Body: { "label": "Product", "name": "WidgetPro", "properties": {...} }. Subject to graph_max_nodes quota.

PUT /collections/{collection_id}/graph/nodes/{node_id}#

Update name and/or properties.

DELETE /collections/{collection_id}/graph/nodes/{node_id}#

Delete a node. Connected edges are removed.

GET /collections/{collection_id}/graph/edges#

List edges. Query params: skip, limit.

POST /collections/{collection_id}/graph/edges#

Create an edge. Body: { "source_node_id": "...", "target_node_id": "...", "relationship_type": "compatible_with", "properties": {...} }. Subject to graph_max_edges quota.

DELETE /collections/{collection_id}/graph/edges/{edge_id}#

Remove an edge.

POST /collections/{collection_id}/graph/query#

Run a parameterised read-only Cypher-ish query against the collection's subgraph.

json
{ "query": "MATCH (p:Product)-[:USES]->(t) WHERE t.name = $tech RETURN p", "parameters": {"tech": "Postgres"} }

Returns nodes, edges, query_time_ms.
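
Because the endpoint takes parameters separately from the query text, values should be passed through `parameters` rather than spliced into the Cypher string. The sketch below builds such a body and checks that every `$name` placeholder has a value. `build_graph_query` is a hypothetical helper, and the placeholder pattern is an assumption based on the `$tech` example above.

```python
import re
from typing import Any, Dict

# Assumption: placeholders look like $name, as in the $tech example.
PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def build_graph_query(query: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble a graph/query body, rejecting unbound placeholders."""
    used = set(PLACEHOLDER.findall(query))
    missing = used - set(parameters)
    if missing:
        raise ValueError(f"missing query parameters: {sorted(missing)}")
    return {"query": query, "parameters": dict(parameters)}


# Example: the query from the reference, with the value kept out of the text.
body = build_graph_query(
    "MATCH (p:Product)-[:USES]->(t) WHERE t.name = $tech RETURN p",
    {"tech": "Postgres"},
)
```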

POST /collections/{collection_id}/graph/traverse#

BFS from a node. Body: { "start_node_id": "...", "depth": 2, "labels": ["Product", "Feature"] }.

POST /collections/{collection_id}/graph/path#

Shortest path between two nodes. Body: { "source_node_id": "...", "target_node_id": "...", "max_depth": 6 }. Returns the path nodes / edges, path_length, and found: bool. If any intermediate node is ACL-denied, the entire path is dropped (no partial reveals).

POST /collections/{collection_id}/graph/ask#

Natural-language graph question. The configured chat model generates Cypher; mutation keywords are rejected; the read-only Cypher runs through the standard chokepoint.

json
{ "question": "Which products are compatible with WidgetPro?", "model": "scailabs/poolnoodle-omni" }

Returns the generated cypher (for transparency), nodes, edges, query_time_ms, model_used.

GET /collections/{collection_id}/graph/search#

Substring search over node names. Query params: q, limit.

GET /collections/{collection_id}/graph/context#

Curated subgraph formatted as Markdown for LLM prompts. Query params: label, max_nodes, max_chars (default 8000). Returns nodes, edges, formatted_text, truncated.

GET /collections/{collection_id}/graph/stats#

Aggregate counts: nodes, edges, label distribution, relationship-type distribution, isolated nodes, per-document extraction status (total / extracted / failed / pending).

GET /collections/{collection_id}/graph/clusters#

One cluster per node label with counts and sample most-connected node ids. Designed for the virtualised large-graph admin view. Query param: sample_per_cluster (default 5).

GET /collections/{collection_id}/graph/changes#

Adds-only diff since a cutoff timestamp. Query params: since (ISO-8601, required), limit (default 500). Returns nodes, edges, truncated.

GET /collections/{collection_id}/graph/events#

Server-Sent Events stream of live graph mutations for the collection. Events: scaimatrix.graph.node_created, node_updated, node_deleted, edge_created, edge_deleted. Delivery is best-effort, so clients should reload the graph state each time they connect.

GET /collections/{collection_id}/graph/export#

NDJSON dump of every visible node and edge for backup or migration. One JSON object per line: {"type":"node","data":{...}}, {"type":"edge","data":{...}}, terminating {"type":"summary","data":{"nodes":N,"edges":M}}. ACL-gated end-to-end. Streams so large graphs don't materialise in memory.
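
The record layout above can be consumed line by line. The following sketch parses an export and uses the terminating summary record to detect a truncated download. `read_graph_export` is a hypothetical helper; it collects everything into lists for brevity, whereas a very large graph would more likely be processed record by record.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def read_graph_export(
    lines: Iterable[str],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Parse an NDJSON graph export into (nodes, edges)."""
    nodes: List[Dict[str, Any]] = []
    edges: List[Dict[str, Any]] = []
    summary: Optional[Dict[str, Any]] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        kind = record["type"]
        if kind == "node":
            nodes.append(record["data"])
        elif kind == "edge":
            edges.append(record["data"])
        elif kind == "summary":
            summary = record["data"]
    if summary is None:
        raise ValueError("no summary record; the export may be truncated")
    if summary["nodes"] != len(nodes) or summary["edges"] != len(edges):
        raise ValueError("summary counts do not match the records received")
    return nodes, edges
```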

POST /collections/{collection_id}/graph/import#

Bulk-import a JSON dump.

json
{ "nodes": [...], "edges": [...] }

Capped at 5,000 nodes + 5,000 edges per call. Duplicates by id are skipped (idempotent). Returns nodes_created, nodes_skipped, edges_created, edges_skipped.
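
Dumps larger than the cap have to be split client-side. One possible batching scheme is sketched below. `batch_graph_import` is a hypothetical helper. Sending all node batches before any edge batch is an assumption (so that edges never reference nodes that have not been imported yet), as is the server accepting an empty `nodes` or `edges` list.

```python
from typing import Any, Dict, Iterator, List

IMPORT_CAP = 5000  # documented per-call cap for nodes and for edges


def batch_graph_import(
    nodes: List[Dict[str, Any]],
    edges: List[Dict[str, Any]],
    cap: int = IMPORT_CAP,
) -> Iterator[Dict[str, Any]]:
    """Yield import bodies that each stay within the per-call cap."""
    # Nodes first, so later edge batches refer to nodes already imported.
    for start in range(0, len(nodes), cap):
        yield {"nodes": nodes[start:start + cap], "edges": []}
    for start in range(0, len(edges), cap):
        yield {"nodes": [], "edges": edges[start:start + cap]}
```

Since duplicates by id are skipped, a failed run can be retried from the first batch without creating duplicate nodes or edges.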

POST /collections/{collection_id}/graph/re-extract#

Queue a graph re-extract over the collection's indexed documents. Idempotent: a second call while running returns 409 with status.

GET /collections/{collection_id}/graph/re-extract#

Current / last re-extract job status.

Crawl#

POST /collections/{collection_id}/crawl#

Start an ad-hoc crawl. Body: { "url": "...", "max_depth": 3, "max_pages": 100, "max_total_bytes": 52428800, "follow_external": false }. Returns a CrawlJobRead.

GET /collections/{collection_id}/crawl/{crawl_id}#

Fetch one crawl job.

DELETE /collections/{collection_id}/crawl/{crawl_id}#

Cancel a running crawl (sets status: cancelled).

GET /collections/{collection_id}/crawl/{crawl_id}/stream#

SSE stream of progress events: a progress event every two seconds, followed by a terminal done event when the job ends.
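
A stream like this can be read with a small Server-Sent Events parser. The sketch below follows the standard SSE wire format (`event:` and `data:` fields, blank line as terminator). It assumes the progress and done names arrive in the `event:` field and that `data:` carries JSON, neither of which this reference spells out. `iter_sse_events` and `follow_crawl` are hypothetical names.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from raw SSE lines."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_lines.append(value[1:] if value.startswith(" ") else value)


def follow_crawl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Collect progress payloads until the terminal done event arrives."""
    progress: List[Dict[str, Any]] = []
    for name, data in iter_sse_events(lines):
        if name == "progress":
            progress.append(json.loads(data))  # assumption: JSON payload
        elif name == "done":
            break
    return progress
```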

Crawl configs#

POST /collections/{collection_id}/crawls#

Create a recurring crawl config. Body fields: name, seed_url, max_depth, max_pages, max_total_bytes, follow_external, optional schedule ({type: "daily", time: "03:00"}), optional webhook ({enabled: true}). If webhook.enabled is set, the response includes a one-time webhook_secret.

GET /collections/{collection_id}/crawls#

List crawl configs for a collection (paginated).

GET /collections/{collection_id}/crawls/{config_id}#

Fetch one config.

PUT /collections/{collection_id}/crawls/{config_id}#

Update fields. Pass clear_schedule: true or clear_webhook: true to drop them entirely.

DELETE /collections/{collection_id}/crawls/{config_id}#

Remove a config. Run history is retained.

POST /collections/{collection_id}/crawls/{config_id}/run#

Manual one-off run. Returns the new CrawlJobRead.

GET /collections/{collection_id}/crawls/{config_id}/jobs#

Paginated history of runs for one config.

POST /collections/{collection_id}/crawls/{config_id}/trigger#

Inbound HMAC-verified webhook. Headers: X-Crawl-Signature (hex HMAC-SHA256 of timestamp + "." + body), X-Crawl-Timestamp (unix seconds). No JWT — the secret is the auth. 401 on bad / missing signature; 404 if webhook isn't enabled.
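
On the sending side, the two headers can be computed with the standard library. This is a minimal sketch under stated assumptions: the secret is used as UTF-8 bytes for the HMAC key, the timestamp is the decimal string of unix seconds, and the signed body is the exact byte sequence sent in the request. `sign_trigger` is a hypothetical helper.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional


def sign_trigger(
    secret: str, body: bytes, timestamp: Optional[int] = None
) -> Dict[str, str]:
    """Compute X-Crawl-Signature and X-Crawl-Timestamp for a trigger call."""
    ts = str(int(time.time()) if timestamp is None else timestamp)
    # Signed message: timestamp + "." + body, as described above.
    message = ts.encode("ascii") + b"." + body
    signature = hmac.new(
        secret.encode("utf-8"),  # assumption: key is the UTF-8 secret
        message,
        hashlib.sha256,
    ).hexdigest()
    return {"X-Crawl-Signature": signature, "X-Crawl-Timestamp": ts}
```

The body must be signed in the same serialised form that goes on the wire, because re-serialising JSON after signing can change whitespace or key order and invalidate the signature.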

GET /crawl-jobs/{job_id}#

Fetch a crawl job without knowing the collection (useful when wiring up status views).

GET /crawl-jobs/{job_id}/documents#

Paginated list of documents produced by a specific crawl job.

Per-resource ACLs (v2)#

Resource type is collection or document. All ACL endpoints require CHANGE_PERMISSIONS on the target unless otherwise noted.

GET /permissions/{resource_type}/{resource_id}/acl#

Return the ACL, including all ACEs and the resource's owner_user_id. Caller needs READ_PERMISSIONS.

PATCH /permissions/{resource_type}/{resource_id}/acl#

Toggle inherit_from_parent. Body: { "inherit_from_parent": false }.

POST /permissions/{resource_type}/{resource_id}/acl/entries#

Add an ACE. Body:

| Field | Notes |
| --- | --- |
| principal_type | user or group. |
| principal_id | ScaiKey id. |
| ace_type | allow (default) or deny. |
| permissions | Integer bitmask (see ACLs). |
| inherit_to_children | Default true. |

Returns the created ACE. INGEST on a document resource is rejected (INVALID_ACE).

DELETE /permissions/{resource_type}/{resource_id}/acl/entries/{ace_id}#

Remove an ACE. Returns 204.

GET /permissions/{resource_type}/{resource_id}/effective#

Compute the calling user's effective permission bitmask. Returns permissions, permission_names, and a can object with named booleans for each verb.

POST /permissions/{resource_type}/{resource_id}/take-ownership#

Reassign the owner. Body: { "new_owner_user_id": "usr_..." }. Caller needs TAKE_OWNERSHIP. Audit-logged.

Legacy collection access (v1)#

These pre-date v2 ACLs. New integrations should use the per-resource endpoints above.

  • GET /collections/{collection_id}/access
  • POST /collections/{collection_id}/access. Body: { "grantee_type": "user|group", "grantee_id": "...", "permission": "read|write|manage" }
  • PUT /collections/{collection_id}/access/{access_id}. Body: { "permission": "..." }
  • DELETE /collections/{collection_id}/access/{access_id}

Saved graph views#

Each view is a (filter + tool + tint) snapshot the viewer can round-trip. Views with private scope are visible only to the owner; views with tenant scope are visible to everyone with read on the collection.

GET /collections/{collection_id}/graph/views#

List views the caller can see (their own private + every tenant-scoped one).

POST /collections/{collection_id}/graph/views#

Create. Body: { "name": "...", "description": "...", "scope": "private|tenant", "config": {...} }. tenant scope requires collection write.

GET /collections/{collection_id}/graph/views/{view_id}#

Fetch one view.

PATCH /collections/{collection_id}/graph/views/{view_id}#

Update. Promoting private -> tenant requires collection write.

DELETE /collections/{collection_id}/graph/views/{view_id}#

Remove a view. The caller must be the view's owner or have collection write.

Errors#

All endpoints return ScaiGrid's standard error envelope:

json
{
  "error": {
    "code": "COLLECTION_ACCESS_DENIED",
    "message": "You do not have access to this collection"
  },
  "meta": { "request_id": "req_..." }
}

ScaiMatrix-specific codes:

| Code | Meaning |
| --- | --- |
| COLLECTION_ACCESS_DENIED | The user can't access this collection at the required permission level. |
| COLLECTION_ACCESS_NOT_FOUND | Legacy access grant id not found. |
| INVALID_ACCESS_GRANT | Legacy grant body was malformed. |
| STORAGE_QUOTA_EXCEEDED | Adding this document would push the collection past its storage_quota_bytes. |
| GRAPH_QUOTA_EXCEEDED | Adding these nodes / edges would breach graph_max_nodes / graph_max_edges. |
| GRAPH_NOT_ENABLED | The collection has graph_enabled: false. |
| GRAPH_REEXTRACT_ALREADY_RUNNING | A re-extract is already in flight; check the status endpoint. |
| GRAPH_IMPORT_TOO_LARGE | Payload exceeded the 5,000 + 5,000 cap; chunk client-side. |
| GRAPH_IMPORT_INVALID_SHAPE | Body wasn't { "nodes": [...], "edges": [...] }. |
| GRAPH_VIEW_INVALID_SCOPE | Scope wasn't private or tenant. |
| GRAPH_VIEW_NOT_FOUND | View id missing or not visible to the caller. |
| RECHUNK_ALREADY_RUNNING | A re-chunk is already in flight. |
| CRAWL_JOB_NOT_FOUND | Crawl job id missing or wrong tenant. |
| CRAWL_CONFIG_NOT_FOUND | Crawl config missing, disabled, or webhook misconfigured. |
| CRAWL_ALREADY_RUNNING | A crawl is already in flight for this collection. |
| INVALID_SCHEDULE_CONFIG | Schedule fields didn't validate. |
| INVALID_CRON_EXPRESSION | Cron string didn't parse. |
| ACL_NOT_FOUND | Target resource or ACL not visible to caller. |
| ACE_NOT_FOUND | ACE id missing or doesn't belong to this resource. |
| INVALID_ACE | Bad principal_type / ace_type / permissions combination. |
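
Client code can turn the error envelope into an exception that carries the code and request id. The sketch below is one way to do that. `ScaiMatrixError` and `raise_for_error` are hypothetical names, and grouping the three "already running" codes as a non-fatal condition is a design assumption, not something the API mandates.

```python
from typing import Any, Dict, Optional

# Codes that signal an existing in-flight job rather than a hard failure.
ALREADY_RUNNING_CODES = {
    "GRAPH_REEXTRACT_ALREADY_RUNNING",
    "RECHUNK_ALREADY_RUNNING",
    "CRAWL_ALREADY_RUNNING",
}


class ScaiMatrixError(Exception):
    """An error envelope surfaced as an exception."""

    def __init__(self, code: str, message: str, request_id: Optional[str] = None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.request_id = request_id

    @property
    def already_running(self) -> bool:
        """True when the caller can poll the status endpoint instead of failing."""
        return self.code in ALREADY_RUNNING_CODES


def raise_for_error(envelope: Dict[str, Any]) -> Any:
    """Return the "data" payload, or raise ScaiMatrixError for an error envelope."""
    error = envelope.get("error")
    if error is None:
        return envelope.get("data")
    request_id = envelope.get("meta", {}).get("request_id")
    raise ScaiMatrixError(
        error.get("code", "UNKNOWN"), error.get("message", ""), request_id
    )
```

Logging `request_id` alongside the code makes it easier to correlate a failure with server-side logs when raising a support request.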

Updated 2026-05-18 15:01:30 (rev 12)