---
summary: "Every ScaiMatrix endpoint \u2014 collections, documents, ingestion, search,\
  \ graph, crawls, ACLs, saved views."
title: API reference
path: reference/api
status: published
---

# API reference

All endpoints are mounted at `/v1/modules/scaimatrix/` and authenticate with the standard ScaiGrid bearer token (`Authorization: Bearer sgk_...` or a JWT issued by ScaiKey). Responses use ScaiGrid's standard envelope (`{ "data": ... }` for success, `{ "error": ... }` for failures, `{ "data": [...], "meta": { "next_cursor": ..., "has_more": ... } }` for paginated lists).
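
As a rough illustration of the mount point and bearer header described above, here is a minimal standard-library sketch. The host in `BASE_URL` is a placeholder, `build_request` is a hypothetical helper name, and sending bodies as JSON is an assumption based on the request examples on this page.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute your own ScaiGrid deployment.
BASE_URL = "https://scaigrid.example.com/v1/modules/scaimatrix"


def build_request(method, path, token, body=None):
    """Return a urllib Request carrying the bearer token and an optional JSON body."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # Assumption: request bodies are JSON, as in the examples on this page.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
```

The returned object can be passed to `urllib.request.urlopen`; the token may be an `sgk_...` key or a ScaiKey-issued JWT.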

All endpoints below require the appropriate module permission (see [Permissions](./permissions)) **and** the appropriate per-resource ACE (see [ACLs](../concepts/acls)).

## Collections

### `POST /collections`

Create a collection.

| Field | Required | Notes |
|---|---|---|
| `name` | yes | Human-readable name. |
| `description` | no | Free text. |
| `embedding_model` | yes | Frontend model id available to your tenant. |
| `chunking_strategy` | no | `fixed`, `paragraph` (default), `semantic`, `markdown`, `code`. |
| `chunk_size` | no | Default 512. |
| `chunk_overlap` | no | Default 50. |
| `graph_enabled` | no | Default false. |
| `graph_extraction_model` | no | Required if `graph_enabled`. A chat model. |
| `graph_max_nodes` | no | Default 250,000. NULL = unlimited. |
| `graph_max_edges` | no | Default 500,000. NULL = unlimited. |
| `storage_quota_bytes` | no | NULL = unlimited. |
| `default_access` | no | `tenant` (default) or `restricted`. |
| `metadata` | no | Arbitrary JSON. |

The slug is auto-generated from the name and must be unique within the tenant. Returns `201` with the created collection.
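
The field table can be turned into a small client-side check before sending the request. The sketch below is illustrative only: `collection_payload` is a hypothetical helper, and the server remains the authority on validation.

```python
# Optional fields documented in the table above.
OPTIONAL_FIELDS = {
    "description", "chunking_strategy", "chunk_size", "chunk_overlap",
    "graph_max_nodes", "graph_max_edges", "storage_quota_bytes",
    "default_access", "metadata",
}


def collection_payload(name, embedding_model, *, graph_enabled=False,
                       graph_extraction_model=None, **optional):
    """Build a POST /collections body, enforcing the documented requirements."""
    if graph_enabled and not graph_extraction_model:
        raise ValueError("graph_extraction_model is required when graph_enabled is true")
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"undocumented fields: {sorted(unknown)}")
    body = {"name": name, "embedding_model": embedding_model}
    if graph_enabled:
        body["graph_enabled"] = True
    if graph_extraction_model:
        body["graph_extraction_model"] = graph_extraction_model
    body.update(optional)
    return body
```

For example, `collection_payload("Product docs", "openai/text-embedding-3-large", chunking_strategy="markdown")` produces a body that leaves every other field at its server-side default.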

### `GET /collections`

List collections accessible to the caller. Cursor-paginated (`limit`, `cursor` query params). Returns each collection with a `user_permission` field (`read`, `write`, `manage`, or null).
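
Cursor pagination over the standard list envelope might be driven as follows. This is a sketch: `fetch_page` is a hypothetical callable that performs the GET with the `limit` and `cursor` query params and returns the decoded envelope as a dict.

```python
def iter_pages(fetch_page, limit=50):
    """Yield every item from a cursor-paginated list endpoint."""
    cursor = None
    while True:
        envelope = fetch_page(limit, cursor)
        yield from envelope["data"]
        meta = envelope.get("meta", {})
        # Stop when the server reports no further pages.
        if not meta.get("has_more"):
            break
        cursor = meta["next_cursor"]
```

Because the function is a generator, callers can stop early without fetching the remaining pages.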

### `GET /collections/{collection_id}`

Fetch one collection's full config and counters.

### `PUT /collections/{collection_id}`

Update mutable fields. Switching `graph_enabled` from false to true queues a re-extract over existing documents (signalled by `follow_ups.graph_reextract: "queued"` in the response). Changing chunking parameters sets `follow_ups.rechunk_recommended: true` without auto-queuing; call `POST /collections/{collection_id}/re-chunk` when you're ready.

### `DELETE /collections/{collection_id}`

Hard-delete. Cascades to documents, chunks (in Weaviate), graph contents (in Neo4j), ACLs, and crawl configs.

### `POST /collections/{collection_id}/reindex`

Re-queue ingestion for every document in the collection. Useful after a backend storage migration.

### `POST /collections/{collection_id}/fork`

Clone the collection's metadata + ACLs + config into a new collection. Documents are **not** copied. Use this when the embedding model needs to change.

```json
{
  "name": "...",
  "embedding_model": "openai/text-embedding-3-large",
  "copy_acls": true,
  "copy_metadata": true
}
```

### `POST /collections/{collection_id}/re-chunk`

Queue an in-place re-chunk over every indexed document using the collection's current chunking parameters and embedding model. Idempotent: a second call while one is running returns `409` with the existing job's status.

### `GET /collections/{collection_id}/re-chunk`

Status of the current or most recent re-chunk job, with counters (`total`, `processed`, `failed`) and timestamps (`started_at`, `finished_at`).

## Documents

### `POST /collections/{collection_id}/documents`

Upload a document. `multipart/form-data` with `file` (binary) and optional `metadata` (JSON string). Returns the new document with `status: pending`; a background job indexes it.

### `POST /collections/{collection_id}/documents/bulk`

Upload many documents in one request by repeating the `files` field, one file per entry. Each document gets its own ingestion job. Use this to warm a collection from a directory.

### `POST /collections/{collection_id}/documents/from-url`

Ingest from an external URL. Body: `{ "url": "...", "name": "...", "metadata": {...} }`.

### `POST /collections/{collection_id}/documents/from-scaidrive`

Ingest from a ScaiDrive file id. Body: `{ "file_id": "...", "name": "...", "metadata": {...} }`.

### `GET /collections/{collection_id}/documents`

Cursor-paginated list. Each row passes through the ACL chokepoint — documents the caller can't read are dropped silently.

### `GET /collections/{collection_id}/documents/{document_id}`

Fetch one document's metadata and status. ACL-gated: if the caller can't read the document, the response is the standard 403 `COLLECTION_ACCESS_DENIED` rather than a 404. In practice the two are equivalent, because the list endpoint won't surface the document either.

### `DELETE /collections/{collection_id}/documents/{document_id}`

Remove a document. Drops its chunks from the vector store and its contributions from the graph store. The blob in S3 is cleaned up asynchronously.

## Search

### `POST /search`

Global search across every collection the caller can read.

| Field | Notes |
|---|---|
| `query` | The query text. |
| `collections` | Optional list of collection ids or slugs. Omit to span everything. |
| `top_k` | Default 10. |
| `min_score` | Default 0.0. |
| `search_type` | `vector` (default), `hybrid`, or `keyword`. |
| `filters` | Optional metadata filter object. |
| `include_content` | Default true. |
| `include_metadata` | Default true. |

Returns `{ "results": [...], "total": N }`. Each result has `chunk_id`, `document_id`, `document_name`, `collection_id`, `content`, `score`, `metadata`.
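
To make the request and result shapes concrete, the sketch below builds a search body from the documented fields and collapses the returned chunks to the best hit per document. Both helper names are hypothetical, and it assumes a higher `score` means a better match.

```python
SEARCH_TYPES = {"vector", "hybrid", "keyword"}


def search_body(query, *, collections=None, top_k=10, min_score=0.0,
                search_type="vector", filters=None):
    """Build a POST /search body using the documented fields and defaults."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {sorted(SEARCH_TYPES)}")
    body = {"query": query, "top_k": top_k,
            "min_score": min_score, "search_type": search_type}
    if collections:
        body["collections"] = list(collections)  # ids or slugs
    if filters:
        body["filters"] = filters
    return body


def best_hit_per_document(results):
    """Keep the highest-scoring chunk for each document, best first."""
    best = {}
    for hit in results:
        doc_id = hit["document_id"]
        # Assumption: larger score means more relevant.
        if doc_id not in best or hit["score"] > best[doc_id]["score"]:
            best[doc_id] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```

Collapsing by `document_id` is useful when a UI lists documents rather than individual chunks.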

### `POST /collections/{collection_id}/search`

Same body as global search, scoped to one collection. The collection's `embedding_model` is used regardless of any model passed.

### `POST /collections/{collection_id}/search/combined`

Vector hits + graph traversal seeded from those hits.

| Field | Notes |
|---|---|
| `query` | The query text. |
| `vector_top_k` | Default 5. |
| `graph_depth` | Default 2. |
| `graph_expand_labels` | Optional list of node labels to expand. |
| `include_content` | Default true. |
| `min_score` | Default 0.0. |

Returns both the vector hits and the connected sub-graph, ACL-gated end-to-end.

## Graph

All graph endpoints are scoped to a collection that has `graph_enabled: true`. Calls against graph-disabled collections return empty results (no 422 — quieter UX).

### `GET /collections/{collection_id}/graph/nodes`

List nodes. Query params: `label`, `skip`, `limit` (max 100). Returns `items`, `total`, and `visible_count` (post-ACL).

### `GET /collections/{collection_id}/graph/nodes/{node_id}`

One node. ACL-gated through `filter_graph_results_by_acl`.

### `POST /collections/{collection_id}/graph/nodes`

Create a node manually. Body: `{ "label": "Product", "name": "WidgetPro", "properties": {...} }`. Subject to `graph_max_nodes` quota.

### `PUT /collections/{collection_id}/graph/nodes/{node_id}`

Update name and/or properties.

### `DELETE /collections/{collection_id}/graph/nodes/{node_id}`

Delete a node. Connected edges are removed.

### `GET /collections/{collection_id}/graph/edges`

List edges. Query params: `skip`, `limit`.

### `POST /collections/{collection_id}/graph/edges`

Create an edge. Body: `{ "source_node_id": "...", "target_node_id": "...", "relationship_type": "compatible_with", "properties": {...} }`. Subject to `graph_max_edges` quota.

### `DELETE /collections/{collection_id}/graph/edges/{edge_id}`

Remove an edge.

### `POST /collections/{collection_id}/graph/query`

Run a parameterised read-only Cypher-ish query against the collection's subgraph.

```json
{ "query": "MATCH (p:Product)-[:USES]->(t) WHERE t.name = $tech RETURN p", "parameters": {"tech": "Postgres"} }
```

Returns `nodes`, `edges`, `query_time_ms`.

### `POST /collections/{collection_id}/graph/traverse`

BFS from a node. Body: `{ "start_node_id": "...", "depth": 2, "labels": ["Product", "Feature"] }`.

### `POST /collections/{collection_id}/graph/path`

Shortest path between two nodes. Body: `{ "source_node_id": "...", "target_node_id": "...", "max_depth": 6 }`. Returns the path nodes / edges, `path_length`, and `found: bool`. If any intermediate node is ACL-denied, the entire path is dropped (no partial reveals).

### `POST /collections/{collection_id}/graph/ask`

Natural-language graph question. The configured chat model generates Cypher; mutation keywords are rejected; the read-only Cypher runs through the standard chokepoint.

```json
{ "question": "Which products are compatible with WidgetPro?", "model": "scailabs/poolnoodle-omni" }
```

Returns the generated `cypher` (for transparency), `nodes`, `edges`, `query_time_ms`, `model_used`.

### `GET /collections/{collection_id}/graph/search`

Substring search over node names. Query params: `q`, `limit`.

### `GET /collections/{collection_id}/graph/context`

Curated subgraph formatted as Markdown for LLM prompts. Query params: `label`, `max_nodes`, `max_chars` (default 8000). Returns `nodes`, `edges`, `formatted_text`, `truncated`.

### `GET /collections/{collection_id}/graph/stats`

Aggregate counts: nodes, edges, label distribution, relationship-type distribution, isolated nodes, per-document extraction status (total / extracted / failed / pending).

### `GET /collections/{collection_id}/graph/clusters`

One cluster per node label with counts and sample most-connected node ids. Designed for the virtualised large-graph admin view. Query param: `sample_per_cluster` (default 5).

### `GET /collections/{collection_id}/graph/changes`

Adds-only diff since a cutoff timestamp. Query params: `since` (ISO-8601, required), `limit` (default 500). Returns `nodes`, `edges`, `truncated`.

### `GET /collections/{collection_id}/graph/events`

Server-Sent Events stream of live graph mutations for the collection. Events: `scaimatrix.graph.node_created`, `node_updated`, `node_deleted`, `edge_created`, `edge_deleted`. Delivery is best-effort, so clients should reload graph state each time they connect.

### `GET /collections/{collection_id}/graph/export`

NDJSON dump of every visible node and edge for backup or migration. One JSON object per line: `{"type":"node","data":{...}}`, `{"type":"edge","data":{...}}`, terminating `{"type":"summary","data":{"nodes":N,"edges":M}}`. ACL-gated end-to-end. Streams so large graphs don't materialise in memory.
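
A consumer of this export could read it line by line as sketched below. `read_graph_export` is a hypothetical helper; it assumes the counts in the closing `summary` record equal the number of node and edge records emitted, and treats a missing summary as a truncated stream.

```python
import json


def read_graph_export(lines):
    """Parse an NDJSON graph export into (nodes, edges), checking the summary."""
    nodes, edges, summary = [], [], None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        record = json.loads(raw)
        kind = record["type"]
        if kind == "node":
            nodes.append(record["data"])
        elif kind == "edge":
            edges.append(record["data"])
        elif kind == "summary":
            summary = record["data"]
    if summary is None:
        raise ValueError("no summary record; the stream may have been cut off")
    # Assumption: summary counts match the records that preceded it.
    if summary["nodes"] != len(nodes) or summary["edges"] != len(edges):
        raise ValueError("summary counts do not match the records read")
    return nodes, edges
```

For very large graphs, the same loop can write each record to disk instead of collecting lists in memory.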

### `POST /collections/{collection_id}/graph/import`

Bulk-import a JSON dump.

```json
{ "nodes": [...], "edges": [...] }
```

Capped at 5,000 nodes + 5,000 edges per call. Duplicates by id are skipped (idempotent). Returns `nodes_created`, `nodes_skipped`, `edges_created`, `edges_skipped`.
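
Payloads above the cap need to be split client-side. One possible approach is sketched below; `import_batches` is a hypothetical helper. Sending all nodes before any edges is a conservative assumption (the page does not say whether an edge's endpoints must already exist), as is the acceptability of an empty `nodes` or `edges` list.

```python
def import_batches(nodes, edges, cap=5000):
    """Yield import bodies that each stay within the per-call cap."""
    # Nodes first, so later edge batches only reference nodes already sent.
    for start in range(0, len(nodes), cap):
        yield {"nodes": nodes[start:start + cap], "edges": []}
    for start in range(0, len(edges), cap):
        yield {"nodes": [], "edges": edges[start:start + cap]}
```

Because duplicates by id are skipped, a failed run can be retried from the first batch without creating duplicates.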

### `POST /collections/{collection_id}/graph/re-extract`

Queue a graph re-extract over the collection's indexed documents. Idempotent: a second call while one is running returns `409` with the running job's status.

### `GET /collections/{collection_id}/graph/re-extract`

Current / last re-extract job status.

## Crawl

### `POST /collections/{collection_id}/crawl`

Start an ad-hoc crawl. Body: `{ "url": "...", "max_depth": 3, "max_pages": 100, "max_total_bytes": 52428800, "follow_external": false }`. Returns a `CrawlJobRead`.

### `GET /collections/{collection_id}/crawl/{crawl_id}`

Fetch one crawl job.

### `DELETE /collections/{collection_id}/crawl/{crawl_id}`

Cancel a running crawl (sets `status: cancelled`).

### `GET /collections/{collection_id}/crawl/{crawl_id}/stream`

SSE stream of progress events. `progress` every two seconds, terminal `done` when the job ends.
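
A client could consume the stream with a small SSE line parser like the sketch below. It assumes the event names arrive in the SSE `event:` field and that each `data:` payload is JSON; the payload fields themselves are not specified on this page, and both helper names are hypothetical.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_string) pairs from decoded SSE lines."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)


def wait_for_done(lines):
    """Return (last progress payload, done payload); done is None if the stream ends early."""
    last_progress = None
    for name, data in iter_sse_events(lines):
        if name == "progress":
            last_progress = json.loads(data)  # assumption: JSON payload
        elif name == "done":
            return last_progress, json.loads(data)
    return last_progress, None
```

If the stream ends without a `done` event, the job record from `GET /collections/{collection_id}/crawl/{crawl_id}` remains the source of truth.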

## Crawl configs

### `POST /collections/{collection_id}/crawls`

Create a recurring crawl config. Body fields: `name`, `seed_url`, `max_depth`, `max_pages`, `max_total_bytes`, `follow_external`, optional `schedule` (`{type: "daily", time: "03:00"}`), optional `webhook` (`{enabled: true}`). If `webhook.enabled` is set, the response includes a one-time `webhook_secret`.

### `GET /collections/{collection_id}/crawls`

List crawl configs for a collection (paginated).

### `GET /collections/{collection_id}/crawls/{config_id}`

Fetch one config.

### `PUT /collections/{collection_id}/crawls/{config_id}`

Update fields. Pass `clear_schedule: true` or `clear_webhook: true` to drop them entirely.

### `DELETE /collections/{collection_id}/crawls/{config_id}`

Remove a config. Run history is retained.

### `POST /collections/{collection_id}/crawls/{config_id}/run`

Manual one-off run. Returns the new `CrawlJobRead`.

### `GET /collections/{collection_id}/crawls/{config_id}/jobs`

Paginated history of runs for one config.

### `POST /collections/{collection_id}/crawls/{config_id}/trigger`

Inbound HMAC-verified webhook. Headers: `X-Crawl-Signature` (hex HMAC-SHA256 of `timestamp + "." + body`), `X-Crawl-Timestamp` (unix seconds). No JWT — the secret is the auth. 401 on bad / missing signature; 404 if webhook isn't enabled.
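
A caller could compute the two headers as sketched below. `sign_trigger` is a hypothetical helper; it assumes the `webhook_secret` is used as UTF-8 bytes and that `body` is the exact byte string sent as the request body.

```python
import hashlib
import hmac
import time


def sign_trigger(secret, body, timestamp=None):
    """Return the X-Crawl-* headers for an inbound trigger request."""
    ts = str(int(time.time())) if timestamp is None else str(timestamp)
    # Signed message is timestamp + "." + body, as described above.
    message = ts.encode("ascii") + b"." + body
    signature = hmac.new(
        secret.encode("utf-8"),  # assumption: secret is used as UTF-8 bytes
        message,
        hashlib.sha256,
    ).hexdigest()
    return {"X-Crawl-Signature": signature, "X-Crawl-Timestamp": ts}
```

Sign the exact bytes that go on the wire: re-serialising the JSON after signing can change whitespace and invalidate the signature.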

### `GET /crawl-jobs/{job_id}`

Fetch a crawl job without knowing the collection (useful when wiring up status views).

### `GET /crawl-jobs/{job_id}/documents`

Paginated list of documents produced by a specific crawl job.

## Per-resource ACLs (v2)

Resource type is `collection` or `document`. All ACL endpoints require `CHANGE_PERMISSIONS` on the target unless otherwise noted.

### `GET /permissions/{resource_type}/{resource_id}/acl`

Return the ACL, including all ACEs and the resource's `owner_user_id`. Caller needs `READ_PERMISSIONS`.

### `PATCH /permissions/{resource_type}/{resource_id}/acl`

Toggle `inherit_from_parent`. Body: `{ "inherit_from_parent": false }`.

### `POST /permissions/{resource_type}/{resource_id}/acl/entries`

Add an ACE. Body:

| Field | Notes |
|---|---|
| `principal_type` | `user` or `group`. |
| `principal_id` | ScaiKey id. |
| `ace_type` | `allow` (default) or `deny`. |
| `permissions` | Integer bitmask (see [ACLs](../concepts/acls)). |
| `inherit_to_children` | Default true. |

Returns the created ACE. `INGEST` on a document resource is rejected (`INVALID_ACE`).

### `DELETE /permissions/{resource_type}/{resource_id}/acl/entries/{ace_id}`

Remove an ACE. Returns `204`.

### `GET /permissions/{resource_type}/{resource_id}/effective`

Compute the calling user's effective permission bitmask. Returns `permissions`, `permission_names`, and a `can` object with named booleans for each verb.

### `POST /permissions/{resource_type}/{resource_id}/take-ownership`

Reassign the owner. Body: `{ "new_owner_user_id": "usr_..." }`. Caller needs `TAKE_OWNERSHIP`. Audit-logged.

## Legacy collection access (v1)

These pre-date v2 ACLs. New integrations should use the per-resource endpoints above.

- `GET /collections/{collection_id}/access`
- `POST /collections/{collection_id}/access` — `{ "grantee_type": "user|group", "grantee_id": "...", "permission": "read|write|manage" }`
- `PUT /collections/{collection_id}/access/{access_id}` — `{ "permission": "..." }`
- `DELETE /collections/{collection_id}/access/{access_id}`

## Saved graph views

Each view is a (filter + tool + tint) snapshot the viewer can round-trip. `private` views are visible only to the owner; `tenant` views are visible to everyone with read on the collection.

### `GET /collections/{collection_id}/graph/views`

List views the caller can see (their own private + every tenant-scoped one).

### `POST /collections/{collection_id}/graph/views`

Create. Body: `{ "name": "...", "description": "...", "scope": "private|tenant", "config": {...} }`. `tenant` scope requires collection write.

### `GET /collections/{collection_id}/graph/views/{view_id}`

Fetch one view.

### `PATCH /collections/{collection_id}/graph/views/{view_id}`

Update. Promoting `private -> tenant` requires collection write.

### `DELETE /collections/{collection_id}/graph/views/{view_id}`

Remove a view. Requires being the view's owner or having collection write.

## Errors

All endpoints return ScaiGrid's standard error envelope:

```json
{
  "error": {
    "code": "COLLECTION_ACCESS_DENIED",
    "message": "You do not have access to this collection"
  },
  "meta": { "request_id": "req_..." }
}
```

ScaiMatrix-specific codes:

| Code | Meaning |
|---|---|
| `COLLECTION_ACCESS_DENIED` | The user can't access this collection at the required permission level. |
| `COLLECTION_ACCESS_NOT_FOUND` | Legacy access grant id not found. |
| `INVALID_ACCESS_GRANT` | Legacy grant body was malformed. |
| `STORAGE_QUOTA_EXCEEDED` | Adding this document would push the collection past its `storage_quota_bytes`. |
| `GRAPH_QUOTA_EXCEEDED` | Adding these nodes / edges would breach `graph_max_nodes` / `graph_max_edges`. |
| `GRAPH_NOT_ENABLED` | The collection has `graph_enabled: false`. |
| `GRAPH_REEXTRACT_ALREADY_RUNNING` | A re-extract is already in flight; check the status endpoint. |
| `GRAPH_IMPORT_TOO_LARGE` | Payload exceeded the 5,000 + 5,000 cap; chunk client-side. |
| `GRAPH_IMPORT_INVALID_SHAPE` | Body wasn't `{ "nodes": [...], "edges": [...] }`. |
| `GRAPH_VIEW_INVALID_SCOPE` | Scope wasn't `private` or `tenant`. |
| `GRAPH_VIEW_NOT_FOUND` | View id missing or not visible to the caller. |
| `RECHUNK_ALREADY_RUNNING` | A re-chunk is already in flight. |
| `CRAWL_JOB_NOT_FOUND` | Crawl job id missing or wrong tenant. |
| `CRAWL_CONFIG_NOT_FOUND` | Crawl config missing, disabled, or webhook misconfigured. |
| `CRAWL_ALREADY_RUNNING` | A crawl is already in flight for this collection. |
| `INVALID_SCHEDULE_CONFIG` | Schedule fields didn't validate. |
| `INVALID_CRON_EXPRESSION` | Cron string didn't parse. |
| `ACL_NOT_FOUND` | Target resource or ACL not visible to caller. |
| `ACE_NOT_FOUND` | ACE id missing or doesn't belong to this resource. |
| `INVALID_ACE` | Bad principal_type / ace_type / permissions combination. |
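
Client code can turn the error envelope into a typed exception so callers can branch on `code`. The sketch below is one way to do that; the `ScaiMatrixError` class and the `unwrap` helper are hypothetical names, not part of any SDK.

```python
class ScaiMatrixError(Exception):
    """Raised when a response envelope carries an `error` object."""

    def __init__(self, code, message, request_id=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.request_id = request_id


def unwrap(envelope):
    """Return the `data` payload, or raise ScaiMatrixError for an error envelope."""
    error = envelope.get("error")
    if error is not None:
        request_id = (envelope.get("meta") or {}).get("request_id")
        raise ScaiMatrixError(
            error.get("code", "UNKNOWN"), error.get("message", ""), request_id
        )
    return envelope["data"]
```

A caller might catch `ScaiMatrixError` and, for example, fall back to polling the status endpoint when `code` is `RECHUNK_ALREADY_RUNNING` or `GRAPH_REEXTRACT_ALREADY_RUNNING`.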
