---
summary: "Create a collection, upload a document, run a search, inspect the result\
  \ \u2014 five minutes end-to-end."
title: Quickstart
path: quickstart
status: published
---

# Quickstart

In five minutes you'll have a collection indexed from one document and a working semantic search hitting it.

You need:

- A ScaiGrid API key with `scaimatrix:manage` and `scaimatrix:ingest` (any tenant admin has these).
- A PDF, Markdown, or text file to ingest.
- An embedding model your tenant can call (any model registered as `kind: embeddings` — `openai/text-embedding-3-small` is a common default).

```bash
export SCAIGRID_HOST="https://scaigrid.scailabs.ai"
export SCAIGRID_API_KEY="sgk_..."
```
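
The Python examples below read the same two variables on every call. As a convenience, a small helper can build the base URL and auth header once and fail early when a variable is missing. This is a minimal sketch: `scaigrid_settings` is a hypothetical name used for illustration, not part of any ScaiGrid SDK.

```python
import os


def scaigrid_settings(env=os.environ):
    """Return (base_url, headers) built from SCAIGRID_HOST and SCAIGRID_API_KEY.

    Hypothetical helper for this quickstart, not an official client.
    """
    missing = [k for k in ("SCAIGRID_HOST", "SCAIGRID_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Strip a trailing slash so joined paths never contain "//".
    base_url = env["SCAIGRID_HOST"].rstrip("/") + "/v1/modules/scaimatrix"
    headers = {"Authorization": f"Bearer {env['SCAIGRID_API_KEY']}"}
    return base_url, headers
```

Calling `scaigrid_settings()` with no arguments reads the real environment; passing a plain dict makes the helper easy to test.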

## 1. Create a collection

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaimatrix/collections" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Quickstart Docs",
    "description": "Demo collection",
    "embedding_model": "openai/text-embedding-3-small",
    "chunking_strategy": "paragraph",
    "chunk_size": 512,
    "chunk_overlap": 50,
    "default_access": "tenant"
  }'
```

```python
import os

import httpx

# Create the collection and keep the returned record for the later steps.
coll = httpx.post(
    f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaimatrix/collections",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "name": "Quickstart Docs",
        "description": "Demo collection",
        "embedding_model": "openai/text-embedding-3-small",
        "chunking_strategy": "paragraph",
        "chunk_size": 512,
        "chunk_overlap": 50,
        "default_access": "tenant",
    },
).json()["data"]
print(coll["id"])
```

```javascript
const res = await fetch(`${process.env.SCAIGRID_HOST}/v1/modules/scaimatrix/collections`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Quickstart Docs",
    description: "Demo collection",
    embedding_model: "openai/text-embedding-3-small",
    chunking_strategy: "paragraph",
    chunk_size: 512,
    chunk_overlap: 50,
    default_access: "tenant",
  }),
});
const { data: coll } = await res.json();
console.log(coll.id);
```

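Save the returned `coll.id`; the curl examples in the following steps expect it in `$COLLECTION_ID` (for example, `export COLLECTION_ID="<id>"`). `default_access: "tenant"` gives everyone in the tenant implicit read access; use `restricted` if you want to grant access only through explicit ACEs.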
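
To switch between the two access modes without retyping the request body, the payload can be built by a small function. This is a sketch only: `collection_payload` and `KNOWN_ACCESS_MODES` are hypothetical names, and the set contains just the two values this page mentions, so the API may accept others.

```python
# Values of default_access mentioned on this page (assumption: the API may accept more).
KNOWN_ACCESS_MODES = {"tenant", "restricted"}


def collection_payload(name, default_access="tenant", description=""):
    """Build the JSON body for the create-collection request in this step."""
    if default_access not in KNOWN_ACCESS_MODES:
        raise ValueError(f"unexpected default_access: {default_access!r}")
    return {
        "name": name,
        "description": description,
        "embedding_model": "openai/text-embedding-3-small",
        "chunking_strategy": "paragraph",
        "chunk_size": 512,
        "chunk_overlap": 50,
        "default_access": default_access,
    }
```

Pass the result as `json=` to the `httpx.post` call above, for example `collection_payload("Private Docs", default_access="restricted")`.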

## 2. Upload a document

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaimatrix/collections/$COLLECTION_ID/documents" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "file=@handbook.pdf" \
  -F 'metadata={"department":"engineering"}'
```

```python
with open("handbook.pdf", "rb") as f:
    doc = httpx.post(
        f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaimatrix/collections/{coll['id']}/documents",
        headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
        files={"file": ("handbook.pdf", f, "application/pdf")},
        data={"metadata": '{"department":"engineering"}'},
    ).json()["data"]
print(doc["id"], doc["status"])
```

```javascript
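import { readFile } from "node:fs/promises";

// Read the PDF from disk (Node.js); in a browser, append a File object instead.
const fileBytes = await readFile("handbook.pdf");
const fd = new FormData();
fd.append("file", new Blob([fileBytes], { type: "application/pdf" }), "handbook.pdf");
fd.append("metadata", JSON.stringify({ department: "engineering" }));
// Named uploadRes so it does not clash with `res` from step 1.
const uploadRes = await fetch(
  `${process.env.SCAIGRID_HOST}/v1/modules/scaimatrix/collections/${coll.id}/documents`,
  {
    method: "POST",
    // No Content-Type header: fetch sets the multipart boundary itself.
    headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` },
    body: fd,
  },
);
const { data: doc } = await uploadRes.json();
console.log(doc.id, doc.status);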
```

Indexing is asynchronous. The document's `status` walks through `pending` -> `processing` -> `indexed` (or `failed`). For a handful of pages this takes seconds. Save the returned `doc.id`; the curl example in the next step expects it in `$DOC_ID`.

## 3. Poll until indexed

```bash
curl "$SCAIGRID_HOST/v1/modules/scaimatrix/collections/$COLLECTION_ID/documents/$DOC_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"
```

```python
import time
while True:
    d = httpx.get(
        f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaimatrix/collections/{coll['id']}/documents/{doc['id']}",
        headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    ).json()["data"]
    if d["status"] == "indexed":
        break
    if d["status"] == "failed":
        raise RuntimeError(d.get("error_message"))
    time.sleep(2)
```

```javascript
while (true) {
  const r = await fetch(
    `${process.env.SCAIGRID_HOST}/v1/modules/scaimatrix/collections/${coll.id}/documents/${doc.id}`,
    { headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` } },
  );
  const { data: d } = await r.json();
  if (d.status === "indexed") break;
  if (d.status === "failed") throw new Error(d.error_message);
  await new Promise((res) => setTimeout(res, 2000));
}
```
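
The loops above poll forever if a document never leaves `pending` or `processing`. A bounded variant might look like the sketch below. `wait_until_indexed` is a hypothetical helper, and the 120-second default is an arbitrary assumption rather than a documented limit; the status values are the ones listed in step 2.

```python
import time


def wait_until_indexed(fetch_document, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll fetch_document() until the document is indexed, failed, or timed out.

    fetch_document is any zero-argument callable returning the document dict
    (the "data" object of the GET response shown above).
    """
    deadline = time.monotonic() + timeout
    while True:
        doc = fetch_document()
        if doc["status"] == "indexed":
            return doc
        if doc["status"] == "failed":
            raise RuntimeError(doc.get("error_message") or "indexing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document still {doc['status']!r} after {timeout}s")
        sleep(interval)
```

With the `httpx` call from this step, `fetch_document` could be `lambda: httpx.get(url, headers=headers).json()["data"]`, where `url` and `headers` are the values used in the loop above.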

## 4. Search the collection

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaimatrix/collections/$COLLECTION_ID/search" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I reset my password?",
    "top_k": 5,
    "search_type": "hybrid"
  }'
```

```python
hits = httpx.post(
    f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaimatrix/collections/{coll['id']}/search",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={"query": "How do I reset my password?", "top_k": 5, "search_type": "hybrid"},
).json()["data"]
for r in hits["results"]:
    print(f"[{r['score']:.2f}] {r['document_name']}: {r['content'][:120]}...")
```

```javascript
const r = await fetch(
  `${process.env.SCAIGRID_HOST}/v1/modules/scaimatrix/collections/${coll.id}/search`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query: "How do I reset my password?", top_k: 5, search_type: "hybrid" }),
  },
);
const { data: hits } = await r.json();
hits.results.forEach((x) => console.log(x.score, x.document_name));
```

You get back ranked chunks with `chunk_id`, `document_id`, `document_name`, `score`, `content`, and `metadata`.
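
As an illustration of working with those fields, the sketch below turns a `results` list into printable lines and drops low-scoring hits. `summarize_hits` is a hypothetical helper, and the `min_score` cutoff assumes higher scores mean better matches; the score scale itself is not specified on this page.

```python
def summarize_hits(results, min_score=0.0, preview_chars=120):
    """Return one display line per result whose score is at least min_score."""
    lines = []
    for r in results:
        if r["score"] < min_score:
            continue
        # Collapse whitespace so multi-line chunks fit on one line.
        preview = " ".join(r["content"].split())[:preview_chars]
        lines.append(f"[{r['score']:.2f}] {r['document_name']} ({r['chunk_id']}): {preview}")
    return lines
```

For example, `print("\n".join(summarize_hits(hits["results"], min_score=0.5)))` prints only the stronger matches from the search above.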

## 5. (Optional) Search across every collection you can read

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaimatrix/search" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "password reset", "top_k": 10}'
```

Omit `collections` to search every collection your user has read access to. Pass `collections: ["col_a", "col_b"]` (IDs or slugs) to scope it.
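
The omit-or-scope behaviour can be captured in a small payload builder. This is a sketch under the assumptions stated in this section: `cross_collection_payload` is a hypothetical name, and the only fields used are `query`, `top_k`, and `collections` as shown above.

```python
def cross_collection_payload(query, top_k=10, collections=None):
    """Build the JSON body for POST /v1/modules/scaimatrix/search.

    Leaving collections as None omits the key, which searches every
    collection the caller can read; a list of IDs or slugs scopes the search.
    """
    payload = {"query": query, "top_k": top_k}
    if collections is not None:
        payload["collections"] = list(collections)
    return payload
```

Pass the result as `json=` to `httpx.post` against `$SCAIGRID_HOST/v1/modules/scaimatrix/search`, using the same `Authorization` header as the earlier steps.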

## What just happened

- The collection row in MariaDB recorded your embedding model and chunking settings.
- The PDF landed in S3 (via the document store). A background job extracted text, chunked it, and wrote one vector per chunk into the vector backend tagged with the collection's slug.
- Your search query was embedded with the same model, matched against those vectors, then ACL-gated through ScaiMatrix's single chokepoint before the response was assembled.
- No chunk whose source document you can't read can leak into the response — not in counts, not in metadata, not in graph traversals.
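
To make the chunking step more concrete, here is a generic sliding-window chunker using the `chunk_size` and `chunk_overlap` values from step 1. It is an illustration of the general technique only, not ScaiMatrix's implementation: the `paragraph` strategy and the unit of `chunk_size` (tokens, words, or characters) are not described on this page, and `chunk_with_overlap` is a hypothetical name.

```python
def chunk_with_overlap(tokens, chunk_size=512, chunk_overlap=50):
    """Split a token list into windows that share chunk_overlap tokens.

    Generic sliding-window sketch; the real chunker may split on paragraph
    boundaries and count units differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Each consecutive pair of chunks shares `chunk_overlap` tokens, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.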

## Next

- Turn on graph extraction by setting `graph_enabled: true` and `graph_extraction_model: "scailabs/poolnoodle-omni"` (or your preferred chat model) on the collection. See [Architecture](./concepts/architecture).
- Lock down a single document with a deny ACE — see [ACLs](./concepts/acls).
- Schedule a recurring crawl — see [Crawl on a schedule](./tutorials/crawl-on-a-schedule).
