Quickstart

In five minutes you'll have a collection indexed from one document and a working semantic search hitting it.

You need:

  • A ScaiGrid API key with scaimatrix:manage and scaimatrix:ingest (any tenant admin has these).
  • A PDF, Markdown, or text file to ingest.
  • An embedding model your tenant can call (any model registered as kind: embeddings; openai/text-embedding-3-small is a common default).
bash
export SCAIGRID_HOST="https://scaigrid.scailabs.ai"
export SCAIGRID_API_KEY="sgk_..."
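
Every request below uses the same host and bearer token. If you are following along in Python, a small helper can read both variables once and fail early when one is missing. This is a minimal sketch using only the standard library; `scaigrid_config` is a hypothetical name for illustration, not part of any ScaiGrid SDK.

```python
import os


def scaigrid_config(env=None):
    """Return (host, headers) built from SCAIGRID_HOST and SCAIGRID_API_KEY.

    Hypothetical helper for this quickstart; not part of any ScaiGrid SDK.
    """
    env = os.environ if env is None else env
    missing = [k for k in ("SCAIGRID_HOST", "SCAIGRID_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    host = env["SCAIGRID_HOST"].rstrip("/")  # avoid '//' when joining paths
    headers = {"Authorization": f"Bearer {env['SCAIGRID_API_KEY']}"}
    return host, headers
```

Calling `host, headers = scaigrid_config()` once at the top of a script gives you the two values the Python examples below rebuild inline.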

1. Create a collection

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaimatrix/collections" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Quickstart Docs",
    "description": "Demo collection",
    "embedding_model": "openai/text-embedding-3-small",
    "chunking_strategy": "paragraph",
    "chunk_size": 512,
    "chunk_overlap": 50,
    "default_access": "tenant"
  }'
python
import httpx, os
coll = httpx.post(
    f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaimatrix/collections",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "name": "Quickstart Docs",
        "description": "Demo collection",
        "embedding_model": "openai/text-embedding-3-small",
        "chunking_strategy": "paragraph",
        "chunk_size": 512,
        "chunk_overlap": 50,
        "default_access": "tenant",
    },
).json()["data"]
print(coll["id"])
javascript
const res = await fetch(`${process.env.SCAIGRID_HOST}/v1/modules/scaimatrix/collections`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Quickstart Docs",
    description: "Demo collection",
    embedding_model: "openai/text-embedding-3-small",
    chunking_strategy: "paragraph",
    chunk_size: 512,
    chunk_overlap: 50,
    default_access: "tenant",
  }),
});
const { data: coll } = await res.json();
console.log(coll.id);

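Save the returned collection id (coll.id in the Python and JavaScript examples); the curl examples below expect it in $COLLECTION_ID. default_access: "tenant" gives everyone in the tenant implicit read access; use "restricted" if access should be granted only through explicit ACEs.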

2. Upload a document

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaimatrix/collections/$COLLECTION_ID/documents" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "file=@handbook.pdf" \
  -F 'metadata={"department":"engineering"}'
python
with open("handbook.pdf", "rb") as f:
    doc = httpx.post(
        f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaimatrix/collections/{coll['id']}/documents",
        headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
        files={"file": ("handbook.pdf", f, "application/pdf")},
        data={"metadata": '{"department":"engineering"}'},
    ).json()["data"]
print(doc["id"], doc["status"])
javascript
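import { readFile } from "node:fs/promises";

// Read the PDF from disk so it can be wrapped in a Blob for the multipart body.
const fileBytes = await readFile("handbook.pdf");
const fd = new FormData();
fd.append("file", new Blob([fileBytes], { type: "application/pdf" }), "handbook.pdf");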
fd.append("metadata", JSON.stringify({ department: "engineering" }));
const res = await fetch(
  `${process.env.SCAIGRID_HOST}/v1/modules/scaimatrix/collections/${coll.id}/documents`,
  {
    method: "POST",
    headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` },
    body: fd,
  },
);
const { data: doc } = await res.json();
console.log(doc.id, doc.status);

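Indexing is asynchronous. The document's status moves through pending -> processing -> indexed (or failed). For a handful of pages this takes seconds. Save the returned document id (doc.id); the curl example in the next step expects it in $DOC_ID.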

3. Poll until indexed

bash
curl "$SCAIGRID_HOST/v1/modules/scaimatrix/collections/$COLLECTION_ID/documents/$DOC_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"
python
import time
while True:
    d = httpx.get(
        f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaimatrix/collections/{coll['id']}/documents/{doc['id']}",
        headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    ).json()["data"]
    if d["status"] == "indexed":
        break
    if d["status"] == "failed":
        raise RuntimeError(d.get("error_message"))
    time.sleep(2)
javascript
while (true) {
  const r = await fetch(
    `${process.env.SCAIGRID_HOST}/v1/modules/scaimatrix/collections/${coll.id}/documents/${doc.id}`,
    { headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` } },
  );
  const { data: d } = await r.json();
  if (d.status === "indexed") break;
  if (d.status === "failed") throw new Error(d.error_message);
  await new Promise((res) => setTimeout(res, 2000));
}
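
The loops above run forever if a document never leaves pending or processing. A bounded variant might look like the sketch below. `wait_until_indexed` and its parameters are hypothetical names, and the 120-second default is an arbitrary assumption, not a documented limit. The status fetch is passed in as a callable so the loop is independent of the HTTP client.

```python
import time


def wait_until_indexed(fetch_status, timeout=120.0, interval=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns 'indexed'.

    fetch_status is any zero-argument callable returning the document's
    current status string, e.g. a wrapper around the GET request above.
    Raises RuntimeError on 'failed' and TimeoutError once `timeout`
    seconds have passed without reaching a terminal status.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status == "indexed":
            return status
        if status == "failed":
            raise RuntimeError("document indexing failed")
        if clock() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout} seconds")
        sleep(interval)
```

With httpx this could be called as `wait_until_indexed(lambda: httpx.get(url, headers=headers).json()["data"]["status"])`, where `url` is the document URL from the example above.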

4. Search the collection

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaimatrix/collections/$COLLECTION_ID/search" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I reset my password?",
    "top_k": 5,
    "search_type": "hybrid"
  }'
python
hits = httpx.post(
    f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaimatrix/collections/{coll['id']}/search",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={"query": "How do I reset my password?", "top_k": 5, "search_type": "hybrid"},
).json()["data"]
for r in hits["results"]:
    print(f"[{r['score']:.2f}] {r['document_name']}: {r['content'][:120]}...")
javascript
const r = await fetch(
  `${process.env.SCAIGRID_HOST}/v1/modules/scaimatrix/collections/${coll.id}/search`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query: "How do I reset my password?", top_k: 5, search_type: "hybrid" }),
  },
);
const { data: hits } = await r.json();
hits.results.forEach((x) => console.log(x.score, x.document_name));

You get back ranked chunks with chunk_id, document_id, document_name, score, content, and metadata.
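
If you prefer typed objects over raw dictionaries, the fields listed above map onto a small dataclass. This is a sketch, not an official client: `SearchHit` and `parse_hits` are hypothetical names, the field types are assumptions, and the sample payload is made up to match the documented field names.

```python
from dataclasses import dataclass, field


@dataclass
class SearchHit:
    chunk_id: str
    document_id: str
    document_name: str
    score: float
    content: str
    metadata: dict = field(default_factory=dict)


def parse_hits(data):
    """Convert the `data` object of a search response into SearchHit objects."""
    return [
        SearchHit(
            chunk_id=r["chunk_id"],
            document_id=r["document_id"],
            document_name=r["document_name"],
            score=float(r["score"]),
            content=r["content"],
            metadata=r.get("metadata") or {},  # tolerate a missing or null field
        )
        for r in data.get("results", [])
    ]


# Made-up sample shaped like the fields listed above; not real API output.
sample = {
    "results": [
        {
            "chunk_id": "chunk-1",
            "document_id": "doc-1",
            "document_name": "handbook.pdf",
            "score": 0.87,
            "content": "To reset your password, open Settings.",
            "metadata": {"department": "engineering"},
        }
    ]
}
for hit in parse_hits(sample):
    print(f"[{hit.score:.2f}] {hit.document_name}: {hit.content[:120]}")
```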

5. (Optional) Search across every collection you can read

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaimatrix/search" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "password reset", "top_k": 10}'

Omit collections to search every collection your user has read access to. Pass collections: ["col_a", "col_b"] (IDs or slugs) to scope it.
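
The difference between the two modes is only whether the collections key is present in the request body. A small builder makes that explicit. `build_search_payload` is a hypothetical helper, and the identifiers "col_a" and "col_b" are the placeholders from the text above.

```python
def build_search_payload(query, top_k=10, collections=None):
    """Build the JSON body for POST /v1/modules/scaimatrix/search.

    When `collections` is None or empty, the key is left out entirely so the
    search covers every collection the caller can read.
    """
    payload = {"query": query, "top_k": top_k}
    if collections:
        payload["collections"] = list(collections)  # IDs or slugs
    return payload


print(build_search_payload("password reset"))
print(build_search_payload("password reset", collections=["col_a", "col_b"]))
```

Pass the result as `json=` to `httpx.post` in the same way as the per-collection search in step 4.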

What just happened

  • The collection row in MariaDB recorded your embedding model and chunking settings.
  • The PDF landed in S3 (via the document store). A background job extracted text, chunked it, and wrote one vector per chunk into the vector backend tagged with the collection's slug.
  • Your search query was embedded with the same model, matched against those vectors, then ACL-gated through ScaiMatrix's single chokepoint before the response was assembled.
  • No chunk whose source document you can't read can leak into the response — not in counts, not in metadata, not in graph traversals.
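
To make the last two points concrete, here is a toy in-memory model of "match vectors, then drop anything the caller cannot read". It is only an illustration under simplifying assumptions: the two-dimensional vectors stand in for real embeddings, `search` and `readable_docs` are hypothetical names, and the toy filters before ranking for brevity. It says nothing about how ScaiMatrix implements its chokepoint internally.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, chunks, readable_docs, top_k=5):
    """Rank chunks by cosine similarity, keeping only readable documents."""
    visible = [c for c in chunks if c["document_id"] in readable_docs]
    ranked = sorted(visible, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:top_k]


# Toy data: one vector per chunk, each tagged with its source document.
chunks = [
    {"chunk_id": "c1", "document_id": "public", "vector": [1.0, 0.0]},
    {"chunk_id": "c2", "document_id": "secret", "vector": [0.9, 0.1]},
    {"chunk_id": "c3", "document_id": "public", "vector": [0.0, 1.0]},
]
results = search([1.0, 0.0], chunks, readable_docs={"public"})
print([c["chunk_id"] for c in results])  # ['c1', 'c3']
```

Chunk c2 is a close match for the query, but it never appears in the output because its source document is not in the caller's readable set.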

Next

  • Turn on graph extraction by setting graph_enabled: true and graph_extraction_model: "scailabs/poolnoodle-omni" (or your preferred chat model) on the collection. See Architecture.
  • Lock down a single document with a deny ACE — see ACLs.
  • Schedule a recurring crawl — see Crawl on a schedule.
Updated 2026-05-18 15:01:30 (rev 12)