Knowledge and RAG

A bot can answer from documents you upload (or from a shared ScaiMatrix collection you point it at). That's the retrieval-augmented generation (RAG) loop, built in.

Two knowledge modes

You choose per bot:

Managed — the bot owns its documents. Each upload creates chunks, embeddings, and an index slice scoped to that bot. Other bots can't see the documents. Good for: bot-specific content (a single product's FAQ, a single team's handbook).

Linked — the bot reads from a ScaiMatrix collection you've already set up. Many bots can share the same collection. Good for: corporate knowledge that powers multiple bots, content managed by a separate team using ScaiMatrix directly.

Set knowledge_mode on the bot:

```json
{ "knowledge_mode": "managed" }
```

For linked mode, also name the ScaiMatrix collection to read from:

```json
{ "knowledge_mode": "linked", "knowledge_collection_id": "col_shared_docs" }
```
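A small client-side helper can catch the most likely configuration slip before the request goes out: choosing linked mode without naming a collection. This is a sketch, not part of any ScaiBot SDK. `build_knowledge_config` is a hypothetical name, and the rules it enforces (linked mode needs `knowledge_collection_id`, managed mode takes none) are assumptions read off the two payloads above.

```python
import json
from typing import Optional

VALID_MODES = ("managed", "linked")


def build_knowledge_config(mode: str, collection_id: Optional[str] = None) -> dict:
    """Return the knowledge fields of a bot payload (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"knowledge_mode must be one of {VALID_MODES}, got {mode!r}")
    if mode == "linked":
        # Assumption: a linked bot must name the ScaiMatrix collection it reads from.
        if not collection_id:
            raise ValueError("linked mode requires knowledge_collection_id")
        return {"knowledge_mode": "linked", "knowledge_collection_id": collection_id}
    # Assumption: a managed bot owns its own index, so a collection id is a mistake.
    if collection_id is not None:
        raise ValueError("managed mode does not take knowledge_collection_id")
    return {"knowledge_mode": "managed"}


print(json.dumps(build_knowledge_config("linked", "col_shared_docs")))
# → {"knowledge_mode": "linked", "knowledge_collection_id": "col_shared_docs"}
```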

What "indexed" means

When you POST /bots/{id}/documents:

1. The file lands in object storage (ScaiDrive under the hood).
2. ScaiBot starts a background task that:
   - Extracts text (PDF / DOCX / HTML / Markdown / plain text).
   - Chunks at semantic boundaries (typically 400-600 tokens per chunk with 50-token overlap).
   - Embeds each chunk with ScaiGrid's default embedding model.
   - Writes chunks to ScaiMatrix tagged with the bot's collection id.
3. The document's status moves through uploaded → extracting → indexing → indexed (or failed).
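The chunking step can be pictured as packing paragraphs into windows of roughly 400-600 tokens, with the last 50 tokens of each chunk repeated at the start of the next. The sketch below is an illustration under simplifying assumptions, not ScaiBot's chunker: blank-line-separated paragraphs stand in for semantic boundaries, whitespace-separated words stand in for model tokens, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, max_tokens: int = 500, overlap: int = 50) -> list[str]:
    """Pack paragraphs into chunks of at most max_tokens, overlapping by `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    # Simplifications: blank lines mark semantic boundaries, words stand in for tokens.
    paragraphs = [p.split() for p in text.split("\n\n") if p.strip()]
    chunks: list[list[str]] = []
    current: list[str] = []
    for words in paragraphs:
        # Close the chunk at a paragraph boundary rather than splitting the paragraph.
        if current and len(current) + len(words) > max_tokens:
            chunks.append(current)
            current = current[-overlap:] if overlap else []
        current.extend(words)
        # A single paragraph longer than max_tokens falls back to fixed windows.
        while len(current) > max_tokens:
            chunks.append(current[:max_tokens])
            current = current[max_tokens - overlap:]
    if current:
        chunks.append(current)
    return [" ".join(c) for c in chunks]


doc = " ".join(f"w{i}" for i in range(1200))
print([len(c.split()) for c in chunk_text(doc)])  # → [500, 500, 300]
```

Closing chunks at paragraph boundaries keeps related sentences together, and the overlap makes it more likely that a sentence straddling a boundary appears intact in one of the two neighbouring chunks.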

For most documents under a few hundred pages, the whole pipeline completes in under a minute.
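Because indexing runs in the background, a client that needs to know when a document is ready can poll its status until it reaches `indexed` or `failed`. A minimal sketch follows. It accepts a `fetch_status` callable instead of calling an endpoint, since the status-lookup API is not covered on this page; `wait_until_indexed` and its poll interval are hypothetical, and the 600-second default simply mirrors the 10-minute indexing timeout listed under Limits.

```python
import time
from typing import Callable

# Status values from the indexing pipeline described above.
IN_PROGRESS = {"uploaded", "extracting", "indexing"}
TERMINAL = {"indexed", "failed"}


def wait_until_indexed(
    fetch_status: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 2.0,
) -> str:
    """Poll fetch_status() until the document is indexed or failed (hypothetical helper)."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if status not in IN_PROGRESS:
            raise ValueError(f"unexpected status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        time.sleep(poll_interval_s)


# Usage with a fake status source standing in for the real API call.
fake = iter(["uploaded", "extracting", "indexing", "indexed"])
print(wait_until_indexed(lambda: next(fake), poll_interval_s=0))  # → indexed
```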

What's retrieved at chat time

When the visitor sends a message:

1. The message is embedded with the same model used for chunks.
2. ScaiMatrix returns the top-K (default 5) chunks by hybrid score (BM25 keyword + cosine semantic).
3. Chunks below the relevance threshold are dropped.
4. The remaining chunks are stitched into the system prompt as labelled context.
5. The model is told to cite the chunk number it used for each statement.
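To make these steps concrete, here is a toy version of hybrid scoring, top-K selection, threshold filtering, and labelled context assembly. How ScaiMatrix actually blends BM25 and cosine scores is not documented here, so this sketch assumes one common approach: min-max normalise the BM25 scores across the candidates and take a weighted sum with cosine similarity, using a hypothetical `alpha` of 0.5. The `Candidate` type, the function names, the sample chunks, and the `[n]` label format are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    chunk_id: str
    document_id: str
    text: str
    bm25: float    # keyword score: unbounded, higher is better
    cosine: float  # semantic similarity of query and chunk embeddings


def retrieve(cands, top_k=5, score_threshold=0.3, alpha=0.5):
    """Rank by an assumed hybrid score, keep the top-K, then drop weak matches."""
    if not cands:
        return []
    lo = min(c.bm25 for c in cands)
    span = max(c.bm25 for c in cands) - lo
    scored = []
    for c in cands:
        # Min-max normalise BM25 to [0, 1] so it is comparable with cosine.
        # With no spread, BM25 cannot separate candidates, so it contributes 0.
        keyword = (c.bm25 - lo) / span if span else 0.0
        scored.append((alpha * keyword + (1 - alpha) * c.cosine, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, c) for s, c in scored[:top_k] if s >= score_threshold]


def build_context(hits):
    """Label each surviving chunk with a number the model can cite."""
    return "\n\n".join(
        f"[{n}] ({c.document_id}) {c.text}" for n, (_, c) in enumerate(hits, start=1)
    )


cands = [
    Candidate("chk_1", "doc_refund", "Refunds are remitted within 14 business days.", 7.2, 0.81),
    Candidate("chk_2", "doc_shipping", "Orders ship within 2 days.", 1.1, 0.42),
    Candidate("chk_3", "doc_careers", "We are hiring engineers.", 0.0, 0.05),
]
hits = retrieve(cands)
print(build_context(hits))  # → [1] (doc_refund) Refunds are remitted within 14 business days.
```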

Tune retrieval via the bot's knowledge_settings:

```json
{
  "top_k": 5,
  "score_threshold": 0.3,
  "max_chunks_per_doc": 2,
  "deduplicate": true
}
```

max_chunks_per_doc prevents one document from monopolising retrieval when it has many near-identical sections (typical for FAQs).
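The per-document cap works naturally as a filter over an already-ranked list: walk the hits best-first and skip any chunk whose document has used up its quota. The sketch below also applies `top_k`, `score_threshold`, and one plausible reading of `deduplicate` (drop a chunk whose text matches an earlier one after folding case and whitespace). That reading, the `apply_settings` name, the `(score, document_id, text)` tuple shape, and the choice to apply the cap before counting toward `top_k` (so a capped document frees slots for others) are assumptions, not documented ScaiMatrix behaviour.

```python
from collections import defaultdict


def apply_settings(ranked, settings):
    """Filter best-first (score, document_id, text) hits with knowledge_settings-style keys."""
    top_k = settings.get("top_k", 5)
    threshold = settings.get("score_threshold", 0.0)
    cap = settings.get("max_chunks_per_doc", float("inf"))
    dedupe = settings.get("deduplicate", False)
    per_doc = defaultdict(int)
    seen = set()
    kept = []
    for score, doc_id, text in ranked:
        if len(kept) >= top_k:
            break
        if score < threshold or per_doc[doc_id] >= cap:
            continue
        # Assumed meaning of deduplicate: same text after case and whitespace folding.
        key = " ".join(text.lower().split())
        if dedupe and key in seen:
            continue
        per_doc[doc_id] += 1
        seen.add(key)
        kept.append((score, doc_id, text))
    return kept


settings = {"top_k": 5, "score_threshold": 0.3, "max_chunks_per_doc": 2, "deduplicate": True}
ranked = [
    (0.92, "doc_faq", "How do I reset my password?"),
    (0.90, "doc_faq", "How do I reset my  password?"),  # near-identical FAQ entry
    (0.88, "doc_faq", "How do I change my email?"),
    (0.86, "doc_faq", "How do I delete my account?"),
    (0.80, "doc_guide", "Account settings live under Profile."),
    (0.20, "doc_misc", "Unrelated text."),
]
print([doc for _, doc, _ in apply_settings(ranked, settings)])
# → ['doc_faq', 'doc_faq', 'doc_guide']
```

Without the cap, the FAQ document would fill four of the five slots; with it, the guide's chunk gets through.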

Citations

Every assistant message comes back with citations:

```json
{
  "role": "assistant",
  "content": "Refunds are processed within 14 business days [^1].",
  "citations": [
    {
      "marker": "1",
      "document_id": "doc_abc",
      "document_name": "Refund Policy.pdf",
      "chunk_id": "chk_xyz",
      "snippet": "Refunds shall be remitted to the original payment instrument within fourteen (14) business days...",
      "score": 0.84
    }
  ]
}
```

The widget renders these as superscripts that expand to show the snippet.
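If you render messages in your own UI instead of the widget, the same idea takes a few lines: find each `[^n]` marker in `content` and look up the citation whose `marker` is `n`. This sketch emits an HTML `<sup>` whose `title` attribute shows the document name and snippet; `render_citations` and that markup are illustrative choices, not the widget's actual output. All other text is HTML-escaped, and markers with no matching citation are left as plain text.

```python
import html
import re

# Markers look like [^1] in the assistant's content.
MARKER = re.compile(r"\[\^([^\]]+)\]")


def render_citations(message: dict) -> str:
    """Turn [^n] markers into <sup> tags whose title shows the cited snippet."""
    by_marker = {c["marker"]: c for c in message.get("citations", [])}
    content = message["content"]
    parts, last = [], 0
    for m in MARKER.finditer(content):
        parts.append(html.escape(content[last:m.start()]))
        cite = by_marker.get(m.group(1))
        if cite is None:
            parts.append(html.escape(m.group(0)))  # unknown marker: leave it visible
        else:
            title = html.escape(f'{cite["document_name"]}: {cite["snippet"]}')
            parts.append(f'<sup title="{title}">{html.escape(m.group(1))}</sup>')
        last = m.end()
    parts.append(html.escape(content[last:]))
    return "".join(parts)


message = {
    "role": "assistant",
    "content": "Refunds are processed within 14 business days [^1].",
    "citations": [{
        "marker": "1",
        "document_name": "Refund Policy.pdf",
        "snippet": "Refunds shall be remitted to the original payment instrument within fourteen (14) business days...",
    }],
}
print(render_citations(message))
```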

Updating and removing documents

```bash
# Replace a document (same name, new file)
curl -X PUT "$SCAIGRID_HOST/v1/modules/scaibot/bots/$BOT_ID/documents/$DOC_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "file=@updated-handbook.pdf"
```

```bash
# Remove a document; this also drops its chunks from the index
curl -X DELETE "$SCAIGRID_HOST/v1/modules/scaibot/bots/$BOT_ID/documents/$DOC_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"
```

Removals are immediate at the index level — the chunks vanish from retrieval. Object-storage cleanup happens asynchronously.

When to switch to linked mode

Managed mode is the simplest path and works well for the first few hundred MB of content per bot.

Switch to linked mode when:

- The same content powers multiple bots (for example, one tenant runs both an internal Slack bot and a public help bot that answer from the same handbook).
- A non-bot team owns the knowledge (for example, the legal team manages a ScaiMatrix collection of contracts, and bots only read it).
- The corpus is too large for per-bot management (thousands of documents, terabytes of source material).
- You need fine-grained access control on chunks (ScaiMatrix supports per-document ACLs; managed mode treats the whole bot uniformly).

Supported document types

| Type | Notes |
|---|---|
| PDF | Most common. Tables and footnotes preserved; column flow detected. |
| DOCX | Headings preserved as semantic boundaries. |
| Markdown | H1/H2/H3 used as boundaries. |
| HTML | Navigation, script, and style elements stripped; `<main>` preferred when present. |
| Plain text | Chunked by paragraph and sentence. |
| JSON / YAML | Treated as plain text; structured retrieval is not yet supported. |

Scanned PDFs with no text layer are run through OCR automatically; accuracy depends on the quality of the scan.

Limits

- Single-document max size: 50 MB.
- Per-bot managed-knowledge cap: 5,000 documents (ask your account team to raise the limit for larger corpora).
- Retrieval per message: at most 50 chunks at chat time (in practice, keep top_k far lower).
- Indexing timeout: 10 minutes per document. Larger documents are accepted, but their indexing may time out and need a retry.
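Checking these limits client-side avoids uploads that are bound to be rejected. A minimal sketch, assuming the figures above: `check_upload` and `check_top_k` are hypothetical helpers, the extension set is an assumed mapping of the supported types (for instance, that Markdown files end in `.md`), 50 MB is read as 50 × 1024 × 1024 bytes, and the lower bound of 1 on `top_k` is a guess.

```python
import os
import tempfile

MAX_DOC_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as binary megabytes
MAX_MANAGED_DOCS = 5_000
MAX_TOP_K = 50
# Assumed file extensions for the supported document types.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".md", ".markdown", ".html", ".htm",
    ".txt", ".json", ".yaml", ".yml",
}


def check_upload(path: str, current_doc_count: int) -> list[str]:
    """Return the reasons an upload would break a limit; empty means it looks fine."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(no extension)'}")
    size = os.path.getsize(path)
    if size > MAX_DOC_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_DOC_BYTES}-byte limit")
    if current_doc_count >= MAX_MANAGED_DOCS:
        problems.append(f"bot already has {current_doc_count} managed documents")
    return problems


def check_top_k(top_k: int) -> bool:
    """True if top_k is within the 50-chunk retrieval ceiling (lower bound of 1 assumed)."""
    return 1 <= top_k <= MAX_TOP_K


with tempfile.NamedTemporaryFile(suffix=".pdf") as f:
    f.write(b"%PDF-1.7 placeholder")
    f.flush()
    print(check_upload(f.name, current_doc_count=12))  # → []
```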