Knowledge and RAG
A bot can answer from documents you upload (or from a shared ScaiMatrix collection you point it at). That's the retrieval-augmented-generation (RAG) loop, built in.
Two knowledge modes#
You choose per bot:
Managed — the bot owns its documents. Each upload creates chunks, embeddings, and an index slice scoped to that bot. Other bots can't see the documents. Good for: bot-specific content (a single product's FAQ, a single team's handbook).
Linked — the bot reads from a ScaiMatrix collection you've already set up. Many bots can share the same collection. Good for: corporate knowledge that powers multiple bots, content managed by a separate team using ScaiMatrix directly.
Set knowledge_mode on the bot:
1 | |
1 | |
What "indexed" means#
When you POST /bots/{id}/documents:
- The file lands in object storage (ScaiDrive under the hood).
- ScaiBot fans out a background task that:
- Extracts text (PDF / DOCX / HTML / Markdown / plain).
- Chunks at semantic boundaries (typically 400-600 tokens per chunk with 50-token overlap).
- Embeds each chunk with ScaiGrid's default embedding model.
- Writes chunks to ScaiMatrix tagged with the bot's collection id.
- The document's
statusflips:uploaded→extracting→indexing→indexed(orfailed).
For most documents under a few hundred pages, the whole pipeline completes in under a minute.
What's retrieved at chat time#
When the visitor sends a message:
- The message is embedded with the same model used for chunks.
- ScaiMatrix returns the top-K (default 5) chunks by hybrid score (BM25 keyword + cosine semantic).
- Chunks below the relevance threshold are dropped.
- The remaining chunks are stitched into the system prompt as labelled context.
- The model is told to cite the chunk number it used for each statement.
Tune retrieval via the bot's knowledge_settings:
1 2 3 4 5 6 | |
max_chunks_per_doc prevents one document from monopolising retrieval when it has many near-identical sections (typical for FAQs).
Citations#
Every assistant message comes back with citations:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | |
The widget renders these as superscripts that expand to show the snippet.
Updating and removing documents#
1 2 3 4 | |
1 2 3 | |
Removals are immediate at the index level — the chunks vanish from retrieval. Object-storage cleanup happens asynchronously.
When to switch to linked mode#
Managed mode is the simplest path and works well for the first few hundred MB of content per bot.
Switch to linked mode when:
- The same content powers multiple bots (one tenant, multiple deployments — internal-Slack-bot + public-help-bot answering from the same handbook).
- A non-bot team owns the knowledge (the legal team manages a ScaiMatrix collection of contracts; bots only read it).
- The corpus is too large for per-bot management (thousands of documents, terabytes of source material).
- You need fine-grained access control on chunks (ScaiMatrix supports per-document ACLs; managed mode treats the whole bot uniformly).
Supported document types#
| Type | Notes |
|---|---|
| Most common. Tables and footnotes preserved; column flow detected. | |
| DOCX | Headings preserved as semantic boundaries. |
| Markdown | H1/H2/H3 used as boundaries. |
| HTML | Stripped of nav/script/style; <main> preferred when present. |
| Plain text | Chunked by paragraph and sentence. |
| JSON / YAML | Treated as plain text — structured retrieval is not yet supported. |
For images of scanned PDFs (no text layer), OCR is performed automatically — quality varies with scan quality.
Limits#
- Single-document max size: 50 MB.
- Per-bot managed-knowledge cap: 5,000 documents (raise the limit through your account team for larger corpora).
- Per-chunk retrieval: maximum 50 chunks at chat time (you should aim much lower).
- Indexing timeout: 10 minutes per document. Larger documents are accepted but may need retry.