---
summary: How the widget, the ScaiBot API, the inference layer, and the knowledge index fit together.
title: Architecture
path: concepts/architecture
status: published
---

# Architecture

ScaiBot is a thin product layer over ScaiGrid's existing primitives: inference, sessions, accounting, and the ScaiMatrix knowledge engine. There is no separate "bot engine"; a bot is a configuration that orchestrates these primitives.

## Components

```mermaid
flowchart LR
    V[Visitor]
    YS["Your site<br/>&lt;script&gt;..."]
    BC["Bot config<br/>Tone<br/>Knowledge<br/>Conv. log<br/>Escalation"]
    INF["ScaiGrid inference<br/>+ ScaiMatrix retrieval"]
    V -- HTTP page --> YS
    YS -- widget.js --> V
    V <-- chat messages (SSE stream) --> YS
    YS <-- WS/SSE --> BC
    BC --> INF
    subgraph SG ["ScaiGrid: /v1/modules/scaibot/..."]
        BC
        INF
    end
```

There's no separate ScaiBot deployment. ScaiBot is a ScaiGrid module: it runs in the same FastAPI process, behind the same auth, accounted against the same budgets.

## Request flow for one chat turn

1. **Widget** sends `POST /v1/modules/scaibot/chat` with the bot id, conversation id (or a fresh one on the first turn), and the visitor's message.
2. **Auth** validates the embed token, resolves it to the bot and visitor identity.
3. **Bot config + tone** are loaded; together they produce the system prompt.
4. **Knowledge retrieval.** If `knowledge_enabled`, the user's message is embedded and matched against the bot's knowledge collection (managed or linked). Top-K chunks are gathered.
5. **Escalation pre-check.** Keyword and explicit-request rules fire here (before tokens are spent); if matched, the bot returns the escalation message instead of generating.
6. **Inference.** ScaiGrid's chat completion endpoint is called with the system prompt + retrieved chunks + recent conversation history. The response is streamed back to the widget over SSE.
7. **Escalation post-check.** Intent, sentiment, and confidence rules run on the generated response; if matched, the action fires (email/webhook/Slack/queue).
8. **Accounting.** Tokens, latency, retrieval count, and escalation status are recorded.
9. **Conversation store.** The full turn (user message, retrieved chunks, generated response, escalation outcome) is persisted.

## State

- **Bots, tones, escalation rules, knowledge collections**: in ScaiGrid's MariaDB.
- **Knowledge chunks + embeddings**: in ScaiMatrix (Weaviate underneath, but you talk to ScaiMatrix's API).
- **Conversations + messages**: partitioned tables in MariaDB; pruned by tenant retention policy.
- **Embed tokens**: short-lived, signed; not persisted longer than necessary.
- **No client-side state matters**: the widget only holds a conversation id in a cookie. Losing it just starts a new conversation.

## How it differs from calling inference directly

A direct ScaiGrid chat-completion call gives you tokens-out. ScaiBot adds:

| Concern | Direct call | ScaiBot |
|---|---|---|
| System prompt | You write it | Generated from tone config |
| Knowledge retrieval | You orchestrate it | Built-in; toggled with a boolean |
| Conversation persistence | You build it | Built-in |
| Escalation | You wire it | Rule-based, built-in |
| Embeddable widget | You build it | Single `