Architecture

ScaiBot is a thin product layer over ScaiGrid's existing primitives — inference, sessions, accounting, and the ScaiMatrix knowledge engine. There is no separate "bot engine"; a bot is a configuration that orchestrates these primitives.

Components

flowchart LR
    V[Visitor]
    YS["Your site <script>..."]
    BC["Bot config Tone Knowledge Conv. log Escalation"]
    INF["ScaiGrid inference + ScaiMatrix retrieval"]
    V -- HTTP page --> YS
    YS -- widget.js --> V
    V <-- chat messages (SSE stream) --> YS
    YS <-- WS/SSE --> BC
    BC --> INF
    subgraph SG ["ScaiGrid - /v1/modules/scaibot/..."]
        BC
        INF
    end

There's no separate ScaiBot deployment. ScaiBot is a ScaiGrid module — it runs in the same FastAPI process, behind the same auth, accounted against the same budgets.

Request flow for one chat turn

1. The widget sends POST /v1/modules/scaibot/chat with the bot id, the conversation id (or a fresh one on the first turn), and the visitor's message.
2. Auth validates the embed token and resolves it to the bot and visitor identity.
3. Bot config and tone are loaded; together they produce the system prompt.
4. Knowledge retrieval. If knowledge_enabled is set, the visitor's message is embedded and matched against the bot's knowledge collection (managed or linked). The top-K chunks are gathered.
5. Escalation pre-check. Keyword and explicit-request rules fire here, before any tokens are spent. If a rule matches, the bot returns the escalation message instead of generating.
6. Inference. ScaiGrid's chat completion endpoint is called with the system prompt, the retrieved chunks, and recent conversation history. The response is streamed back to the widget over SSE.
7. Escalation post-check. Intent, sentiment, and confidence rules run on the generated response. If a rule matches, its action fires (email, webhook, Slack, or queue).
8. Accounting. Tokens, latency, retrieval count, and escalation status are recorded.
9. Conversation store. The full turn (user message, retrieved chunks, generated response, escalation outcome) is persisted.
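
The flow above can be sketched as a single function. This is a minimal illustration, not ScaiGrid's implementation. Every name in it (`BotConfig`, `TurnRecord`, `handle_turn`, and all field names other than `knowledge_enabled`) is hypothetical. Retrieval and inference are passed in as plain callables, and request parsing, auth, streaming, the post-check rules, accounting, and persistence are reduced to comments.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BotConfig:
    # Field names are illustrative assumptions, except knowledge_enabled.
    tone_prompt: str
    knowledge_enabled: bool = True
    escalation_keywords: tuple = ("refund", "human")
    escalation_message: str = "Connecting you with a person."
    top_k: int = 3


@dataclass
class TurnRecord:
    # What the conversation store would persist for one turn.
    user_message: str
    chunks: List[str]
    response: str
    escalated: bool


def precheck(config: BotConfig, message: str) -> bool:
    """Keyword pre-check: a case-insensitive substring match."""
    lowered = message.lower()
    return any(keyword in lowered for keyword in config.escalation_keywords)


def handle_turn(
    config: BotConfig,
    message: str,
    history: List[str],
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str, List[str], List[str], str], str],
) -> TurnRecord:
    # Request parsing and embed-token auth are assumed to have happened already.
    system_prompt = config.tone_prompt  # bot config + tone

    # Knowledge retrieval: only when the bot has it switched on.
    chunks = retrieve(message, config.top_k) if config.knowledge_enabled else []

    # Escalation pre-check: short-circuits before the model is called.
    if precheck(config, message):
        return TurnRecord(message, chunks, config.escalation_message, True)

    # Inference: system prompt + retrieved chunks + recent history.
    response = generate(system_prompt, chunks, history, message)

    # Post-check rules, accounting, and persistence would run here;
    # this sketch just returns the record that would be stored.
    return TurnRecord(message, chunks, response, False)
```

Passing `retrieve` and `generate` as arguments keeps the sketch runnable without any ScaiGrid client, and it makes the key property of the pre-check visible: when a keyword matches, `generate` is never called.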

State

Bots, tones, escalation rules, knowledge collections — in ScaiGrid's MariaDB.
Knowledge chunks + embeddings — in ScaiMatrix (Weaviate underneath, but you talk to ScaiMatrix's API).
Conversations + messages — partitioned tables in MariaDB; pruned by tenant retention policy.
Embed tokens — short-lived, signed; not persisted longer than necessary.
Client-side state is minimal: the widget holds only a conversation id in a cookie. Losing it just starts a new conversation.
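
The cookie behaviour in the last item can be illustrated with a small sketch. It assumes a hypothetical in-memory `conversations` dictionary and a hypothetical `resolve_conversation` helper; the real store is the MariaDB tables listed above. The UUID id format, and treating an unknown id the same as a missing one, are also assumptions.

```python
import uuid
from typing import Dict, List, Optional


def resolve_conversation(
    conversations: Dict[str, List[str]],
    conversation_id: Optional[str],
) -> str:
    """Return a usable conversation id, creating a conversation if needed."""
    # A known id continues the existing conversation.
    if conversation_id is not None and conversation_id in conversations:
        return conversation_id
    # Missing cookie (or an id the server no longer knows): start fresh.
    new_id = uuid.uuid4().hex
    conversations[new_id] = []
    return new_id
```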

How it differs from calling inference directly

A direct ScaiGrid chat-completion call returns generated tokens and nothing more. ScaiBot adds:

Concern	Direct call	ScaiBot
System prompt	You write it	Generated from tone config
Knowledge retrieval	You orchestrate it	Built-in; toggled with a boolean
Conversation persistence	You build it	Built-in
Escalation	You wire it	Rule-based, built-in
Embeddable widget	You build it	Single `<script>` tag
Analytics	You instrument it	Built-in dashboards

For a one-shot completion or a custom UI, call inference directly. For a chatbot product, use ScaiBot: it covers most of the surface area you would otherwise build yourself.
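
To make the first table row concrete: with a direct call you write the system prompt by hand, while ScaiBot derives it from the bot config and tone. A rough sketch of such a derivation follows. The `Tone` fields (`persona`, `style`, `forbidden_topics`), the `bot_name` parameter, and the prompt wording are all hypothetical and do not reflect ScaiBot's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tone:
    # Hypothetical tone fields, for illustration only.
    persona: str
    style: str
    forbidden_topics: Tuple[str, ...] = ()


def build_system_prompt(bot_name: str, tone: Tone) -> str:
    """Turn structured bot and tone settings into one system prompt string."""
    lines = [
        f"You are {bot_name}, {tone.persona}.",
        f"Write in a {tone.style} style.",
    ]
    # Only mention restrictions when some are configured.
    if tone.forbidden_topics:
        lines.append("Do not discuss: " + ", ".join(tone.forbidden_topics) + ".")
    return "\n".join(lines)
```

The point of the structured form is that editing a tone field changes every future prompt consistently, which is harder to guarantee with hand-written prompt strings.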

Where the trust boundary is

The embed token authenticates the bot, not the visitor. The widget runs in the visitor's browser and is fully observable, so do not embed admin credentials in it. Visitor identity is optional and is sent via the data-user-id and data-user-email attributes; the script tag's token remains the only authority, so treat these attributes as unverified claims. For authenticated-visitor flows where you need to trust the visitor's identity, generate tokens server-side per visitor with their identity baked in.