Architecture
ScaiBot is a thin product layer over ScaiGrid's existing primitives — inference, sessions, accounting, and the ScaiMatrix knowledge engine. There is no separate "bot engine"; a bot is a configuration that orchestrates these primitives.
Components
There's no separate ScaiBot deployment. ScaiBot is a ScaiGrid module — it runs in the same FastAPI process, behind the same auth, accounted against the same budgets.
Request flow for one chat turn
- Widget sends POST /v1/modules/scaibot/chat with the bot id, conversation id (or a fresh one on the first turn), and the visitor's message.
- Auth validates the embed token, resolves it to the bot and visitor identity.
- Bot config + tone are loaded; together they produce the system prompt.
- Knowledge retrieval. If knowledge_enabled, the user's message is embedded and matched against the bot's knowledge collection (managed or linked). Top-K chunks are gathered.
- Escalation pre-check. Keyword and explicit-request rules fire here, before tokens are spent. If one matches, the bot returns the escalation message instead of generating.
- Inference. ScaiGrid's chat completion endpoint is called with the system prompt + retrieved chunks + recent conversation history. The response is streamed back to the widget over SSE.
- Escalation post-check. Intent, sentiment, and confidence rules run on the generated response — if matched, the action fires (email/webhook/Slack/queue).
- Accounting. Tokens, latency, retrieval count, and escalation status are recorded.
- Conversation store. The full turn (user message, retrieved chunks, generated response, escalation outcome) is persisted.
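The ordering of the middle steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not ScaiBot's implementation: BotConfig, TurnResult, handle_turn, and the retrieve and generate callables are all hypothetical names, and retrieval and inference are passed in as plain functions so the sketch runs without a ScaiGrid instance. Auth, streaming, the post-check, accounting, and persistence are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BotConfig:
    # Hypothetical shape; the real bot config schema is not shown in this page.
    tone_prompt: str
    knowledge_enabled: bool = True
    escalation_keywords: List[str] = field(default_factory=list)
    escalation_message: str = "Connecting you with a human."
    top_k: int = 3


@dataclass
class TurnResult:
    reply: str
    escalated: bool
    chunks: List[str]
    inference_called: bool


def handle_turn(
    config: BotConfig,
    history: List[str],
    message: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str, List[str], List[str], str], str],
) -> TurnResult:
    # Knowledge retrieval runs only when the bot has it switched on.
    chunks = retrieve(message, config.top_k) if config.knowledge_enabled else []

    # Escalation pre-check: keyword rules fire before any tokens are spent.
    lowered = message.lower()
    if any(kw.lower() in lowered for kw in config.escalation_keywords):
        return TurnResult(config.escalation_message, True, chunks, False)

    # Inference: system prompt + retrieved chunks + recent history + message.
    # The 10-message history window is an arbitrary choice for this sketch.
    reply = generate(config.tone_prompt, chunks, history[-10:], message)
    return TurnResult(reply, False, chunks, True)
```

The point of the sketch is the short-circuit: when a pre-check rule matches, the generate callable is never invoked, which is what "before tokens are spent" implies.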
State
- Bots, tones, escalation rules, knowledge collections — in ScaiGrid's MariaDB.
- Knowledge chunks + embeddings — in ScaiMatrix (Weaviate underneath, but you talk to ScaiMatrix's API).
- Conversations + messages — partitioned tables in MariaDB; pruned by tenant retention policy.
- Embed tokens — short-lived, signed; not persisted longer than necessary.
- No client-side state matters — the widget only holds a conversation id in a cookie. Losing it just starts a new conversation.
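The last point, that a lost cookie simply starts a new conversation, can be illustrated with a small server-side helper. This is a sketch only: resolve_conversation_id and the in-memory known_ids set are hypothetical stand-ins for a lookup against the conversation tables, and treating an unknown id the same as a missing one is an assumption made here.

```python
import uuid
from typing import Optional, Set, Tuple


def resolve_conversation_id(
    cookie_value: Optional[str], known_ids: Set[str]
) -> Tuple[str, bool]:
    """Return (conversation_id, is_new) for an incoming chat turn."""
    # A cookie that points at an existing conversation resumes it.
    if cookie_value and cookie_value in known_ids:
        return cookie_value, False
    # Missing cookie (or, by assumption, an unknown id): start fresh.
    new_id = str(uuid.uuid4())
    known_ids.add(new_id)
    return new_id, True
```

Because all durable state lives server-side, the only cost of a missing cookie is that earlier turns are not included in the new conversation's history.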
How it differs from calling inference directly
A direct ScaiGrid chat-completion call gives you tokens-out. ScaiBot adds:
| Concern | Direct call | ScaiBot |
|---|---|---|
| System prompt | You write it | Generated from tone config |
| Knowledge retrieval | You orchestrate it | Built-in; toggled with a boolean |
| Conversation persistence | You build it | Built-in |
| Escalation | You wire it | Rule-based, built-in |
| Embeddable widget | You build it | Single <script> tag |
| Analytics | You instrument it | Built-in dashboards |
For a one-shot completion or a custom UI, call inference directly. For a chatbot product, use ScaiBot: it covers most of the surface area you would otherwise build yourself.
Where the trust boundary is
The embed token authenticates the bot, not the visitor. The widget runs in the visitor's browser and is fully observable; do not embed admin credentials in it. Visitor identity (if any) is optional and is sent via the data-user-id / data-user-email attributes. The script tag's token remains the only authority, so treat those attributes as unverified. For authenticated-visitor flows where you need to trust the visitor's identity, generate tokens server-side per visitor with their identity baked in.