---
summary: How the widget, the ScaiBot API, the inference layer, and the knowledge index
  fit together.
title: Architecture
path: concepts/architecture
status: published
---

# Architecture

ScaiBot is a thin product layer over ScaiGrid's existing primitives — inference, sessions, accounting, and the ScaiMatrix knowledge engine. There is no separate "bot engine"; a bot is a configuration that orchestrates these primitives.

## Components

```mermaid
flowchart LR
    V[Visitor]
    YS["Your site<br/>&lt;script&gt;..."]
    BC["Bot config<br/>Tone<br/>Knowledge<br/>Conv. log<br/>Escalation"]
    INF["ScaiGrid inference<br/>+ ScaiMatrix retrieval"]

    V -- HTTP page --> YS
    YS -- widget.js --> V
    V <-- chat messages (SSE stream) --> YS
    YS <-- WS/SSE --> BC
    BC --> INF

    subgraph SG ["ScaiGrid &mdash; /v1/modules/scaibot/..."]
        BC
        INF
    end
```

There's no separate ScaiBot deployment. ScaiBot is a ScaiGrid module — it runs in the same FastAPI process, behind the same auth, accounted against the same budgets.

## Request flow for one chat turn

1. **Widget** sends `POST /v1/modules/scaibot/chat` with the bot id, conversation id (or a fresh one on the first turn), and the visitor's message.
2. **Auth** validates the embed token and resolves it to the bot and, where one is present, the visitor identity.
3. **Bot config + tone** are loaded; together they produce the system prompt.
4. **Knowledge retrieval.** If `knowledge_enabled` is set, the visitor's message is embedded and matched against the bot's knowledge collection (managed or linked). Top-K chunks are gathered.
5. **Escalation pre-check.** Keyword and explicit-request rules fire here, before any generation tokens are spent. If a rule matches, the bot returns the escalation message instead of generating.
6. **Inference.** ScaiGrid's chat completion endpoint is called with the system prompt + retrieved chunks + recent conversation history. The response is streamed back to the widget over SSE.
7. **Escalation post-check.** Intent, sentiment, and confidence rules run on the generated response — if matched, the action fires (email/webhook/Slack/queue).
8. **Accounting.** Tokens, latency, retrieval count, and escalation status are recorded.
9. **Conversation store.** The full turn (user message, retrieved chunks, generated response, escalation outcome) is persisted.
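
The ordering of steps 4 to 7 can be sketched as a small function. This is a minimal illustration, not ScaiBot's implementation: `run_turn`, `TurnResult`, and the four callables are hypothetical names standing in for retrieval, the two escalation checks, and inference. The point it shows is that a pre-check match returns before `generate` is ever called, while the post-check only runs on a generated reply.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TurnResult:
    reply: str
    chunks: List[str] = field(default_factory=list)
    escalated: Optional[str] = None  # None, "pre", or "post"
    generated: bool = False          # True only if inference actually ran


def run_turn(
    message: str,
    *,
    knowledge_enabled: bool,
    retrieve: Callable[[str], List[str]],
    pre_check: Callable[[str], bool],
    generate: Callable[[str, List[str]], str],
    post_check: Callable[[str], bool],
    escalation_message: str = "Connecting you with a person.",
) -> TurnResult:
    # Step 4: retrieval only runs when the bot has knowledge enabled.
    chunks = retrieve(message) if knowledge_enabled else []

    # Step 5: a pre-check match short-circuits before any generation.
    if pre_check(message):
        return TurnResult(reply=escalation_message, chunks=chunks, escalated="pre")

    # Step 6: inference sees the message plus the retrieved chunks.
    reply = generate(message, chunks)

    # Step 7: the post-check inspects the generated response.
    escalated = "post" if post_check(reply) else None
    return TurnResult(reply=reply, chunks=chunks, escalated=escalated, generated=True)
```

Passing the stages in as callables keeps the sketch testable with stubs; accounting and persistence (steps 8 and 9) would wrap the returned `TurnResult` and are omitted here.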

## State

- **Bots, tones, escalation rules, knowledge collections** — in ScaiGrid's MariaDB.
- **Knowledge chunks + embeddings** — in ScaiMatrix (Weaviate underneath, but you talk to ScaiMatrix's API).
- **Conversations + messages** — partitioned tables in MariaDB; pruned by tenant retention policy.
- **Embed tokens** — short-lived, signed; not persisted longer than necessary.
- **No client-side state matters** — the widget only holds a conversation id in a cookie. Losing it just starts a new conversation.
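
To make the last point concrete, here is a hedged sketch of how a client might assemble the body for a chat turn. The function name and the JSON field names (`bot_id`, `conversation_id`, `message`) are assumptions chosen for illustration; the source only states that the request carries a bot id, a conversation id, and the visitor's message.

```python
import uuid
from typing import Optional


def build_chat_request(bot_id: str, message: str,
                       conversation_id: Optional[str] = None) -> dict:
    """Build the JSON body for one chat turn (field names are hypothetical)."""
    if not conversation_id:
        # No stored id (first turn, or the cookie was lost):
        # mint a fresh one, which simply starts a new conversation.
        conversation_id = str(uuid.uuid4())
    return {
        "bot_id": bot_id,
        "conversation_id": conversation_id,
        "message": message,
    }
```

Because the conversation id is the only thing the client holds, a missing cookie degrades to a fresh conversation rather than an error.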

## How it differs from calling inference directly

A direct ScaiGrid chat-completion call returns generated tokens; everything around them is yours to build. ScaiBot adds:

| Concern | Direct call | ScaiBot |
|---|---|---|
| System prompt | You write it | Generated from tone config |
| Knowledge retrieval | You orchestrate it | Built-in; toggled with a boolean |
| Conversation persistence | You build it | Built-in |
| Escalation | You wire it | Rule-based, built-in |
| Embeddable widget | You build it | Single `<script>` tag |
| Analytics | You instrument it | Built-in dashboards |

For a one-shot completion or a custom UI, call inference directly. For a chatbot product, use ScaiBot: it covers most of the surface area you would otherwise build yourself.

## Where the trust boundary is

The embed token authenticates the **bot**, not the visitor. The widget runs in the visitor's browser and is fully observable, so do not embed admin credentials in it. Visitor identity (if any) is optional and is sent via the `data-user-id` / `data-user-email` attributes. Because the script tag's token is the only authority, those attributes are self-reported values rather than verified identity. For authenticated-visitor flows where you need to trust the visitor's identity, generate tokens server-side per visitor with their identity baked in.
