---
summary: "How a call moves through ScaiDial \u2014 from the carrier's INVITE to a\
  \ ringing extension and back. Why livekit-sip sits in the middle and what state\
  \ lives where."
title: Architecture
path: concepts/architecture
status: published
---

ScaiDial sits between three external surfaces: your carrier (SIP), your tenant users (HTTP / WebRTC), and your bots (ScaiBot). The piece that makes it work is **livekit-sip**, a sidecar that bridges SIP audio into LiveKit rooms. Everything end-user-facing that ScaiDial does is mediated by a LiveKit room, which may hold a human, a bot, and a SIP participant.

## The components

```
                         ┌─────────────────────────────────┐
                         │       ScaiGrid controller       │
                         │  (modules/scaidial)             │
                         │  ┌──────────────┐ ┌───────────┐ │
                         │  │ inbound      │ │ outbound  │ │
                         │  │ dispatcher   │ │ originate │ │
                         │  └──────┬───────┘ └─────┬─────┘ │
                         │         │               │       │
                         │         ▼               ▼       │
                         │     dialplan        livekit-sip │
                         │     engine          client      │
                         └────────┬──────────────┬─────────┘
                                  ▲              │
                POST /sip/inbound │              │ gRPC
                                  │              ▼
                         ┌──────────────────────────────┐
  carrier ◄── SIP/RTP ──►│         livekit-sip          │
                         │  (sidecar, ports 5060/UDP    │
                         │   + 10000-10200 RTP UDP)     │
                         └──────┬───────────────────────┘
                                │ joins room as SIP participant
                                ▼
                         ┌──────────────┐
                         │  LiveKit     │
                         │  server      │◄── WebRTC ── browser softphone
                         │              │◄── WebRTC ── ScaiBot voice worker
                         └──────────────┘
```

- **ScaiGrid controller.** The FastAPI process you talk to. Owns the trunk/extension/dialplan tables, the call/leg tables, the admin UI, and the originate path.
- **livekit-sip.** A standalone Go binary that runs as a sidecar. It REGISTERs with your carrier, accepts inbound INVITEs, exchanges call audio with the carrier over RTP, and hands that media to LiveKit as a participant in a room.
- **LiveKit server.** The SFU. Audio (and any data) flows here. Browsers and bots join rooms over WebRTC; the SIP side joins via livekit-sip.
- **Carrier.** Your ITSP or PBX (Asterisk, for example). ScaiDial, through livekit-sip, registers with it as a SIP client (REGISTER mode), so the carrier needs no special inbound rules: return traffic, including inbound INVITEs, follows the NAT binding that the REGISTER establishes.

## Inbound call: from INVITE to ringing

1. The carrier routes a call for one of your DIDs. Because livekit-sip REGISTERed earlier, the INVITE arrives through its NAT binding.
2. livekit-sip evaluates its dispatch rules and POSTs to ScaiDial's `/sip/inbound` route with the call metadata (from, to, trunk).
3. The inbound dispatcher loads the `Did` row, follows it to its dialplan, and walks the rules in priority order. The first matching rule decides the action.
4. The dialplan engine generates a LiveKit room name, persists a `Call` row and a SIP `CallLeg`, then returns the room name and the dispatch decision to livekit-sip.
5. livekit-sip joins the SIP participant to that room.
6. Depending on the rule's action:
   - `ring_extension type=wave` — a user's browser softphone picks up the ring and joins the same room as a WebRTC participant. Audio bridges.
   - `ring_extension type=bot` — ScaiDial asks ScaiBot to kick off its voice worker into the room (`VoiceCallService.kickoff_into_room`). The worker mints its own LiveKit token, joins, and starts the conversation.
   - `voicemail` — the answerer plays the greeting, records the caller, persists a `VoicemailMessage` row, and (if the tenant opted in) runs ScaiEcho over the audio for a transcript.
   - `forward` — a new SIP REFER is sent to relay the call elsewhere.
   - `hangup` — the room is torn down.
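
The dispatcher logic in steps 3 and 4 fits in a few lines of Python. This is a minimal, hypothetical sketch, not ScaiDial's code: it assumes rules match on a glob of the caller's number, that a lower `priority` value is evaluated first, and that an unmatched call hangs up. `Rule`, `Store`, `first_match`, `handle_inbound`, and the reply's field names are all illustrative.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule row; real dialplan rules may match on more than the caller.
    priority: int      # assumption: lower numbers are evaluated first
    caller_glob: str   # e.g. "+3120*" or "*"
    action: str        # "ring_extension" | "voicemail" | "forward" | "hangup"
    target: Optional[str] = None


@dataclass
class Store:
    # In-memory stand-in for the Call and CallLeg tables.
    calls: list[dict] = field(default_factory=list)
    legs: list[dict] = field(default_factory=list)


def first_match(rules: list[Rule], caller: str) -> Optional[Rule]:
    """Walk rules in priority order; the first matching rule wins."""
    # sorted() is stable, so rules with equal priority keep their stored order.
    for rule in sorted(rules, key=lambda r: r.priority):
        if fnmatchcase(caller, rule.caller_glob):
            return rule
    return None


def handle_inbound(store: Store, rules: list[Rule], caller: str, did: str) -> dict:
    """Hypothetical /sip/inbound handler: decide, persist, answer livekit-sip."""
    rule = first_match(rules, caller)
    action = rule.action if rule else "hangup"  # assumption: no match hangs up
    room = f"call-{uuid.uuid4().hex[:12]}"
    store.calls.append({"room": room, "from": caller, "to": did})
    store.legs.append({"room": room, "kind": "sip", "status": "ringing"})
    return {"room_name": room, "action": action, "target": rule.target if rule else None}
```

With rules `[Rule(20, "*", "voicemail"), Rule(10, "+3120*", "ring_extension", "ext-101")]`, a call from `+31201234567` matches the priority-10 rule even though it is listed second, while any other caller falls through to voicemail.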

## Outbound call: click-to-call

1. The user clicks "Place call" on `/my/dial`. The browser POSTs `/me/click-to-call` with the destination and the source extension.
2. The controller verifies that the user owns the grant for the source extension, picks the tenant's oldest synced trunk, generates a room name and caller identity, persists a `Call` row plus two legs (caller and SIP), and mints a LiveKit JWT for the user.
3. The browser connects to LiveKit with that token and publishes the microphone.
4. In parallel, the controller calls livekit-sip with the trunk ID and destination. The call is fire-and-forget, so the browser is already in the room by the time the far end starts ringing.
5. When the far end answers, livekit-sip joins the SIP participant to the same room. Audio bridges.

The originate request sets `wait_until_answered=false`, so the user hears ringback over WebRTC instead of staring at a blank dialog. If the carrier rejects the call (a 5xx response, for example), the controller marks the SIP leg `ended` with the failure reason, and the floating call card surfaces it as an error.
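
The click-to-call ordering (token first, originate in the background, failure written back to the SIP leg) can be sketched with `asyncio`. Everything here is hypothetical: `Trunk`, `CallRecord`, `originate`, and the leg status strings stand in for the controller's real models and its gRPC client, and the token is a placeholder string rather than a real LiveKit JWT.

```python
from __future__ import annotations

import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class Trunk:
    id: str
    synced: bool
    created_at: float  # epoch seconds; used to find the oldest trunk


@dataclass
class CallRecord:
    room: str
    # Leg name -> status; the status strings are illustrative.
    legs: dict[str, str] = field(default_factory=dict)
    failure: str | None = None


def pick_trunk(trunks: list[Trunk]) -> Trunk:
    """Return the tenant's oldest synced trunk."""
    synced = [t for t in trunks if t.synced]
    if not synced:
        raise LookupError("tenant has no synced trunk")
    return min(synced, key=lambda t: t.created_at)


async def originate(trunk_id: str, destination: str, room: str) -> None:
    """Hypothetical stand-in for the gRPC call to livekit-sip."""
    await asyncio.sleep(0.01)
    if destination.startswith("+999"):  # simulate a carrier rejection
        raise RuntimeError("503 Service Unavailable")


async def click_to_call(
    trunks: list[Trunk], destination: str, user_id: str
) -> tuple[CallRecord, str, asyncio.Task[None]]:
    trunk = pick_trunk(trunks)
    room = f"call-{uuid.uuid4().hex[:12]}"
    call = CallRecord(room=room, legs={"caller": "joining", "sip": "dialing"})
    token = f"placeholder-jwt-for-{user_id}"  # not a real LiveKit token

    async def run_originate() -> None:
        try:
            await originate(trunk.id, destination, room)
            # On success the leg would be updated later, when livekit-sip
            # reports the answer; that part is not modeled here.
        except Exception as exc:
            # Carrier rejected: end the SIP leg and keep the reason for the UI.
            call.legs["sip"] = "ended"
            call.failure = str(exc)

    # Fire-and-forget: schedule the originate and hand back the token at once.
    task = asyncio.create_task(run_originate())
    return call, token, task
```

Because `click_to_call` returns before the originate task runs, the browser can join the room immediately, which mirrors the effect of `wait_until_answered=false`. The returned task should stay referenced until it finishes, since the asyncio event loop keeps only weak references to tasks.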

## State boundaries

| Lives in | What |
|---|---|
| MariaDB (ScaiGrid) | trunks, dids, extensions, dialplans+rules, calls+legs, voicemail messages, forward rules, tenant policy |
| livekit-sip Redis | active SIP sessions, REGISTER state |
| LiveKit server | active rooms + participants, real-time audio media |
| ScaiBunker S3 | voicemail audio recordings |

The controller is the source of truth for configuration and call records. livekit-sip and LiveKit own the live media path, and their state is intentionally ephemeral. If the controller restarts mid-call, the call keeps going, but the next active-calls fetch rebuilds the list only from the database; it does not recover live session state from livekit-sip or LiveKit.
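
Rebuilding the active-calls view from the database can be sketched as a pure function over persisted leg rows. `LegRow`, the status strings, and the rule "a call is active while any leg is not ended" are assumptions for illustration; the sketch deliberately never consults livekit-sip or LiveKit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LegRow:
    call_id: str
    kind: str    # e.g. "caller", "sip", "bot" (illustrative)
    status: str  # e.g. "ringing", "active", "ended" (illustrative)


def active_calls(rows: list[LegRow]) -> dict[str, list[LegRow]]:
    """Rebuild the active-calls view from persisted leg rows only."""
    by_call: dict[str, list[LegRow]] = {}
    for row in rows:
        by_call.setdefault(row.call_id, []).append(row)
    # In this sketch a call counts as active while any of its legs is not
    # ended. Nothing here asks livekit-sip or LiveKit, so a hangup that was
    # never written to the DB leaves the call looking active.
    return {
        call_id: legs
        for call_id, legs in by_call.items()
        if any(leg.status != "ended" for leg in legs)
    }
```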

## Why livekit-sip?

We considered three alternatives:

- **Asterisk / FreeSWITCH directly.** Both are excellent PBX engines but their config surface is `.conf` files and AMI, neither of which fits a multi-tenant HTTP API. We would have ended up writing a config-rendering layer around them, which is what livekit-sip already does.
- **A SIP library inside the controller.** Relaying call audio through Python is a bad idea for latency and operational reasons. RTP wants compiled code and tight, low-jitter packet loops, which Python is poorly suited to.
- **Plain LiveKit with a custom SIP gateway.** A custom SIP gateway for LiveKit is exactly what livekit-sip is. The LiveKit team built it and it's good, so we took the dependency instead of writing our own.

## What's deliberately separate from ScaiBot

ScaiBot owns the conversation: prompt, voice cloning, the LiveKit worker. ScaiDial owns the line: routing, the SIP leg, call records.

Both share the LiveKit cluster and the same token-mint pattern. They don't share data models — ScaiBot's `Bot` row knows nothing about ScaiDial's `Extension`; the extension's `target_ref` carries the bot ID and that's the only coupling.

This keeps either product replaceable. You can use ScaiBot voice without ScaiDial (we have a web widget), and you can use ScaiDial without ScaiBot (route everything to wave users or voicemail).
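
The single point of coupling can be sketched with a structural `Protocol`: ScaiDial depends only on something that can start a bot in a room, and passes it the opaque ID from `target_ref`. The `kickoff_into_room(bot_id, room)` signature is assumed (the real `VoiceCallService.kickoff_into_room` may take different arguments), as is the idea that `target_ref` holds a user ID for wave extensions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Protocol


class BotLauncher(Protocol):
    # Assumed signature; the real VoiceCallService.kickoff_into_room may differ.
    def kickoff_into_room(self, bot_id: str, room: str) -> None: ...


@dataclass(frozen=True)
class Extension:
    number: str
    type: Literal["wave", "bot"]
    target_ref: str  # bot ID for type="bot"; assumed to be a user ID for "wave"


def ring_extension(ext: Extension, room: str, bots: BotLauncher) -> str:
    """Route a ring to a bot or a wave user without touching ScaiBot's models."""
    if ext.type == "bot":
        # ScaiDial only forwards the opaque bot ID; it never loads a Bot row.
        bots.kickoff_into_room(ext.target_ref, room)
        return f"bot:{ext.target_ref}"
    # Hypothetical: signal the user's softphone to join `room`.
    return f"wave:{ext.target_ref}"


class RecordingLauncher:
    """Test double that satisfies BotLauncher structurally."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def kickoff_into_room(self, bot_id: str, room: str) -> None:
        self.calls.append((bot_id, room))
```

In this sketch, swapping ScaiBot for another voice backend only means passing a different object with the same method; the `Extension` shape does not change.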
