Architecture

ScaiDial sits between three external surfaces: your carrier (SIP), your tenant users (HTTP / WebRTC), and your bots (ScaiBot). The piece that makes it work is livekit-sip, a sidecar that bridges SIP audio into LiveKit rooms. Every end-user-facing action in ScaiDial is mediated by a LiveKit room, which may contain a human, a bot, and a SIP participant.

The components

                         ┌────────────────────────────────┐
                         │       ScaiGrid controller       │
                         │  (modules/scaidial)             │
                         │  ┌──────────────┐  ┌──────────┐ │
   carrier ──── SIP ───► │  │ inbound      │  │ outbound │ │
                         │  │ dispatcher   │  │ origin   │ │
                         │  └──────┬───────┘  └────┬─────┘ │
                         │         │               │       │
                         │         ▼               ▼       │
                         │     dialplan        livekit-sip │
                         │     engine          client      │
                         └────────┬──────────────┬─────────┘
                                  │              │
                                  │              │ gRPC
                                  ▼              ▼
                         ┌──────────────────────────────┐
                         │       livekit-sip            │
                         │  (sidecar, ports 5060/UDP   │
                         │   + 10000-10200 RTP UDP)    │
                         └──────┬───────────────────────┘
                                │ SIP/RTP
                                ▼
                         ┌──────────────┐
                         │  LiveKit     │
                         │  server      │◄── WebRTC ── browser softphone
                         │              │◄── WebRTC ── ScaiBot voice worker
                         └──────────────┘
  • ScaiGrid controller. The FastAPI process you talk to. Owns the trunk/extension/dialplan tables, the call/leg tables, the admin UI, and the originate path.
  • livekit-sip. A standalone Go binary that runs as a sidecar. It REGISTERs to your carrier, accepts inbound INVITEs, transports SIP audio over RTP, and hands the media to LiveKit as a participant in a room.
  • LiveKit server. The SFU. Audio (and any data) flows here. Browsers and bots join rooms over WebRTC; the SIP side joins via livekit-sip.
  • Carrier. Your ITSP, PBX, or Asterisk. ScaiDial is a SIP client to it (REGISTER mode), so the carrier doesn't need any special inbound rules — the NAT binding the REGISTER establishes is what return traffic uses.

Inbound call: from INVITE to ringing

  1. The carrier delivers a call to the DID. Because we REGISTERed earlier, the INVITE arrives at livekit-sip's NAT binding.
  2. livekit-sip evaluates its dispatch rules and POSTs to ScaiDial's /sip/inbound route with the call metadata (from, to, trunk).
  3. The inbound dispatcher loads the Did row, follows it to a Dialplan, and walks the rules in priority order. The first matching rule decides the action.
  4. The dialplan engine creates the LiveKit room name, persists a Call row and a SIP CallLeg, then returns the room name + dispatch decision to livekit-sip.
  5. livekit-sip joins the SIP participant to that room.
  6. Depending on the rule's action:
    • ring_extension type=wave — a user's browser softphone picks up the ring and joins the same room as a WebRTC participant. Audio bridges.
    • ring_extension type=bot — ScaiDial asks ScaiBot to kick off its voice worker into the room (VoiceCallService.kickoff_into_room). The worker mints its own LiveKit token, joins, and starts the conversation.
    • voicemail — the answerer plays the greeting, records the caller, persists a VoicemailMessage row, and (if the tenant opted in) runs ScaiEcho over the audio for a transcript.
    • forward — a new SIP REFER is sent to relay the call elsewhere.
    • hangup — the room is torn down.
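The rule walk in step 3 can be sketched as follows. This is a minimal illustration, not ScaiDial's implementation: the `Rule` fields, the regex-on-dialed-number match, and the convention that a lower `priority` value is evaluated first are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    # All field names are hypothetical; the real Dialplan rule schema is not shown here.
    priority: int                  # assumption: lower value is evaluated first
    pattern: str                   # assumption: regex matched against the dialed number
    action: str                    # e.g. "ring_extension", "voicemail", "forward", "hangup"
    target: Optional[str] = None   # e.g. an extension reference


def first_matching_rule(rules: list[Rule], dialed: str) -> Optional[Rule]:
    """Walk rules in priority order and return the first one that matches."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if re.fullmatch(rule.pattern, dialed):
            return rule
    return None  # no rule matched; the caller decides on a default


rules = [
    Rule(priority=20, pattern=r".*", action="voicemail"),
    Rule(priority=10, pattern=r"\+3120555\d{4}", action="ring_extension", target="ext-100"),
]

decision = first_matching_rule(rules, "+31205550123")
fallback = first_matching_rule(rules, "+15550000")
```

Because the catch-all rule has the higher priority value, it only wins when the more specific rule does not match, which is the usual reason to evaluate in priority order and stop at the first hit.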

Outbound call: click-to-call

  1. The user clicks "Place call" on /my/dial. The browser POSTs /me/click-to-call with the destination and the source extension.
  2. The controller verifies grant ownership, picks the tenant's oldest synced trunk, generates a room name + caller identity, persists a Call row plus two legs (caller + SIP), and mints a LiveKit JWT for the user.
  3. The browser connects to LiveKit with that token and publishes the mic.
  4. The controller, in parallel, calls livekit-sip with the trunk ID and destination — fire-and-forget so the browser is already in the room when the carrier rings.
  5. When the far end answers, livekit-sip joins the SIP participant to the same room. Audio bridges.

The originate call sets wait_until_answered=false, so the user hears ringback via WebRTC instead of staring at a blank dialog. If the carrier rejects the call (5xx), the controller marks the SIP leg as ended with the failure reason, and the floating call card surfaces it as an error.
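The ordering in the outbound flow (persist, hand the token back, originate in the background) might look roughly like this with asyncio. Everything here is a stand-in: `originate_sip_leg` simulates the livekit-sip request with a sleep, the token is a placeholder string rather than a real LiveKit JWT, and the `events` list exists only to make the ordering visible.

```python
import asyncio
import uuid


async def originate_sip_leg(trunk_id: str, destination: str, room: str, events: list[str]) -> None:
    # Hypothetical stand-in for the request to livekit-sip.
    await asyncio.sleep(0.01)  # simulated carrier setup delay
    events.append(f"sip participant joined {room}")


async def click_to_call(trunk_id: str, destination: str, events: list[str]):
    room = f"call-{uuid.uuid4().hex[:8]}"
    events.append("call row and two legs persisted")
    token = f"placeholder-token-for-{room}"  # not a real JWT

    # Fire-and-forget: schedule the originate without awaiting it, so the
    # response to the browser is not held up by carrier call setup.
    task = asyncio.create_task(originate_sip_leg(trunk_id, destination, room, events))

    events.append("token returned to browser")
    return room, token, task


async def main() -> list[str]:
    events: list[str] = []
    _room, _token, task = await click_to_call("trunk-1", "+31205550123", events)
    await task  # only so this demo can observe the SIP leg joining
    return events


events = asyncio.run(main())
```

Keeping a reference to the task (rather than discarding the result of `asyncio.create_task`) avoids it being garbage-collected before it finishes, and gives a place to attach failure handling such as marking the SIP leg ended.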

State boundaries

| Lives in | What |
|---|---|
| MariaDB (ScaiGrid) | trunks, dids, extensions, dialplans+rules, calls+legs, voicemail messages, forward rules, tenant policy |
| livekit-sip Redis | active SIP sessions, REGISTER state |
| LiveKit server | active rooms + participants, real-time audio media |
| ScaiBunker S3 | voicemail audio recordings |

The controller is the source of truth for configuration and call records. livekit-sip and LiveKit own the live media path; their state is intentionally ephemeral. If the controller restarts mid-call, the call keeps going, but a subsequent active-calls fetch is rebuilt from the database only, not from the live state held by livekit-sip or LiveKit.
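As an illustration of rebuilding the active-calls view from persisted rows alone, here is a small sketch. It uses an in-memory SQLite database as a stand-in for MariaDB, and the `calls` table layout (`id`, `room`, `ended_at`) is a hypothetical schema chosen for the example, with "active" assumed to mean "no end timestamp yet".

```python
import sqlite3

# In-memory SQLite stands in for the controller's MariaDB; the schema is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE calls (id TEXT PRIMARY KEY, room TEXT, ended_at TEXT)")
db.executemany(
    "INSERT INTO calls (id, room, ended_at) VALUES (?, ?, ?)",
    [
        ("c1", "call-aaa", None),                    # still in progress
        ("c2", "call-bbb", "2026-06-23T01:00:00Z"),  # already ended
    ],
)


def active_calls(conn: sqlite3.Connection) -> list[dict]:
    """Return calls without an end timestamp, using only persisted rows."""
    rows = conn.execute(
        "SELECT id, room FROM calls WHERE ended_at IS NULL ORDER BY id"
    )
    return [{"id": row[0], "room": row[1]} for row in rows]


active = active_calls(db)
```

Nothing in this query consults livekit-sip or LiveKit, so it works immediately after a controller restart; the trade-off is that it reflects what was last written to the database rather than the live media state.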

Why livekit-sip?

We considered three alternatives:

  • Asterisk / FreeSWITCH directly. Both are excellent PBX engines but their config surface is .conf files and AMI, neither of which fits a multi-tenant HTTP API. We would have ended up writing a config-rendering layer around them, which is what livekit-sip already does.
  • A SIP library inside the controller. Routing SIP audio through Python is a bad idea for latency and operational reasons. RTP wants compiled code, kernel sockets, and tight loops, none of which Python is good at.
  • Plain LiveKit with a custom SIP gateway. This is what livekit-sip already is: the LiveKit team wrote it and it's good. We took the dependency.

What's deliberately separate from ScaiBot#

ScaiBot owns the conversation: prompt, voice cloning, the LiveKit worker. ScaiDial owns the line: routing, the SIP leg, call records.

Both share the LiveKit cluster and the same token-mint pattern. They don't share data models — ScaiBot's Bot row knows nothing about ScaiDial's Extension; the extension's target_ref carries the bot ID and that's the only coupling.
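The single point of coupling can be pictured like this. It is a sketch under assumptions: only `target_ref` and the `wave` / `bot` extension types come from the text above, while the other fields and the helper function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extension:
    # ScaiDial side. Only `target_ref` and the type values are named in the docs;
    # `number` is a hypothetical field for illustration.
    number: str
    type: str          # "wave" or "bot"
    target_ref: str    # for type="bot", carries the ScaiBot bot ID


def bot_id_for(extension: Extension) -> Optional[str]:
    """Return the bot ID to hand to ScaiBot, or None for non-bot extensions."""
    return extension.target_ref if extension.type == "bot" else None


support_line = Extension(number="100", type="bot", target_ref="bot-42")
desk_phone = Extension(number="101", type="wave", target_ref="user-7")

bot_id = bot_id_for(support_line)
no_bot = bot_id_for(desk_phone)
```

ScaiDial never needs to load a Bot row here; it passes an opaque ID across the boundary, which is what lets either side change its data model independently.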

This keeps either product replaceable. You can use ScaiBot voice without ScaiDial (we have a web widget), and you can use ScaiDial without ScaiBot (route everything to wave users or voicemail).

Updated 2026-06-23 01:06:32, rev 1