Architecture
ScaiDial sits between three external surfaces: your carrier (SIP), your tenant users (HTTP / WebRTC), and your bots (ScaiBot). The piece that makes it work is livekit-sip, a sidecar that bridges SIP audio into LiveKit rooms. Everything end-user-facing that ScaiDial does is mediated by a LiveKit room that may contain a human, a bot, and a SIP participant.
The components
- ScaiGrid controller. The FastAPI process you talk to. Owns the trunk/extension/dialplan tables, the call/leg tables, the admin UI, and the originate path.
- livekit-sip. A standalone Go binary that runs as a sidecar. It REGISTERs to your carrier, accepts inbound INVITEs, transports SIP audio over RTP, and hands the media to LiveKit as a participant in a room.
- LiveKit server. The SFU. Audio (and any data) flows here. Browsers and bots join rooms over WebRTC; the SIP side joins via livekit-sip.
- Carrier. Your ITSP, PBX, or Asterisk. ScaiDial is a SIP client to it (REGISTER mode), so the carrier doesn't need any special inbound rules — the NAT binding the REGISTER establishes is what return traffic uses.
Inbound call: from INVITE to ringing
- The carrier delivers a call to the DID. Because we REGISTERed earlier, the INVITE arrives at livekit-sip's NAT binding.
- livekit-sip evaluates its dispatch rules and POSTs to ScaiDial's /sip/inbound route with the call metadata (from, to, trunk).
- The inbound dispatcher loads the Did row, follows it to a Dialplan, and walks the rules in priority order. The first matching rule decides the action.
- The dialplan engine creates the LiveKit room name, persists a Call row and a SIPCallLeg, then returns the room name + dispatch decision to livekit-sip.
- livekit-sip joins the SIP participant to that room.
- Depending on the rule's action:
  - ring_extension type=wave: a user's browser softphone picks up the ring and joins the same room as a WebRTC participant. Audio bridges.
  - ring_extension type=bot: ScaiDial asks ScaiBot to kick off its voice worker into the room (VoiceCallService.kickoff_into_room). The worker mints its own LiveKit token, joins, and starts the conversation.
  - voicemail: the answerer plays the greeting, records the caller, persists a VoicemailMessage row, and (if the tenant opted in) runs ScaiEcho over the audio for a transcript.
  - forward: a new SIP REFER is sent to relay the call elsewhere.
  - hangup: the room is torn down.
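The "walk the rules in priority order, first match wins" step can be sketched as follows. This is a minimal illustration, not ScaiDial's actual engine: the Rule fields, the regex match on the dialed number, and the convention that a lower priority number is evaluated first are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    priority: int                    # assumption: lower number is evaluated first
    pattern: str                     # assumption: regex matched against the dialed number
    action: str                      # e.g. "ring_extension", "voicemail", "forward", "hangup"
    target_ref: Optional[str] = None


def pick_rule(rules: list, dialed: str) -> Optional[Rule]:
    """Return the first rule, in priority order, whose pattern matches."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if re.fullmatch(rule.pattern, dialed):
            return rule
    return None  # no rule matched; the caller decides on a fallback


# Hypothetical dialplan: a specific DID rings an extension, everything else
# falls through to voicemail.
rules = [
    Rule(priority=20, pattern=r".*", action="voicemail"),
    Rule(priority=10, pattern=r"\+3120\d{7}", action="ring_extension", target_ref="ext-101"),
]
```

Because the rules are sorted before the walk, a catch-all rule can sit at a high priority number without shadowing more specific rules.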
Outbound call: click-to-call
- The user clicks "Place call" on /my/dial. The browser POSTs to /me/click-to-call with the destination and the source extension.
- The controller verifies grant ownership, picks the tenant's oldest synced trunk, generates a room name + caller identity, persists a Call row plus two legs (caller + SIP), and mints a LiveKit JWT for the user.
- The browser connects to LiveKit with that token and publishes the mic.
- The controller, in parallel, calls livekit-sip with the trunk ID and destination — fire-and-forget so the browser is already in the room when the carrier rings.
- When the far end answers, livekit-sip joins the SIP participant to the same room. Audio bridges.
The originate request sets wait_until_answered=false, so the user hears ringback via WebRTC instead of staring at a blank dialog. If the carrier rejects the call (5xx), the controller marks the SIP leg ended with the failure reason, and the floating call card surfaces it as an error.
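The fire-and-forget ordering in the click-to-call flow can be illustrated with plain asyncio. This is a sketch under stated assumptions: originate_sip_leg is a hypothetical stand-in for the controller's request to livekit-sip, and the room name and token are placeholders rather than real values.

```python
import asyncio


async def originate_sip_leg(trunk_id: str, destination: str, room: str, events: list) -> None:
    """Hypothetical stand-in for the controller's request to livekit-sip."""
    await asyncio.sleep(0.01)  # simulates network and carrier latency
    events.append(("sip_originate_done", room))


async def click_to_call(trunk_id: str, destination: str, events: list):
    room = "call-demo"          # hypothetical room name
    token = "jwt-placeholder"   # a real controller would mint a LiveKit JWT here
    # Schedule the originate without awaiting it, so the HTTP response
    # (and therefore the browser joining the room) is not delayed.
    task = asyncio.create_task(originate_sip_leg(trunk_id, destination, room, events))
    events.append(("token_returned", room))
    return token, room, task    # keep a reference so the task is not garbage-collected


async def main() -> list:
    events: list = []
    _token, _room, task = await click_to_call("trunk-1", "+15550100", events)
    await task  # awaited here only so the demo can observe completion
    return events


events = asyncio.run(main())
```

In this sketch the token is handed back before the originate finishes, which mirrors the intent that the browser is already in the room when the carrier rings.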
State boundaries
| Lives in | What |
|---|---|
| MariaDB (ScaiGrid) | trunks, dids, extensions, dialplans+rules, calls+legs, voicemail messages, forward rules, tenant policy |
| livekit-sip Redis | active SIP sessions, REGISTER state |
| LiveKit server | active rooms + participants, real-time audio media |
| ScaiBunker S3 | voicemail audio recordings |
The controller is the source of truth for configuration and call records. livekit-sip and LiveKit own the live media path; their state is intentionally ephemeral. If the controller restarts mid-call, the call keeps going, but a fresh active-calls fetch is rebuilt only from the database.
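To make the "rebuilt only from the database" point concrete, here is a small sketch that derives the active-calls list purely from persisted rows. The schema (calls, legs, ended_at) is hypothetical, and SQLite stands in for MariaDB so the example is self-contained.

```python
import sqlite3


def active_calls(conn: sqlite3.Connection) -> list:
    """List calls that still have at least one leg without an end time."""
    rows = conn.execute(
        "SELECT c.id, c.room_name, COUNT(l.id) "
        "FROM calls c JOIN legs l ON l.call_id = c.id "
        "WHERE l.ended_at IS NULL "
        "GROUP BY c.id, c.room_name ORDER BY c.id"
    ).fetchall()
    return [{"call_id": r[0], "room": r[1], "open_legs": r[2]} for r in rows]


# In-memory database with a hypothetical schema and two sample calls:
# call 1 is still in progress, call 2 has ended on both legs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (id INTEGER PRIMARY KEY, room_name TEXT NOT NULL);
CREATE TABLE legs (
    id INTEGER PRIMARY KEY,
    call_id INTEGER NOT NULL,
    kind TEXT NOT NULL,
    ended_at TEXT
);
INSERT INTO calls VALUES (1, 'room-a'), (2, 'room-b');
INSERT INTO legs VALUES
    (1, 1, 'caller', NULL),
    (2, 1, 'sip', NULL),
    (3, 2, 'caller', '2024-01-01T00:00:00Z'),
    (4, 2, 'sip', '2024-01-01T00:00:00Z');
""")

result = active_calls(conn)
```

Nothing here consults LiveKit or livekit-sip, so the answer survives a controller restart as long as the rows were persisted.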
Why livekit-sip?
We considered three alternatives:
- Asterisk / FreeSWITCH directly. Both are excellent PBX engines but their config surface is .conf files and AMI, neither of which fits a multi-tenant HTTP API. We would have ended up writing a config-rendering layer around them, which is what livekit-sip already does.
- A SIP library inside the controller. Routing SIP audio through Python is a bad idea for latency and operational reasons. RTP wants C, kernel sockets, and tight loops, none of which Python is good at.
- Plain LiveKit with a custom SIP gateway. This is what livekit-sip already is. The LiveKit team wrote it and it's good, so we took the dependency.
What's deliberately separate from ScaiBot#
ScaiBot owns the conversation: prompt, voice cloning, the LiveKit worker. ScaiDial owns the line: routing, the SIP leg, call records.
Both share the LiveKit cluster and the same token-mint pattern. They don't share data models — ScaiBot's Bot row knows nothing about ScaiDial's Extension; the extension's target_ref carries the bot ID and that's the only coupling.
This keeps either product replaceable. You can use ScaiBot voice without ScaiDial (we have a web widget), and you can use ScaiDial without ScaiBot (route everything to wave users or voicemail).
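The single coupling point described above, the extension's target_ref, might look like this in code. The field names other than target_ref, and the idea that an extension carries a type of "wave" or "bot", are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bot:
    """ScaiBot side: knows nothing about extensions."""
    id: str
    prompt: str


@dataclass(frozen=True)
class Extension:
    """ScaiDial side: refers to a bot only by an opaque ID."""
    number: str
    type: str         # assumption: "wave" for a human user, "bot" for a ScaiBot bot
    target_ref: str   # bot ID when type == "bot"; otherwise some other reference


def bot_id_for(ext: Extension) -> Optional[str]:
    """Return the bot ID to hand to ScaiBot, or None for non-bot extensions."""
    return ext.target_ref if ext.type == "bot" else None


support_line = Extension(number="101", type="bot", target_ref="bot-42")
front_desk = Extension(number="102", type="wave", target_ref="user-7")
```

Neither class imports or references the other, so either side can change its model without touching the other as long as the ID stays stable.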