Quickstart

In five minutes you'll have a transcript of a recorded file and a live captioning session running over a WebSocket.

You need:

A ScaiGrid API key with the scaiecho:transcribe permission (every tenant admin has it).
A short audio file (.wav, .mp3, .flac, .ogg, or .m4a) — under 5 MiB to keep this synchronous.
A WebSocket client if you want to follow step 4 (websocat works, as does Python's websockets library).

bash
export SCAIGRID_HOST="https://scaigrid.scailabs.ai"
export SCAIGRID_API_KEY="sgk_..."

1. Check your tenant policy

Tenant policy decides whether your transcription runs on a self-hosted STT node (Backend A) or a managed STT relay (Backend B). Most tenants allow both backends, with B as the default.

bash
curl "$SCAIGRID_HOST/v1/modules/scaiecho/tenant-policy" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"

A 403 means your key lacks scaiecho:admin, which is required to read the policy. You can still transcribe; the policy applies automatically.

2. Transcribe a file

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaiecho/transcribe" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "file=@meeting.wav" \
  -F "language_hint=en" \
  -F "backend_preference=any"

python
import httpx, os

with open("meeting.wav", "rb") as f:
    resp = httpx.post(
        f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaiecho/transcribe",
        headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
        files={"file": ("meeting.wav", f, "audio/wav")},
        data={"language_hint": "en", "backend_preference": "any"},
        timeout=120.0,
    )
resp.raise_for_status()
# A 202 response carries a job_id instead of a transcript (see step 3).
print(resp.json()["data"]["transcript"])

javascript
import fs from "node:fs";

const form = new FormData();
form.append("file", new Blob([fs.readFileSync("meeting.wav")]), "meeting.wav");
form.append("language_hint", "en");
form.append("backend_preference", "any");

const res = await fetch(`${process.env.SCAIGRID_HOST}/v1/modules/scaiecho/transcribe`, {
  method: "POST",
  headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` },
  body: form,
});
if (!res.ok) throw new Error(`transcribe failed: ${res.status}`);
const { data } = await res.json();
console.log(data.transcript);

For audio under the inline threshold (default 5 MiB) you get the transcript back in the same response. Larger files return 202 Accepted with a job_id.
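
The two outcomes can be told apart by status code. The following is a minimal Python sketch, not the official client: the helper name is hypothetical, and it assumes the 202 body uses the same data envelope as the synchronous response with job_id inside it (the examples above confirm only data.transcript).

```python
def interpret_transcribe_response(status_code: int, body: dict) -> dict:
    """Classify a transcribe response as inline (200) or queued (202).

    Assumption: both bodies wrap their payload in "data"; only
    data["transcript"] is confirmed by the examples above.
    """
    data = body.get("data", {})
    if status_code == 200:
        return {"mode": "inline", "transcript": data["transcript"]}
    if status_code == 202:
        # Assumed location of the job identifier.
        return {"mode": "queued", "job_id": data["job_id"]}
    raise RuntimeError(f"unexpected status {status_code}: {body}")
```

With the httpx example above, this would be called as interpret_transcribe_response(resp.status_code, resp.json()); a "queued" result means you continue with step 3.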

3. Poll an async job (if you went over the threshold)

bash
curl "$SCAIGRID_HOST/v1/modules/scaiecho/transcribe/jobs/$JOB_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"

Status moves through queued → running → completed. The transcript is on the response when status is completed. No S3 fetch needed — transcripts are text.
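
A polling loop for this might look like the sketch below. It is illustrative only: wait_for_transcript and its parameters are hypothetical names, it assumes status and transcript sit under data as in the synchronous response, and it treats any status other than queued, running, or completed as a terminal error, which the text above does not confirm.

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch_job: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reports completed, then return its transcript.

    fetch_job is any callable returning the parsed JSON of the job endpoint.
    Assumption: the payload is wrapped in "data" with "status" and
    "transcript" keys.
    """
    for _ in range(max_attempts):
        data = fetch_job().get("data", {})
        status = data.get("status")
        if status == "completed":
            return data["transcript"]
        if status not in ("queued", "running"):
            # Unknown or failed state: stop instead of polling forever.
            raise RuntimeError(f"job ended with status {status!r}")
        sleep(interval)
    raise TimeoutError("job did not complete in time")
```

Passing the fetch as a callable keeps the loop independent of the HTTP client; in practice fetch_job would wrap a GET on the jobs endpoint shown above and return its parsed JSON.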

4. Stream live audio over WebSocket

Open a WebSocket to /v1/modules/scaiecho/stream/transcribe, send an open JSON control frame, push binary audio chunks, and receive delta JSON frames.

python
import asyncio, json, os, websockets

async def main():
    url = (
        os.environ["SCAIGRID_HOST"].replace("https://", "wss://", 1)
        + "/v1/modules/scaiecho/stream/transcribe"
        + f"?token={os.environ['SCAIGRID_API_KEY']}"
    )
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({
            "type": "open",
            "language_hint": "en",
            "media_type": "audio/wav",
            "chunk_seconds": 5.0,
        }))
        print(await ws.recv())  # {"type": "ready", ...}

        with open("meeting.wav", "rb") as f:
            while chunk := f.read(16000):
                await ws.send(chunk)

        await ws.send(json.dumps({"type": "close"}))
        async for msg in ws:
            print(msg)

asyncio.run(main())

javascript
import WebSocket from "ws";
import fs from "node:fs";

const url = `${process.env.SCAIGRID_HOST.replace("https", "wss")}`
  + `/v1/modules/scaiecho/stream/transcribe?token=${process.env.SCAIGRID_API_KEY}`;
const ws = new WebSocket(url);

ws.on("message", (data) => console.log(data.toString()));

ws.on("open", () => {
  // Send the control frame first, then stream the file in 16,000-byte chunks.
  ws.send(JSON.stringify({ type: "open", language_hint: "en", media_type: "audio/wav" }));
  const stream = fs.createReadStream("meeting.wav", { highWaterMark: 16000 });
  stream.on("data", (chunk) => ws.send(chunk));
  stream.on("end", () => ws.send(JSON.stringify({ type: "close" })));
});

bash
# websocat: push raw bytes after an open control frame
SCAIGRID_HOST_WS="${SCAIGRID_HOST/#https/wss}"  # derive the wss:// base URL from SCAIGRID_HOST
{
  echo '{"type":"open","language_hint":"en","media_type":"audio/wav"}'
  cat meeting.wav
  echo '{"type":"close"}'
} | websocat -b "$SCAIGRID_HOST_WS/v1/modules/scaiecho/stream/transcribe?token=$SCAIGRID_API_KEY"

Server frames you'll see: ready (with the selected backend), repeated delta (with text, is_final, start, end), and finally closed.
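
To turn those frames into a single transcript, one option is to keep only the final deltas. The sketch below is an assumption-laden illustration: collect_final_text is a hypothetical helper, and it assumes that deltas with is_final set to false are interim hypotheses superseded by a later final delta, which the frame list above does not spell out.

```python
import json


def collect_final_text(frames) -> str:
    """Join the text of final delta frames until a closed frame arrives.

    frames is an iterable of raw JSON strings as received from the socket.
    Assumption: only deltas with is_final == True are kept; interim deltas
    are skipped.
    """
    segments = []
    for raw in frames:
        frame = json.loads(raw)
        kind = frame.get("type")
        if kind == "delta" and frame.get("is_final"):
            segments.append(frame["text"])
        elif kind == "closed":
            break  # the server has ended the session
    return " ".join(segments)
```

In the Python client above, the messages printed from the socket could be gathered into a list and passed to this function once the session ends.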

5. (Optional) enroll a speaker for diarization

Skip this unless you need speaker-attributed transcripts. Enrollment is biometric — it requires consent capture and the scaiecho:enroll permission, separate from scaiecho:transcribe.

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaiecho/speakers" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "display_name=Alice" \
  -F "language_primary=en" \
  -F "consent_user_full_name=Alice Example" \
  -F "consent_stated_purpose=Meeting transcription diarization" \
  -F "consent_text=I consent to enrollment for diarization." \
  -F "reference=@alice-reference.wav" \
  -F "consent=@alice-consent.wav"

See Enroll a speaker for diarization for the full pipeline, including how diarized streaming requests pick up enrolled profiles.

What just happened

Step 2 ran through TranscribeService, which consulted your tenant policy and picked Backend A or B. Short audio went sync; long audio enqueued a job on the arq worker pool.
Step 4 opened a StreamTranscribeService session. Audio frames went to the dispatcher's chunk-relay path; transcript records came back over the WebSocket as JSON deltas.
Every call was metered by ScaiGrid's accounting pipeline against your tenant's budget, just like a chat completion.

Next

Read Architecture to understand the backend split and the dispatcher contract.
Read Streaming transports to choose between WebSocket and WebRTC for live audio.
See the API reference for every endpoint and error code.