
Quickstart

In five minutes you'll have a transcript of a recorded file and a live captioning session running over a WebSocket.

You need:

  • A ScaiGrid API key with the scaiecho:transcribe permission (any tenant admin has this).
  • A short audio file (.wav, .mp3, .flac, .ogg, or .m4a) — under 5 MiB to keep this synchronous.
  • A terminal that can speak WebSocket if you want to follow step 4 (websocat works, so does Python's websockets library).
bash
export SCAIGRID_HOST="https://scaigrid.scailabs.ai"
export SCAIGRID_API_KEY="sgk_..."
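
Before uploading, it can help to check that the file matches the requirements listed above. The following is a minimal sketch, not part of any ScaiGrid client: `preflight`, `INLINE_LIMIT_BYTES`, and `SUPPORTED_SUFFIXES` are hypothetical names, the extension list is copied from this guide (the server may accept others), and treating a file of exactly 5 MiB as asynchronous is an assumption.

```python
from pathlib import Path

# 5 MiB: the default inline threshold mentioned in this guide.
INLINE_LIMIT_BYTES = 5 * 1024 * 1024
# Extensions listed in the requirements above (assumed, not exhaustive).
SUPPORTED_SUFFIXES = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def preflight(path, size_bytes=None):
    """Return "sync" or "async" depending on the expected response mode.

    size_bytes lets callers skip the filesystem lookup, e.g. in tests.
    """
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported audio extension: {p.suffix!r}")
    size = p.stat().st_size if size_bytes is None else size_bytes
    # Assumption: files at or above the threshold take the job path.
    return "sync" if size < INLINE_LIMIT_BYTES else "async"
```

Calling `preflight("meeting.wav")` before step 2 tells you whether to expect an inline transcript or a job to poll.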

1. Check your tenant policy

Tenant policy decides whether your transcription runs on a self-hosted STT node (Backend A) or a managed STT relay (Backend B). Most tenants allow both A and B, with B as the default.

bash
curl "$SCAIGRID_HOST/v1/modules/scaiecho/tenant-policy" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"

A 403 means your key lacks scaiecho:admin, which is required to read the policy. You can still transcribe; the policy applies automatically.
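
The same check can be scripted. The sketch below uses only Python's standard library; `build_policy_request` and `fetch_policy` are hypothetical helper names, and treating a 403 as "policy unreadable, carry on" follows the note above rather than any documented client behaviour. The body is returned as raw text because its shape is not shown in this guide.

```python
import urllib.error
import urllib.request


def build_policy_request(host, api_key):
    """Build the GET request for the tenant-policy endpoint used in step 1."""
    return urllib.request.Request(
        f"{host.rstrip('/')}/v1/modules/scaiecho/tenant-policy",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_policy(host, api_key, timeout=30.0):
    """Return the raw policy body, or None if the key cannot read policy (403)."""
    request = build_policy_request(host, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # Transcription still works; the policy is applied server-side.
            return None
        raise
```

A `None` result is not an error for the rest of this quickstart; it only means the policy cannot be inspected with this key.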

2. Transcribe a file

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaiecho/transcribe" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "file=@meeting.wav" \
  -F "language_hint=en" \
  -F "backend_preference=any"
python
import httpx, os

with open("meeting.wav", "rb") as f:
    resp = httpx.post(
        f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaiecho/transcribe",
        headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
        files={"file": ("meeting.wav", f, "audio/wav")},
        data={"language_hint": "en", "backend_preference": "any"},
        timeout=120.0,
    )
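resp.raise_for_status()  # surface 4xx/5xx errors instead of a KeyError below
print(resp.json()["data"]["transcript"])  # present on the synchronous (inline) response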
javascript
import fs from "node:fs";

const form = new FormData();
form.append("file", new Blob([fs.readFileSync("meeting.wav")]), "meeting.wav");
form.append("language_hint", "en");
form.append("backend_preference", "any");

const res = await fetch(`${process.env.SCAIGRID_HOST}/v1/modules/scaiecho/transcribe`, {
  method: "POST",
  headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` },
  body: form,
});
const { data } = await res.json();
console.log(data.transcript);

For audio under the inline threshold (default 5 MiB) you get the transcript back in the same response. Larger files return 202 Accepted with a job_id.
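
A client that handles both outcomes might branch on the status code. This is a minimal sketch: `interpret_transcribe_response` is a hypothetical helper, `data.transcript` matches the examples above, and the assumption that `job_id` also sits under `data` is not confirmed by this guide.

```python
def interpret_transcribe_response(status_code, body):
    """Classify a transcribe response as an inline transcript or a queued job.

    body is the parsed JSON response. Returns a (kind, value) tuple where
    kind is "transcript" or "job".
    """
    data = body.get("data", {})
    if status_code == 200:
        return ("transcript", data["transcript"])
    if status_code == 202:
        # Assumption: job_id is nested under "data" like the transcript.
        return ("job", data["job_id"])
    raise RuntimeError(f"unexpected status {status_code}: {body!r}")
```

With httpx, this would be called as `interpret_transcribe_response(resp.status_code, resp.json())`; a `"job"` result is what step 3 polls.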

3. Poll an async job (if you went over the threshold)

bash
curl "$SCAIGRID_HOST/v1/modules/scaiecho/transcribe/jobs/$JOB_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"

Status moves through queued → running → completed. The transcript is on the response when status is completed. No S3 fetch needed: transcripts are text.
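
The polling loop can be sketched as follows. `wait_for_transcript` is a hypothetical helper that takes any callable returning the job payload as a dict, so the HTTP client is left to you. The `status` and `transcript` keys are assumed from the description above, and any status other than the three named there is treated as a failure, which is also an assumption.

```python
import time


def wait_for_transcript(fetch_job, interval=2.0, max_wait=600.0, sleep=time.sleep):
    """Poll fetch_job() until the job completes and return its transcript.

    fetch_job: zero-argument callable returning the job payload as a dict.
    sleep: injectable so tests can skip real waiting.
    """
    waited = 0.0
    while True:
        job = fetch_job()
        status = job["status"]
        if status == "completed":
            return job["transcript"]
        if status not in ("queued", "running"):
            # Assumption: anything else (e.g. a failure state) is terminal.
            raise RuntimeError(f"job ended with status {status!r}")
        if waited >= max_wait:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

A fixed interval keeps the sketch short; a production client would likely add backoff.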

4. Stream live audio over WebSocket

Open a WebSocket to /v1/modules/scaiecho/stream/transcribe, send an open JSON control frame, push binary audio chunks, and receive delta JSON frames.

python
import asyncio, json, os, websockets

async def main():
    url = (
        os.environ["SCAIGRID_HOST"].replace("https://", "wss://", 1)
        + "/v1/modules/scaiecho/stream/transcribe"
        + f"?token={os.environ['SCAIGRID_API_KEY']}"
    )
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({
            "type": "open",
            "language_hint": "en",
            "media_type": "audio/wav",
            "chunk_seconds": 5.0,
        }))
        print(await ws.recv())  # {"type": "ready", ...}

        with open("meeting.wav", "rb") as f:
            while chunk := f.read(16000):
                await ws.send(chunk)

        await ws.send(json.dumps({"type": "close"}))
        async for msg in ws:
            print(msg)

asyncio.run(main())
javascript
import WebSocket from "ws";
import fs from "node:fs";

const url = `${process.env.SCAIGRID_HOST.replace("https", "wss")}`
  + `/v1/modules/scaiecho/stream/transcribe?token=${process.env.SCAIGRID_API_KEY}`;
const ws = new WebSocket(url);

ws.on("open", () => {
  // Send the control frame first, then the audio, then the close frame.
  ws.send(JSON.stringify({ type: "open", language_hint: "en", media_type: "audio/wav" }));
  const stream = fs.createReadStream("meeting.wav", { highWaterMark: 16000 });
  stream.on("data", (chunk) => ws.send(chunk));
  stream.on("end", () => ws.send(JSON.stringify({ type: "close" })));
});
ws.on("message", (data) => console.log(data.toString()));
bash
# websocat: push raw bytes after an open control frame
# Derive the WebSocket origin from the HTTPS host (bash parameter substitution).
SCAIGRID_HOST_WS="${SCAIGRID_HOST/#https/wss}"
{
  echo '{"type":"open","language_hint":"en","media_type":"audio/wav"}'
  cat meeting.wav
  echo '{"type":"close"}'
} | websocat -b "$SCAIGRID_HOST_WS/v1/modules/scaiecho/stream/transcribe?token=$SCAIGRID_API_KEY"

Server frames you'll see: ready (with the selected backend), repeated delta (with text, is_final, start, end), and finally closed.
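
To turn those frames into a single transcript, a client could keep only the final deltas. The sketch below is one possible reading, not documented behaviour: `collect_transcript` is a hypothetical helper, the `backend` key on the ready frame is an assumed name, and it assumes that non-final deltas are interim results superseded by a later final one.

```python
import json


def collect_transcript(frames):
    """Fold raw JSON server frames into (backend, transcript).

    frames: iterable of JSON strings as received from the WebSocket.
    """
    backend = None
    finals = []
    for raw in frames:
        frame = json.loads(raw)
        kind = frame.get("type")
        if kind == "ready":
            backend = frame.get("backend")  # assumed key name
        elif kind == "delta" and frame.get("is_final"):
            # Interim deltas are skipped; only finalized text is kept.
            finals.append(frame["text"])
        elif kind == "closed":
            break
    return backend, " ".join(finals)
```

In the Python streaming example above, this could consume the messages yielded by `async for msg in ws` once they are gathered into a list.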

5. (Optional) Enroll a speaker for diarization

Skip this unless you need speaker-attributed transcripts. Enrollment is biometric — it requires consent capture and the scaiecho:enroll permission, separate from scaiecho:transcribe.

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaiecho/speakers" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "display_name=Alice" \
  -F "language_primary=en" \
  -F "consent_user_full_name=Alice Example" \
  -F "consent_stated_purpose=Meeting transcription diarization" \
  -F "consent_text=I consent to enrollment for diarization." \
  -F "reference=@alice-reference.wav" \
  -F "consent=@alice-consent.wav"

See Enroll a speaker for diarization for the full pipeline, including how diarized streaming requests pick up enrolled profiles.

What just happened

  • Step 2 ran through TranscribeService, which consulted your tenant policy and picked Backend A or B. Short audio went sync; long audio enqueued a job on the arq worker pool.
  • Step 4 opened a StreamTranscribeService session. Audio frames went to the dispatcher's chunk-relay path; transcript records came back over the WebSocket as JSON deltas.
  • Every call was metered by ScaiGrid's accounting pipeline against your tenant's budget, just like a chat completion.

Updated 2026-05-18 15:01:27, rev 12