Enroll a speaker for diarization

Speaker diarization in ScaiEcho works against an enrolled speaker library. You upload a short reference recording, a separate consent recording, and metadata; the platform runs a quality preflight, persists the consent record, then fans the enrollment out to every online pyannote node. Once any node returns success, the profile flips to ready and the diarize path can label segments from that speaker.

This tutorial walks the full enrollment pipeline, then shows how to consume it from a streaming transcribe call.

Permissions

Speaker enrollment is biometric capture, so scaiecho:enroll is a separate permission from scaiecho:transcribe. A tenant admin who grants "transcribe access" does not implicitly grant enrollment. Diarized transcription requires scaiecho:diarize in addition to scaiecho:transcribe.
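The permission rules above can be written down as a small lookup. This is only an illustrative sketch: the permission strings come from this page, but the operation names and the `allowed_operations` helper are hypothetical and not part of the ScaiEcho API.

```python
# Hypothetical client-side helper; the operation names are illustrative only.
REQUIRED_PERMISSIONS = {
    "enroll_speaker": {"scaiecho:enroll"},
    "transcribe": {"scaiecho:transcribe"},
    # Diarized transcription needs scaiecho:diarize on top of scaiecho:transcribe.
    "transcribe_diarized": {"scaiecho:transcribe", "scaiecho:diarize"},
}


def allowed_operations(granted: set[str]) -> set[str]:
    """Return the operations whose required permissions are all granted."""
    return {
        op
        for op, required in REQUIRED_PERMISSIONS.items()
        if required <= granted
    }


# Transcribe access alone unlocks neither enrollment nor diarization.
print(sorted(allowed_operations({"scaiecho:transcribe"})))
```

A check like this can help a client fail early with a clear message, but the platform remains the authority on what a key may do.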

1. Prepare the recordings

Two separate recordings:

Reference — at least a few seconds of clean speech from the speaker, no background noise. The pyannote embedding model uses this to identify the speaker in later transcripts.
Consent — the speaker reading the consent text aloud. This is the immutable audit record proving they agreed to enrollment. Keep it short and self-contained.

Both can be any common audio format the dispatcher accepts (wav, mp3, flac, ogg, m4a).
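Before uploading, a quick local check can catch obvious problems. The sketch below is not the platform's quality preflight: the extension list mirrors the formats named above, the 3-second minimum is an assumed stand-in for "at least a few seconds", and duration is only measured for WAV files using the standard-library `wave` module.

```python
import wave
from pathlib import Path

# Formats named above; the dispatcher itself decides what it accepts.
ACCEPTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
# Assumed threshold: the page only says "at least a few seconds".
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_recording(path: str, min_seconds: float = 0.0) -> list[str]:
    """Return the problems found; an empty list means nothing obvious is wrong."""
    problems = []
    suffix = Path(path).suffix.lower()
    if suffix not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    elif suffix == ".wav" and min_seconds > 0:
        # Duration is only checked for WAV; other formats need a decoder.
        duration = wav_duration_seconds(path)
        if duration < min_seconds:
            problems.append(f"too short: {duration:.1f}s < {min_seconds:.1f}s")
    return problems
```

For example, `check_recording("alice-reference.wav", MIN_REFERENCE_SECONDS)` returns an empty list when the file has an accepted extension and is long enough. Background noise is not checked here.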

2. Submit the enrollment

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaiecho/speakers" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "display_name=Alice Example" \
  -F "language_primary=en" \
  -F "description=Customer success lead" \
  -F "consent_user_full_name=Alice Example" \
  -F "consent_stated_purpose=Meeting transcription diarization for the support team" \
  -F "consent_text=I, Alice Example, consent to ScaiEcho enrolling my voice for diarization." \
  -F "reference=@alice-reference.wav" \
  -F "consent=@alice-consent.wav"

python
import os

import httpx

with open("alice-reference.wav", "rb") as ref, open("alice-consent.wav", "rb") as cons:
    resp = httpx.post(
        f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaiecho/speakers",
        headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
        files={
            "reference": ("alice-reference.wav", ref, "audio/wav"),
            "consent": ("alice-consent.wav", cons, "audio/wav"),
        },
        data={
            "display_name": "Alice Example",
            "language_primary": "en",
            "consent_user_full_name": "Alice Example",
            "consent_stated_purpose": "Meeting transcription diarization",
            "consent_text": "I consent to enrollment for diarization.",
        },
        timeout=120.0,
    )
resp.raise_for_status()  # raise on any 4xx/5xx response
print(resp.json()["data"])

javascript
import fs from "node:fs";

const form = new FormData();
form.append("reference", new Blob([fs.readFileSync("alice-reference.wav")]), "alice-reference.wav");
form.append("consent", new Blob([fs.readFileSync("alice-consent.wav")]), "alice-consent.wav");
form.append("display_name", "Alice Example");
form.append("language_primary", "en");
form.append("consent_user_full_name", "Alice Example");
form.append("consent_stated_purpose", "Meeting transcription diarization");
form.append("consent_text", "I consent to enrollment for diarization.");

const res = await fetch(`${process.env.SCAIGRID_HOST}/v1/modules/scaiecho/speakers`, {
  method: "POST",
  headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` },
  body: form,
});
console.log(await res.json());

The response is 201 Created with the speaker profile plus a preflight block describing what the quality check found, an enrolled_on array listing the pyannote node ids that accepted the enrollment, and enrollment_status (one of pending, ready, failed, evicted).

If the response includes a note saying that no pyannote node is online, the row stays at pending until at least one such node comes online. The profile is recorded, but the diarization path cannot label segments from this speaker until the enrollment has fanned out.
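A client can use the returned fields to decide whether the speaker is usable yet. The following is a minimal sketch, assuming `profile` is the decoded profile object (for instance `resp.json()["data"]` in the Python example above). The helper names are hypothetical, and treating failed and evicted as "not usable" is an assumption, since this page only describes ready and pending in detail.

```python
def can_diarize(profile: dict) -> bool:
    """Only a ready profile can be used to label segments."""
    return profile.get("enrollment_status") == "ready"


def describe_enrollment(profile: dict) -> str:
    """Build a one-line, human-readable summary of the enrollment state."""
    status = profile.get("enrollment_status")
    nodes = [str(n) for n in profile.get("enrolled_on") or []]
    if can_diarize(profile):
        return f"ready on {len(nodes)} node(s): {', '.join(nodes)}"
    if status == "pending":
        return "pending: recorded, waiting for a pyannote node to accept it"
    # Assumption: failed and evicted profiles cannot label segments.
    return f"{status}: not usable for diarization"


# Example payload with made-up values, shaped like the fields described above.
example = {"enrollment_status": "ready", "enrolled_on": ["node-a", "node-b"]}
print(describe_enrollment(example))
```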

3. Inspect and re-warm enrollment

To see which nodes currently hold the speaker enrolled:

bash
curl "$SCAIGRID_HOST/v1/modules/scaiecho/speakers/$SPEAKER_ID/warm" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"

You'll get three sets: warm_node_ids (Redis-registered), candidate_node_ids (nodes currently running the pyannote engine), stale_node_ids (in the registry but not a candidate — typically a node that left the cluster).
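One use of these sets is to work out which candidate nodes do not yet hold the speaker. The sketch below assumes `warm_status` is a dict holding the three lists under the field names given above. Whether they sit inside an envelope such as `data` is not stated here, so unwrap the response first if needed. The helper name is hypothetical.

```python
def nodes_missing_enrollment(warm_status: dict) -> list[str]:
    """Candidate nodes that are not yet registered as warm for this speaker."""
    warm = set(warm_status.get("warm_node_ids", []))
    candidates = warm_status.get("candidate_node_ids", [])
    return sorted(n for n in candidates if n not in warm)


# Made-up example: node-c runs pyannote but has not enrolled the speaker,
# and node-x is registered but no longer a candidate (stale).
status = {
    "warm_node_ids": ["node-a", "node-b", "node-x"],
    "candidate_node_ids": ["node-a", "node-b", "node-c"],
    "stale_node_ids": ["node-x"],
}
print(nodes_missing_enrollment(status))
```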

To proactively re-enroll on every candidate (or a specific subset):

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaiecho/speakers/$SPEAKER_ID/warm" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"node_ids": []}'

An empty node_ids list means "all candidates"; a non-empty list targets only those nodes. This is useful when a node was added after the original intake fan-out, or when an earlier enrollment failed and ops wants to retry.
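The same call can be prepared from Python. This sketch uses only the standard library and builds the request without sending it. The endpoint path, headers, and body shape come from the curl example above, while the `build_warm_request` helper is hypothetical.

```python
import json
import urllib.request


def build_warm_request(host, api_key, speaker_id, node_ids=None):
    """Build (but do not send) the POST .../warm request.

    Passing no node_ids produces an empty list, which targets all candidates.
    """
    body = json.dumps({"node_ids": list(node_ids or [])}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/v1/modules/scaiecho/speakers/{speaker_id}/warm",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder host, key, and ids for illustration only.
req = build_warm_request("https://scaigrid.example", "KEY", "spk-123", ["node-c"])
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Pass the result to `urllib.request.urlopen(req)` to send it.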

4. Request a diarized transcript

Streaming routes accept diarize: true on the open frame; the dispatcher attaches a speaker_label to each delta whose segment matches an enrolled profile.

python
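import json

import websockets


async def stream_diarized(url: str) -> None:
    # url is the streaming transcribe WebSocket endpoint for your deployment.
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({
            "type": "open",
            "language_hint": "en",
            "media_type": "audio/wav",
            "diarize": True,
        }))
        print(await ws.recv())  # {"type": "ready", "backend_used": "A"}
        # ... push audio ...
        async for msg in ws:
            d = json.loads(msg)
            if d.get("type") == "delta":
                # speaker_label is omitted when no label is available.
                print(d.get("speaker_label", "?"), d["text"])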

Two things to know about the speaker_label field:

It is omitted when no label is available — either diarization wasn't requested, the dispatcher couldn't run it (no pyannote node, Backend B selected), or this segment didn't match an enrolled profile cleanly.
Segments matching no enrolled profile get a per-session cluster id like spk_0, spk_1. The mapping isn't stable across sessions.
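A consumer usually has to tell these cases apart. The sketch below classifies each delta and groups text by label. It assumes that per-session cluster ids always match the `spk_<number>` pattern shown above and that any other label refers to an enrolled profile. The exact format of enrolled labels is not specified here, and both helper names are hypothetical.

```python
import re
from collections import defaultdict

# Per-session cluster ids look like spk_0, spk_1 according to the text above.
CLUSTER_ID = re.compile(r"spk_\d+")


def label_kind(delta: dict) -> str:
    """Classify a delta as 'unlabeled', 'session_cluster', or 'enrolled'."""
    label = delta.get("speaker_label")
    if label is None:
        return "unlabeled"
    if CLUSTER_ID.fullmatch(label):
        return "session_cluster"
    # Assumption: any other label refers to an enrolled profile.
    return "enrolled"


def group_text_by_speaker(messages: list[dict]) -> dict[str, list[str]]:
    """Collect delta text per label; deltas without a label go under '?'."""
    grouped = defaultdict(list)
    for msg in messages:
        if msg.get("type") == "delta":
            grouped[msg.get("speaker_label", "?")].append(msg["text"])
    return dict(grouped)
```

Because cluster ids are not stable across sessions, avoid storing them as long-lived speaker identities.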

Diarization on Backend B is silently ignored because the managed STT relay does not expose diarization. If your tenant policy pins the stream to B, the deltas will not carry speaker labels even when the open frame sets diarize: true.
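Since the request is ignored without an error, a client may want to detect this case from the ready frame. This is a minimal sketch, assuming the ready frame has the shape shown in the streaming example (`{"type": "ready", "backend_used": "A"}`). The helper name is hypothetical. A result of True only means labels are possible, because they can still be missing when no pyannote node is available.

```python
import json


def diarization_labels_possible(ready_frame: str, requested: bool) -> bool:
    """True if diarization was requested and the stream is not on Backend B."""
    frame = json.loads(ready_frame)
    if frame.get("type") != "ready":
        raise ValueError(f"expected a ready frame, got {frame.get('type')!r}")
    return requested and frame.get("backend_used") != "B"


# Backend B never carries speaker labels, even with diarization requested.
print(diarization_labels_possible('{"type": "ready", "backend_used": "B"}', True))
```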

5. Delete a speaker

bash
curl -X DELETE "$SCAIGRID_HOST/v1/modules/scaiecho/speakers/$SPEAKER_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"

This triggers the GDPR Art. 17 (right to erasure) fan-out. The orchestrator deletes the reference and consent blobs from S3, writes an immutable ErasureAudit row, tombstones the speaker row, and evicts the embedding from every pyannote node that held it. The response includes audit_id (the id of that immutable record), blob_bytes_deleted, and a per-node error_summary if any node failed to evict.
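It is worth logging the erasure result, because a node that failed to evict needs follow-up. The sketch below assumes `result` is the decoded response object and that error_summary, when present, is a mapping from node id to an error message. That shape is an assumption, since this page only says it is per-node. The helper name is hypothetical.

```python
def summarize_erasure(result: dict) -> str:
    """Summarise a delete response in one line for logs or tickets."""
    # Assumed shape: {"node-id": "error message", ...}; absent or empty on success.
    errors = result.get("error_summary") or {}
    line = (
        f"audit {result.get('audit_id')}: "
        f"{result.get('blob_bytes_deleted', 0)} blob bytes deleted"
    )
    if errors:
        failed = ", ".join(sorted(str(node) for node in errors))
        line += f"; eviction failed on: {failed}"
    return line


# Made-up example values.
print(summarize_erasure({
    "audit_id": "aud-1",
    "blob_bytes_deleted": 2048,
    "error_summary": {"node-b": "timeout"},
}))
```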

After deletion, future diarized streams won't label that speaker; existing transcripts that already attributed segments to the speaker keep the labels they had at transcription time (transcripts are not retroactively edited).