Transcribe long audio with async jobs
The POST /transcribe endpoint runs short audio inline and long audio asynchronously. This tutorial walks through the async path end-to-end — what triggers it, what 202 looks like, how to poll, and how to recover from failures.
When does async kick in
Either of two conditions triggers the async path:
- The uploaded file is larger than scaiecho_async_audio_threshold_bytes. The platform default is 5 MiB, which holds roughly five and a half minutes of 8-bit, 16 kHz mono PCM, or a little under three minutes at 16-bit. Operators can change it per deployment.
- You set force_async=true on the multipart form. Use this when you know the audio will exceed the inline budget despite being under the byte threshold, for example a heavily compressed recording with a long real-time duration.
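As a rough illustration, the two triggers can be mirrored in a small client-side predicate, alongside an estimate of how much raw PCM fits under the default threshold. The function names are hypothetical, the real decision is made by the server, and the 16-bit default sample width is an assumption.

```python
DEFAULT_THRESHOLD_BYTES = 5 * 1024 * 1024  # 5 MiB platform default; operators may override it


def takes_async_path(size_bytes: int, force_async: bool = False,
                     threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> bool:
    """Return True if either documented trigger applies (hypothetical client-side mirror)."""
    return force_async or size_bytes > threshold_bytes


def pcm_seconds(size_bytes: int, sample_rate_hz: int = 16_000,
                bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Duration of raw PCM audio of the given size; container headers are ignored."""
    return size_bytes / (sample_rate_hz * bytes_per_sample * channels)


if __name__ == "__main__":
    four_mib = 4 * 1024 * 1024
    print(takes_async_path(four_mib))                    # False: under threshold, no flag
    print(takes_async_path(four_mib, force_async=True))  # True: the flag forces async
    print(pcm_seconds(DEFAULT_THRESHOLD_BYTES))          # 163.84 seconds at 16-bit
    print(pcm_seconds(DEFAULT_THRESHOLD_BYTES, bytes_per_sample=1))  # 327.68 seconds at 8-bit
```

A compressed format packs far more audio into the same byte count, which is the case force_async=true is meant for.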
Anything that doesn't trip either condition returns the transcript inline. There's no streaming path through the batch endpoint — that's what /stream/transcribe is for.
1. Send the upload
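One way to send the upload from Python, using only the standard library, is sketched below. Only POST /transcribe and the force_async form field come from this tutorial; the base URL, the bearer-token header, the form field name file, and the JSON response body are assumptions made for illustration.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, content, content_type):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    chunks.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode()
        + content + b"\r\n"
    )
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def submit_audio(base_url, token, path, content_type="audio/wav", force_async=True):
    """POST the audio to /transcribe; returns (http_status, parsed_body)."""
    audio = Path(path)
    body, header = build_multipart(
        {"force_async": "true" if force_async else "false"},
        file_field="file",  # assumption: the real field name is not stated in this tutorial
        filename=audio.name,
        content=audio.read_bytes(),
        content_type=content_type,
    )
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        method="POST",
        headers={
            "Content-Type": header,
            "Authorization": f"Bearer {token}",  # assumption: the auth scheme is not stated here
        },
    )
    with urllib.request.urlopen(request) as response:
        # urlopen returns 2xx responses, including 202, without raising.
        return response.status, json.load(response)
```

A caller would treat a 202 status as the signal to start polling and any other 2xx as an inline transcript.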
The response is 202 Accepted.
Under the hood: the audio was staged to S3 at scaiecho/transcribe_jobs/{job_id}.{ext}, a TranscriptionJob row was inserted with status='queued' along with the audio sha256 and byte count, and process_transcribe_job was enqueued on the arq worker pool. The worker picks the backend after a policy lookup, calls the dispatcher, and writes the transcript back to the same row.
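To make the staging step concrete, the sketch below derives the values the paragraph mentions for one upload: the S3 key, the sha256 digest, and the byte count. It is not the service's code. The dictionary keys, the UUID job id format, and the function name are hypothetical; only the key layout and the 'queued' status come from the text.

```python
import hashlib
import uuid
from typing import Optional


def describe_upload(audio: bytes, ext: str, job_id: Optional[str] = None) -> dict:
    """Build the values recorded for a newly queued job (illustrative only)."""
    job_id = job_id or str(uuid.uuid4())  # assumption: the real job id format is not documented here
    return {
        "job_id": job_id,
        "status": "queued",
        "s3_key": f"scaiecho/transcribe_jobs/{job_id}.{ext}",
        "audio_sha256": hashlib.sha256(audio).hexdigest(),
        "audio_bytes": len(audio),
    }


if __name__ == "__main__":
    print(describe_upload(b"\x00" * 32_000, "wav", job_id="example-job"))
```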
2. Poll for completion
status progresses through queued → running → completed (or failed). On completed the transcript is inline on the response — no second fetch needed. Other fields populated on completion: backend_used, language_detected, audio_duration_ms, completed_at.
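A polling loop over those statuses might look like the sketch below. Because the poll URL is not restated here, the HTTP call is passed in as fetch_job, a hypothetical callable that returns the parsed job body; the interval and timeout defaults are likewise assumptions, not platform recommendations.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_job: Callable[[], dict],
                 interval_s: float = 2.0,
                 timeout_s: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_job until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated responses standing in for three successive polls.
    responses = iter([
        {"status": "queued"},
        {"status": "running"},
        {"status": "completed", "transcript": "hello world"},
    ])
    final = wait_for_job(lambda: next(responses), interval_s=0.01)
    print(final["status"], final["transcript"])
```

Injecting sleep keeps the loop easy to test, and a fixed interval keeps the sketch short; a production client might prefer backoff.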
3. Handle failures
Status failed means the worker ran but the dispatcher errored. status_reason tells you why — typically one of:
- Backend unavailable. Your tenant policy pinned the job to a backend that wasn't online when the worker ran. Retry: re-upload, or change tenant policy to allow the other backend.
- Audio decode failure. The dispatcher couldn't parse the file. Check that the audio is valid and the Content-Type you sent matches the actual format.
- Quota exceeded. Tenant budget was hit between enqueue and dispatch. Either raise the budget or wait for the next period.
A 404 on the poll endpoint means either the job doesn't exist or it belongs to a different user/tenant. We deliberately return 404 on cross-context lookups to avoid leaking job existence.
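On the client, these outcomes can be separated with a small helper like the sketch below. The exception classes and function name are hypothetical, and the transcript field name is an assumption; status and status_reason are the fields described above.

```python
from typing import Optional


class JobNotFound(Exception):
    """The poll returned 404: the job is missing or belongs to another user or tenant."""


class JobFailed(Exception):
    """The worker ran but the dispatcher errored; reason carries status_reason."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


def transcript_or_raise(http_status: int, body: Optional[dict]) -> Optional[str]:
    """Return the transcript when completed, None while pending, and raise otherwise."""
    if http_status == 404:
        # The API does not distinguish "missing" from "not yours", so neither can the client.
        raise JobNotFound("job does not exist or is not visible to this user")
    if body is None:
        raise ValueError(f"HTTP {http_status} response had no job body")
    if body["status"] == "failed":
        raise JobFailed(body.get("status_reason", "unknown"))
    if body["status"] == "completed":
        return body["transcript"]  # assumption: the inline transcript field is named "transcript"
    return None  # queued or running: keep polling
```

Surfacing status_reason on the exception lets the caller decide between re-uploading, fixing the Content-Type, or waiting for the next budget period.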
4. Scope and retention
Async jobs are scoped to the user that created them. Other users in the same tenant see 404 on those job ids. Tenant admins reading transcripts for compliance use the admin UI's transcription dashboard instead — it queries the same TranscriptionJob table without the per-user scope clause.
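The scoping rule can be pictured as one query with an optional per-user clause. The sketch below uses an in-memory SQLite table as a stand-in for the real MariaDB table; the column names and helper functions are hypothetical, and keeping the tenant filter on the admin path is an assumption.

```python
import sqlite3
from typing import Optional


def demo_connection() -> sqlite3.Connection:
    """Create a throwaway table with one job owned by alice in tenant t1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TranscriptionJob ("
        "job_id TEXT PRIMARY KEY, tenant_id TEXT, user_id TEXT, status TEXT)"
    )
    conn.execute("INSERT INTO TranscriptionJob VALUES ('job-1', 't1', 'alice', 'completed')")
    return conn


def find_job(conn: sqlite3.Connection, job_id: str, tenant_id: str,
             user_id: Optional[str] = None):
    """Return the job row, or None, which the API layer would translate to 404.

    Passing user_id adds the per-user scope clause; omitting it models the
    admin dashboard's view of the same table.
    """
    sql = ("SELECT job_id, user_id, status FROM TranscriptionJob "
           "WHERE job_id = ? AND tenant_id = ?")
    params = [job_id, tenant_id]
    if user_id is not None:
        sql += " AND user_id = ?"
        params.append(user_id)
    return conn.execute(sql, params).fetchone()


if __name__ == "__main__":
    conn = demo_connection()
    print(find_job(conn, "job-1", "t1", user_id="alice"))  # owner sees the row
    print(find_job(conn, "job-1", "t1", user_id="bob"))    # None: same tenant, other user
    print(find_job(conn, "job-1", "t1"))                   # admin view without the user clause
```

Because a missing job and a foreign job both come back as None, the caller cannot tell them apart, which is the point of returning 404 for both.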
The audio blob in S3 is retained for the configured tenant retention window. The transcript stays in MariaDB until the row is reaped. There is no per-job DELETE endpoint, since transcripts are part of the audit trail; the only way to delete a job is to delete the user that owns it.
Tuning
- For a tenant that uploads many medium-length files in bursts, raise scaiecho_async_audio_threshold_bytes so more requests stay inline. The cost is longer-held HTTP connections.
- For latency-sensitive uploads, set force_async=true only when the recording is genuinely long. Inline transcription is faster end-to-end for short audio.
- If your workflow needs continuous transcription rather than a finished file, switch to streaming; async jobs are for finite recordings, not live audio.