---
summary: Upload a long recording, get a 202 + job_id, poll until the transcript is
  ready, handle failures.
title: Transcribe long audio with async jobs
path: tutorials/transcribe-long-audio
status: published
---

The `POST /transcribe` endpoint runs short audio inline and long audio asynchronously. This tutorial walks through the async path end-to-end — what triggers it, what `202` looks like, how to poll, and how to recover from failures.

## When does async kick in

Either of two conditions triggers the async path:

- The uploaded file is larger than `scaiecho_async_audio_threshold_bytes`. The platform default is 5 MiB, roughly 2.7 minutes of 16-bit, 16 kHz mono PCM. Operators can change it per deployment.
- You set `force_async=true` on the multipart form. Use this when you know the audio will exceed the inline budget despite being under the byte threshold — for example, a heavily compressed recording with a long real-time duration.

Anything that doesn't trip either condition returns the transcript inline. There's no streaming path through the batch endpoint — that's what `/stream/transcribe` is for.
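The routing rule above can be sketched as a client-side predicate, which helps when you want to know in advance whether to expect an inline transcript or a `202` with a job. This is a minimal sketch: `will_run_async` and `DEFAULT_THRESHOLD_BYTES` are hypothetical names, and the 5 MiB constant only mirrors the platform default described above, so a deployment that overrides `scaiecho_async_audio_threshold_bytes` needs its own value.

```python
DEFAULT_THRESHOLD_BYTES = 5 * 1024 * 1024  # assumed: the 5 MiB platform default


def will_run_async(
    size_bytes: int,
    force_async: bool = False,
    threshold_bytes: int = DEFAULT_THRESHOLD_BYTES,
) -> bool:
    """Mirror the documented rule: forced, or strictly larger than the threshold."""
    return force_async or size_bytes > threshold_bytes


print(will_run_async(14_580_000))                   # True
print(will_run_async(1_000_000))                    # False
print(will_run_async(1_000_000, force_async=True))  # True
```

The server remains the source of truth; treat this only as a hint for choosing timeouts or UI messaging before the upload starts.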

## 1. Send the upload

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaiecho/transcribe" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "file=@long-recording.wav" \
  -F "language_hint=en" \
  -F "backend_preference=any" \
  -F "force_async=true"
```

```python
import os

import httpx

# Send the file as a multipart upload and force the async path.
with open("long-recording.wav", "rb") as f:
    resp = httpx.post(
        f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaiecho/transcribe",
        headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
        files={"file": ("long-recording.wav", f, "audio/wav")},
        data={"language_hint": "en", "force_async": "true"},
        timeout=120.0,
    )
resp.raise_for_status()  # surface 4xx/5xx before reading the body
job = resp.json()["data"]
print(job["job_id"], job["status"])  # → ... queued
```

```javascript
import fs from "node:fs";

const form = new FormData();
form.append("file", new Blob([fs.readFileSync("long-recording.wav")]), "long-recording.wav");
form.append("language_hint", "en");
form.append("force_async", "true");

const res = await fetch(`${process.env.SCAIGRID_HOST}/v1/modules/scaiecho/transcribe`, {
  method: "POST",
  headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` },
  body: form,
});
const { data: job } = await res.json();
console.log(job.job_id, job.status);
```

The response is `202 Accepted`:

```json
{
  "data": {
    "job_id": "8f3a...",
    "status": "queued",
    "audio_bytes": 14580000,
    "note": "Long-form transcribe runs asynchronously. Poll GET /v1/modules/scaiecho/transcribe/jobs/{id} for status + transcript."
  }
}
```

Under the hood: the audio was staged to S3 at `scaiecho/transcribe_jobs/{job_id}.{ext}`, a `TranscriptionJob` row was inserted at `status='queued'` with the audio sha256 and byte count, and `process_transcribe_job` was enqueued on the arq worker pool. The worker decides the backend after policy lookup, calls the dispatcher, and writes the transcript back to the same row.
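Because the same endpoint can answer inline or with a job, client code usually branches on the status code. The sketch below shows one way to do that. `interpret_transcribe_response` is a hypothetical helper, and it assumes the inline path answers with `200` and wraps its payload in the same `data` envelope; only the `202` shape is documented above.

```python
from typing import Any


def interpret_transcribe_response(
    status_code: int, body: dict[str, Any]
) -> tuple[str, dict[str, Any]]:
    """Classify a POST /transcribe response as an inline result or an async job."""
    data = body.get("data", {})
    if status_code == 202:
        # Async path: the body carries job_id and status, as in the example above.
        if "job_id" not in data:
            raise ValueError("202 response without a job_id")
        return "async", data
    if status_code == 200:
        # Inline path. Status 200 and the body shape are assumptions here.
        return "inline", data
    raise RuntimeError(f"unexpected status {status_code}")


kind, data = interpret_transcribe_response(
    202, {"data": {"job_id": "example-job", "status": "queued", "audio_bytes": 14580000}}
)
print(kind, data["job_id"])  # async example-job
```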

## 2. Poll for completion

```bash
curl "$SCAIGRID_HOST/v1/modules/scaiecho/transcribe/jobs/$JOB_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"
```

```python
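import os
import time

import httpx

# `job` is the dict returned by the upload in step 1.
while True:
    r = httpx.get(
        f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaiecho/transcribe/jobs/{job['job_id']}",
        headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    )
    r.raise_for_status()  # a 404 here means unknown job or wrong user/tenant
    job = r.json()["data"]
    if job["status"] in ("completed", "failed"):
        break
    time.sleep(2)
# Use .get() in case a failed job carries no transcript key.
print(job.get("transcript") or job.get("status_reason"))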
```

```javascript
async function pollJob(jobId) {
  while (true) {
    const r = await fetch(
      `${process.env.SCAIGRID_HOST}/v1/modules/scaiecho/transcribe/jobs/${jobId}`,
      { headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` } },
    );
    const { data } = await r.json();
    if (data.status === "completed" || data.status === "failed") return data;
    // Wait two seconds between polls; `resolve` avoids shadowing the response `r`.
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}
```

`status` progresses through `queued` → `running` → `completed` (or `failed`). On `completed` the transcript is inline on the response — no second fetch needed. Other fields populated on completion: `backend_used`, `language_detected`, `audio_duration_ms`, `completed_at`.
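A fixed two-second loop with no exit condition is fine for a demo, but long recordings usually call for a deadline and a growing delay between polls. The following is a minimal sketch of that pattern, not a prescribed client. `wait_for_job`, its parameters, and the default timings are all hypothetical; `fetch_job` stands in for whatever function performs the `GET` and returns the `data` object.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[], dict[str, Any]],
    timeout_s: float = 600.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job is terminal, doubling the delay up to max_delay_s."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        # Give up if the next sleep would run past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {job['status']} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)


# Usage with a stubbed fetcher that walks queued -> running -> completed.
responses = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "completed", "transcript": "hello"},
])
result = wait_for_job(lambda: next(responses), sleep=lambda s: None)
print(result["status"])  # completed
```

Passing the fetcher and the sleep function as arguments keeps the loop testable without network access or real waiting.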

## 3. Handle failures

Status `failed` means the worker ran but the dispatcher errored. `status_reason` tells you why — typically one of:

- Backend unavailable. Your tenant policy pinned the job to a backend that wasn't online when the worker ran. To retry, re-upload the file, or change the tenant policy to allow the other backend.
- Audio decode failure. The dispatcher couldn't parse the file. Check that the audio is valid and the `Content-Type` you sent matches the actual format.
- Quota exceeded. Tenant budget was hit between enqueue and dispatch. Either raise the budget or wait for the next period.

A 404 on the poll endpoint means either the job doesn't exist or it belongs to a different user/tenant. We deliberately return 404 on cross-context lookups to avoid leaking job existence.
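The cases above can be folded into one function that turns a poll response into a next step. This is only a sketch: `describe_poll_result` and `REASON_HINTS` are hypothetical, and the substrings it matches are placeholders, since the exact wording of `status_reason` is not specified here. Adjust them to what your deployment actually returns.

```python
from typing import Any, Optional

# Hypothetical keyword -> suggested action. The real status_reason wording is
# not specified in this tutorial, so these substrings are placeholders.
REASON_HINTS = {
    "backend": "re-upload, or allow another backend in tenant policy",
    "decode": "check the audio file and the Content-Type you sent",
    "quota": "raise the tenant budget or wait for the next period",
}


def describe_poll_result(http_status: int, job: Optional[dict[str, Any]]) -> str:
    """Turn one poll response into a short, human-readable next step."""
    if http_status == 404:
        # Missing jobs and cross-user/tenant lookups look identical by design.
        return "not found: wrong job id, or job owned by another user/tenant"
    if job is None:
        return f"unexpected HTTP {http_status}"
    if job["status"] == "completed":
        return "done"
    if job["status"] == "failed":
        reason = (job.get("status_reason") or "").lower()
        for keyword, hint in REASON_HINTS.items():
            if keyword in reason:
                return f"failed: {hint}"
        return f"failed: {reason or 'no reason given'}"
    return "still in progress"


print(describe_poll_result(404, None))
print(describe_poll_result(200, {"status": "failed", "status_reason": "Quota exceeded"}))
```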

## 4. Scope and retention

Async jobs are scoped to the user that created them. Other users in the same tenant see 404 on those job ids. Tenant admins reading transcripts for compliance use the admin UI's transcription dashboard instead — it queries the same `TranscriptionJob` table without the per-user scope clause.

The audio blob in S3 is retained for the configured tenant retention window. The transcript stays in MariaDB until the row is reaped. There is no per-job DELETE endpoint, because transcripts are part of the audit trail; the only way to delete a job is to delete the user who owns it.

## Tuning

- For a tenant that uploads many medium-length files in bursts, raise `scaiecho_async_audio_threshold_bytes` so more requests stay inline. The cost is longer-held HTTP connections.
- For latency-sensitive uploads, set `force_async=true` only when the recording is genuinely long. Inline transcription is faster end-to-end for short audio.
- If your workflow needs continuous transcription rather than a finished file, switch to [streaming](../concepts/streaming-transports) — async jobs are for finite recordings, not live audio.
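When choosing a new threshold, it helps to convert between bytes and audio duration. The sketch below does that arithmetic for uncompressed PCM only. The function names are hypothetical, the defaults assume 16-bit, 16 kHz mono audio, and container headers are ignored. Compressed formats pack far more audio into each byte, so the figures do not apply to them.

```python
def pcm_bytes_per_second(
    sample_rate_hz: int = 16_000, bytes_per_sample: int = 2, channels: int = 1
) -> int:
    """Raw PCM data rate; defaults assume 16-bit, 16 kHz mono."""
    return sample_rate_hz * bytes_per_sample * channels


def threshold_for_duration(seconds: float, **pcm: int) -> int:
    """Byte threshold that keeps PCM clips up to `seconds` long on the inline path."""
    return int(seconds * pcm_bytes_per_second(**pcm))


def duration_at_threshold(threshold_bytes: int, **pcm: int) -> float:
    """Seconds of PCM audio that fit within `threshold_bytes`."""
    return threshold_bytes / pcm_bytes_per_second(**pcm)


print(threshold_for_duration(300))                       # 9600000
print(round(duration_at_threshold(5 * 1024 * 1024), 1))  # 163.8
```

Under these assumptions, keeping five-minute PCM recordings inline would need a threshold of about 9.6 MB.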
