Inference Reference
All inference endpoints. For task-oriented guides, see API Guides.
Base path: /v1/inference/
Required permission: models:use
POST /v1/inference/chat#
Chat completion. See Chat Completions for the full walk-through.
Request:
| Field | Type | Required |
|---|---|---|
model |
string | Yes |
messages |
array | Yes |
max_tokens |
integer | No |
temperature |
float | No (default provider-specific, usually 1.0) |
top_p |
float | No |
stop |
string or array | No |
seed |
integer | No |
stream |
boolean | No (default false) |
tools |
array | No |
tool_choice |
string or object | No |
metadata |
object | No |
Response (non-streaming): see Chat Completions.
Response (streaming): SSE stream. data: {...} chunks with choices[0].delta.content; ends with data: [DONE]. Errors arrive as event: error\ndata: {...}.
POST /v1/inference/generate#
Text generation (completion). Simpler than chat — no message roles, just a prompt.
1 2 3 4 5 6 7 8 | |
Returns a text completion. Most modern models are chat-trained; use /v1/inference/chat unless you specifically need raw text generation.
POST /v1/inference/embed#
Generate embeddings. See Embeddings.
1 2 3 4 5 | |
Returns a list of vectors.
POST /v1/inference/images/generate#
Generate images. See Images.
1 2 3 4 5 6 7 8 9 | |
POST /v1/inference/audio/transcribe#
Speech-to-text. See Audio.
Content-Type: multipart/form-data
Form fields:
| Field | Type | Notes |
|---|---|---|
file |
binary | Audio file |
model |
string | Required |
language |
string | ISO-639-1 |
temperature |
float | 0.0–1.0 |
prompt |
string | Context |
response_format |
string | text / json / verbose_json / srt / vtt |
timestamp_granularities |
array | segment, word |
POST /v1/inference/audio/synthesize#
Text-to-speech. See Audio.
1 2 3 4 5 6 7 | |
Response: raw audio bytes (not JSON envelope). Content-Type reflects the format.
Batch inference#
See Batch Inference for the complete workflow.
POST /v1/inference/batch#
Submit a batch.
1 2 3 4 5 | |
GET /v1/inference/batch#
List batches. Query params: status, limit, cursor.
GET /v1/inference/batch/{batch_id}#
Get batch status and result URLs.
POST /v1/inference/batch/{batch_id}/cancel#
Cancel a batch. Completed requests are retained.
Response envelope#
All successful /v1/inference/* responses (except audio synthesis, which returns raw bytes) follow the standard envelope:
1 2 3 4 5 | |
Errors use the same envelope with status: "error" and an error object. See Errors.
Headers#
Request:
Authorization: Bearer <token>— requiredX-Request-ID: <id>— optional, propagates through tracing
Response:
X-Scaigrid-Request-Id: <id>— always present. Include in support requests.X-Scaigrid-Model: <slug>— the frontend model that served the requestX-Scaigrid-Backend: <id>— the backend that was actually calledRetry-After: <seconds>— present on 429 responses