Inference Reference

All inference endpoints. For task-oriented guides, see API Guides.

Base path: /v1/inference/ Required permission: models:use

POST /v1/inference/chat#

Chat completion. See Chat Completions for the full walk-through.

Request:

Field	Type	Required
`model`	string	Yes
`messages`	array	Yes
`max_tokens`	integer	No
`temperature`	float	No (default provider-specific, usually 1.0)
`top_p`	float	No
`stop`	string or array	No
`seed`	integer	No
`stream`	boolean	No (default false)
`tools`	array	No
`tool_choice`	string or object	No
`metadata`	object	No

Response (non-streaming): see Chat Completions.

Response (streaming): SSE stream. data: {...} chunks with choices[0].delta.content; ends with data: [DONE]. Errors arrive as event: error\ndata: {...}.

POST /v1/inference/generate#

Text generation (completion). Simpler than chat — no message roles, just a prompt.

json
{
  "prompt": "Once upon a time",
  "model": "scailabs/poolnoodle-omni",
  "max_tokens": 200,
  "temperature": 0.8,
  "stop": ["THE END"],
  "seed": 42
}

Returns a text completion. Most modern models are chat-trained; use /v1/inference/chat unless you specifically need raw text generation.

POST /v1/inference/embed#

Generate embeddings. See Embeddings.

json
{
  "model": "openai/text-embedding-3-small",
  "input": ["first text", "second text"],
  "dimensions": 1536
}

Returns a list of vectors.

POST /v1/inference/images/generate#

Generate images. See Images.

json
{
  "model": "openai/dall-e-3",
  "prompt": "A landscape painting",
  "n": 1,
  "size": "1024x1024",
  "quality": "standard",
  "style": "vivid",
  "response_format": "url"
}

POST /v1/inference/audio/transcribe#

Speech-to-text. See Audio.

Content-Type: multipart/form-data

Form fields:

Field	Type	Notes
`file`	binary	Audio file
`model`	string	Required
`language`	string	ISO-639-1
`temperature`	float	0.0–1.0
`prompt`	string	Context
`response_format`	string	`text` / `json` / `verbose_json` / `srt` / `vtt`
`timestamp_granularities`	array	`segment`, `word`

POST /v1/inference/audio/synthesize#

Text-to-speech. See Audio.

json
{
  "model": "openai/tts-1",
  "input": "Hello world",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1.0
}

Response: raw audio bytes (not JSON envelope). Content-Type reflects the format.

Batch inference#

See Batch Inference for the complete workflow.

POST /v1/inference/batch#

Submit a batch.

json
{
  "input_file_url": "s3://...",
  "endpoint_completion_window": "24h",
  "metadata": {...}
}

GET /v1/inference/batch#

List batches. Query params: status, limit, cursor.

GET /v1/inference/batch/{batch_id}#

Get batch status and result URLs.

POST /v1/inference/batch/{batch_id}/cancel#

Cancel a batch. Completed requests are retained.

Response envelope#

All successful /v1/inference/* responses (except audio synthesis, which returns raw bytes) follow the standard envelope:

json
{
  "status": "ok",
  "data": {...},
  "meta": {"request_id": "req_..."}
}

Errors use the same envelope with status: "error" and an error object. See Errors.

Headers#