Inference Reference

This page lists every inference endpoint. For task-oriented guides, see API Guides.

Base path: /v1/inference/

Required permission: models:use

POST /v1/inference/chat

Chat completion. See Chat Completions for the full walk-through.

Request:

| Field | Type | Required |
|---|---|---|
| model | string | Yes |
| messages | array | Yes |
| max_tokens | integer | No |
| temperature | float | No (default provider-specific, usually 1.0) |
| top_p | float | No |
| stop | string or array | No |
| seed | integer | No |
| stream | boolean | No (default false) |
| tools | array | No |
| tool_choice | string or object | No |
| metadata | object | No |

Response (non-streaming): see Chat Completions.

Response (streaming): a server-sent events (SSE) stream. Each chunk arrives as data: {...} with the text fragment in choices[0].delta.content, and the stream ends with data: [DONE]. Errors arrive as event: error\ndata: {...}.
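The streaming format described above can be consumed line by line. The following is a minimal sketch, not the official client: the function name `iter_chat_deltas` and the `StreamError` class are hypothetical, and the sample chunks contain only the `choices[0].delta.content` path named in this reference. Real chunks may carry additional fields.

```python
import json
from typing import Iterable, Iterator


class StreamError(RuntimeError):
    """Raised when the stream delivers an `event: error` frame (hypothetical helper)."""


def iter_chat_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from an SSE chat stream, one text line at a time."""
    event = None  # name set by the most recent `event:` line, if any
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            event = None  # a blank line ends the current SSE event
            continue
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
            continue
        if not line.startswith("data:"):
            continue  # ignore comments and other SSE fields
        payload = line[len("data:"):].strip()
        if event == "error":
            raise StreamError(payload)
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample frames standing in for a real response body
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n',
    "\n",
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n',
    "\n",
    "data: [DONE]\n",
]
print("".join(iter_chat_deltas(sample)))  # prints "Hello"
```

Because the function accepts any iterable of text lines, it can be fed from a decoded HTTP response body or, as here, from a list in a test.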

POST /v1/inference/generate

Text generation (completion). Simpler than chat — no message roles, just a prompt.

```json
{
  "prompt": "Once upon a time",
  "model": "scailabs/poolnoodle-omni",
  "max_tokens": 200,
  "temperature": 0.8,
  "stop": ["THE END"],
  "seed": 42
}
```

Returns a text completion. Most modern models are chat-trained; use /v1/inference/chat unless you specifically need raw text generation.
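As an illustration of how such a request might be assembled, here is a sketch using only the Python standard library. The host in `BASE_URL` is a placeholder, the helper name `build_generate_request` is hypothetical, and the `Content-Type: application/json` header is an assumption (this reference only documents `Authorization`). The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://scaigrid.example.com"  # hypothetical host, replace with your own


def build_generate_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/inference/generate."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/inference/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed, not stated in this reference
        },
        method="POST",
    )


request = build_generate_request(
    "YOUR_TOKEN",
    {
        "prompt": "Once upon a time",
        "model": "scailabs/poolnoodle-omni",
        "max_tokens": 200,
        "temperature": 0.8,
        "stop": ["THE END"],
        "seed": 42,
    },
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```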

POST /v1/inference/embed

Generate embeddings. See Embeddings.

```json
{
  "model": "openai/text-embedding-3-small",
  "input": ["first text", "second text"],
  "dimensions": 1536
}
```

Returns a list of vectors.
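A common next step with the returned vectors is to compare them. The sketch below computes cosine similarity in plain Python. It uses toy three-dimensional vectors rather than real API output, and it does not assume anything about where the vectors sit inside the response body.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 0.0, 0.0]]
print(cosine_similarity(vectors[0], vectors[1]))  # 0.0 (orthogonal)
print(cosine_similarity(vectors[0], vectors[2]))  # 1.0 (same direction)
```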

POST /v1/inference/images/generate

Generate images. See Images.

```json
{
  "model": "openai/dall-e-3",
  "prompt": "A landscape painting",
  "n": 1,
  "size": "1024x1024",
  "quality": "standard",
  "style": "vivid",
  "response_format": "url"
}
```

POST /v1/inference/audio/transcribe

Speech-to-text. See Audio.

Content-Type: multipart/form-data

Form fields:

| Field | Type | Notes |
|---|---|---|
| file | binary | Audio file |
| model | string | Required |
| language | string | ISO-639-1 |
| temperature | float | 0.0–1.0 |
| prompt | string | Context |
| response_format | string | text / json / verbose_json / srt / vtt |
| timestamp_granularities | array | segment, word |
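Since this endpoint takes multipart/form-data rather than JSON, the body has to be encoded differently. The sketch below builds such a body by hand with the standard library. The helper name `encode_multipart`, the model slug, and the `application/octet-stream` part type are assumptions for illustration. How array fields such as `timestamp_granularities` are encoded is not specified here, so the example leaves them out.

```python
import uuid
from typing import Mapping, Tuple


def encode_multipart(
    fields: Mapping[str, str], filename: str, file_bytes: bytes
) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields (model, language, response_format, ...)
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The audio file itself, sent under the documented field name "file"
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    {"model": "example/speech-model", "language": "en", "response_format": "json"},
    "meeting.wav",
    b"fake-audio-bytes",  # stand-in for real audio data
)
print(content_type)
print(len(body), "bytes")
```

The returned `content_type` string goes into the request's Content-Type header so the server can find the boundary.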

POST /v1/inference/audio/synthesize

Text-to-speech. See Audio.

```json
{
  "model": "openai/tts-1",
  "input": "Hello world",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1.0
}
```

Response: raw audio bytes, not the JSON envelope. The Content-Type header reflects the audio format.
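Because the body is raw bytes, a client typically writes it straight to disk and picks a file extension from the Content-Type header. The sketch below shows one way to do that. The `save_audio` helper and the media types in `EXTENSIONS` are assumptions, since this reference does not list the exact Content-Type values returned. Treating an `application/json` body as a failure is also an assumption, based on errors using the JSON envelope.

```python
import tempfile
from pathlib import Path

# Assumed Content-Type values; this reference does not list them.
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/ogg": ".ogg",
    "audio/flac": ".flac",
}


def save_audio(body: bytes, content_type: str, stem: str) -> Path:
    """Write raw audio bytes to `stem` plus an extension derived from Content-Type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        # Assumption: a JSON body here means an error envelope, not audio.
        raise ValueError("got JSON instead of audio; the request probably failed")
    path = Path(stem + EXTENSIONS.get(media_type, ".bin"))
    path.write_bytes(body)
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_audio(b"fake mp3 bytes", "audio/mpeg", str(Path(tmp) / "hello"))
    print(saved.name)  # hello.mp3
```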

Batch inference

See Batch Inference for the complete workflow.

POST /v1/inference/batch

Submit a batch.

```json
{
  "input_file_url": "s3://...",
  "endpoint_completion_window": "24h",
  "metadata": {...}
}
```

GET /v1/inference/batch

List batches. Query params: status, limit, cursor.
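The three query parameters can be combined into a request path as sketched below. The helper name `batch_list_path` is hypothetical, and the status value `"completed"` is an assumed example, since this reference does not enumerate the valid statuses.

```python
from typing import Optional
from urllib.parse import urlencode


def batch_list_path(
    status: Optional[str] = None,
    limit: Optional[int] = None,
    cursor: Optional[str] = None,
) -> str:
    """Build the path and query string for listing batches, omitting unset params."""
    params = {"status": status, "limit": limit, "cursor": cursor}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return "/v1/inference/batch" + (f"?{query}" if query else "")


print(batch_list_path())
# /v1/inference/batch
print(batch_list_path(status="completed", limit=20))
# /v1/inference/batch?status=completed&limit=20
```

To page through results, pass the cursor from one response as `cursor` in the next call. The name of the response field that carries it is not given here; see Batch Inference.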

GET /v1/inference/batch/{batch_id}

Get batch status and result URLs.

POST /v1/inference/batch/{batch_id}/cancel

Cancel a batch. Completed requests are retained.

Response envelope

All successful /v1/inference/* responses (except audio synthesis, which returns raw bytes) follow the standard envelope:

```json
{
  "status": "ok",
  "data": {...},
  "meta": {"request_id": "req_..."}
}
```

Errors use the same envelope with status: "error" and an error object. See Errors.
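Client code usually unwraps this envelope in one place. The sketch below returns `data` on success and raises on error. The `ApiError` class and `unwrap` function are hypothetical names, and the fields inside the `error` object are not assumed (see Errors for those).

```python
from typing import Any, Optional


class ApiError(Exception):
    """Carries the envelope's error object and request ID (hypothetical helper)."""

    def __init__(self, error: Any, request_id: Optional[str]):
        super().__init__(f"request {request_id} failed: {error!r}")
        self.error = error
        self.request_id = request_id


def unwrap(envelope: dict) -> Any:
    """Return envelope['data'] when status is 'ok', otherwise raise ApiError."""
    request_id = (envelope.get("meta") or {}).get("request_id")
    if envelope.get("status") == "ok":
        return envelope["data"]
    raise ApiError(envelope.get("error"), request_id)


ok = {"status": "ok", "data": {"answer": 42}, "meta": {"request_id": "req_example"}}
print(unwrap(ok))  # {'answer': 42}
```

Keeping the request ID on the exception makes it easy to include in support requests.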

Headers

Request:

  • Authorization: Bearer <token> — required
  • X-Request-ID: <id> — optional, propagates through tracing

Response:

  • X-Scaigrid-Request-Id: <id> — always present. Include in support requests.
  • X-Scaigrid-Model: <slug> — the frontend model that served the request
  • X-Scaigrid-Backend: <id> — the backend that was actually called
  • Retry-After: <seconds> — present on 429 responses
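The headers listed above might be handled on the client side as in the following sketch. Both helper names are hypothetical. Generating a random `X-Request-ID` when none is supplied and falling back to one second when `Retry-After` is missing or unparsable are design assumptions, not documented behaviour.

```python
import uuid
from typing import Dict, Mapping, Optional


def request_headers(token: str, request_id: Optional[str] = None) -> Dict[str, str]:
    """Headers for an inference call; X-Request-ID is optional per the reference."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Request-ID": request_id or uuid.uuid4().hex,  # random ID is an assumption
    }


def retry_delay(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Seconds to wait before retrying a 429, or None for any other status."""
    if status != 429:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, ValueError):
        return 1.0  # assumed fallback when the header is missing or malformed


print(retry_delay(429, {"Retry-After": "3"}))  # 3.0
print(retry_delay(200, {}))  # None
```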
Updated 2026-05-18 15:01:29 (rev 17)