Your First Integration

The quickstart showed you how to fire a single request. This guide walks through a real integration: structured error handling, retries, usage tracking, and request IDs for support.

We'll build a small "title generator": given a short conversation snippet, it produces a title of at most six words. This is the kind of feature you'd put behind a user-facing chat app.

Setup

bash
export SCAIGRID_API_KEY=sgk_your_key_here
export SCAIGRID_BASE_URL=https://scaigrid.scailabs.ai

python
# pip install httpx
import os
import httpx

API_KEY = os.environ["SCAIGRID_API_KEY"]
BASE_URL = os.environ.get("SCAIGRID_BASE_URL", "https://scaigrid.scailabs.ai")

typescript
// No packages to install: fetch is built in on Node 18+.
const API_KEY = process.env.SCAIGRID_API_KEY!;
const BASE_URL = process.env.SCAIGRID_BASE_URL ?? "https://scaigrid.scailabs.ai";

A real request with error handling

python
import httpx

class ScaiGridError(Exception):
    def __init__(self, code: str, message: str, retry_after: float | None = None):
        self.code = code
        self.message = message
        self.retry_after = retry_after
        super().__init__(f"{code}: {message}")


def chat(model: str, messages: list[dict], **params) -> dict:
    resp = httpx.post(
        f"{BASE_URL}/v1/inference/chat",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": messages, **params},
        timeout=60.0,
    )
    body = resp.json()
    if body.get("status") == "error":
        err = body["error"]
        raise ScaiGridError(
            code=err["code"],
            message=err["message"],
            retry_after=err.get("retry_after"),
        )
    return body["data"]


result = chat(
    model="scailabs/poolnoodle-omni",
    messages=[
        {"role": "system", "content": "You generate conversation titles. Reply with ONLY the title, max 6 words."},
        {"role": "user", "content": "Hey, what's a good recipe for carbonara?"},
    ],
    max_tokens=20,
    temperature=0.3,
)
print(result["choices"][0]["message"]["content"])

typescript
interface ChatResult {
  choices: { message: { role: string; content: string } }[];
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

interface ScaiGridError {
  code: string;
  message: string;
  retry_after?: number;
}

async function chat(
  model: string,
  messages: Array<{ role: string; content: string }>,
  params: Record<string, unknown> = {},
): Promise<ChatResult> {
  const resp = await fetch(`${BASE_URL}/v1/inference/chat`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages, ...params }),
  });
  const body = await resp.json();
  if (body.status === "error") {
    const err = body.error as ScaiGridError;
    const e = new Error(`${err.code}: ${err.message}`);
    (e as any).code = err.code;
    (e as any).retryAfter = err.retry_after;
    throw e;
  }
  return body.data as ChatResult;
}

const result = await chat(
  "scailabs/poolnoodle-omni",
  [
    { role: "system", content: "You generate conversation titles. Reply with ONLY the title, max 6 words." },
    { role: "user", content: "Hey, what's a good recipe for carbonara?" },
  ],
  { max_tokens: 20, temperature: 0.3 },
);
console.log(result.choices[0].message.content);

Expected output: something like Classic Carbonara Recipe Request.
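
The `chat` helpers above assume every response body is JSON in the envelope shape they read (`status`, `error`, `data`). If a proxy or load balancer in front of the gateway returns HTML or an empty body, `resp.json()` raises a decoding error instead of a `ScaiGridError`. Below is a minimal sketch of a more defensive parser. The `parse_envelope` name and the client-side `INVALID_RESPONSE` code are hypothetical and not part of the ScaiGrid API.

```python
from __future__ import annotations

import json


class ScaiGridError(Exception):
    """Same shape as the error class defined earlier in this guide."""

    def __init__(self, code: str, message: str, retry_after: float | None = None):
        self.code = code
        self.message = message
        self.retry_after = retry_after
        super().__init__(f"{code}: {message}")


def parse_envelope(status_code: int, text: str) -> dict:
    """Return the `data` payload of a response, or raise ScaiGridError."""
    try:
        body = json.loads(text)
    except json.JSONDecodeError:
        body = None
    if not isinstance(body, dict):
        # INVALID_RESPONSE is a hypothetical client-side code, not a ScaiGrid one.
        raise ScaiGridError("INVALID_RESPONSE", f"HTTP {status_code}: body is not a JSON object")
    if body.get("status") == "error":
        err = body.get("error") or {}
        raise ScaiGridError(
            code=err.get("code", "INVALID_RESPONSE"),
            message=err.get("message", f"HTTP {status_code}"),
            retry_after=err.get("retry_after"),
        )
    if "data" not in body:
        raise ScaiGridError("INVALID_RESPONSE", f"HTTP {status_code}: missing 'data' field")
    return body["data"]
```

Inside `chat`, this would replace the `resp.json()` block with `return parse_envelope(resp.status_code, resp.text)`.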

Retries

ScaiGrid maps upstream failures to specific error codes. Some are retryable, some are not.

Code	HTTP	Retry?
`BACKEND_RATE_LIMITED`	429	Yes — honor `retry_after`
`BACKEND_TIMEOUT`	504	Yes — with exponential backoff
`BACKEND_ERROR`	502	Yes — once or twice
`BUDGET_EXCEEDED`	429	No — admin intervention needed
`MODEL_UNAVAILABLE`	503	Sometimes — the model might come back
`UPSTREAM_SHAPE_MISMATCH`	502	No — gateway integration bug
`AUTH_TOKEN_INVALID`	401	No — fix your credentials
`AUTHZ_PERMISSION_DENIED`	403	No — not a transient problem

A minimal retry helper:

python
import time

RETRYABLE = {"BACKEND_RATE_LIMITED", "BACKEND_TIMEOUT", "BACKEND_ERROR", "MODEL_UNAVAILABLE"}

def chat_with_retry(model: str, messages: list, max_attempts: int = 3, **params) -> dict:
    for attempt in range(max_attempts):
        try:
            return chat(model, messages, **params)
        except ScaiGridError as e:
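            # Give up on non-retryable codes, or once attempts are exhausted.
            if e.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            # Prefer the server's hint; otherwise back off 1s, 2s, 4s, ...
            delay = e.retry_after if e.retry_after is not None else 2 ** attempt
            time.sleep(delay)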
    raise RuntimeError("unreachable")

typescript
const RETRYABLE = new Set([
  "BACKEND_RATE_LIMITED", "BACKEND_TIMEOUT", "BACKEND_ERROR", "MODEL_UNAVAILABLE",
]);

async function chatWithRetry(
  model: string,
  messages: any[],
  params: Record<string, unknown> = {},
  maxAttempts = 3,
): Promise<ChatResult> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await chat(model, messages, params);
    } catch (e: any) {
      if (!RETRYABLE.has(e.code) || attempt === maxAttempts - 1) throw e;
      const delay = (e.retryAfter ?? Math.pow(2, attempt)) * 1000;
      await new Promise(r => setTimeout(r, delay));
    }
  }
  throw new Error("unreachable");
}
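
The helpers above sleep for exactly `2 ** attempt` seconds when the server gives no `retry_after`. When many clients fail at the same moment, fixed delays make them all retry in lockstep. A common remedy is "full jitter": sleep for a random time between zero and a capped exponential ceiling. The sketch below shows one way to compute that delay. The `backoff_delay` name and the `base`, `cap`, and `rng` parameters are illustrative choices, not ScaiGrid recommendations.

```python
import random
from typing import Callable, Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to sleep before the retry that follows 0-based `attempt`."""
    if retry_after is not None:
        # The server said how long to wait; honor it as-is.
        return float(retry_after)
    # Exponential ceiling, capped so late attempts do not wait unboundedly long.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: a uniformly random fraction of the ceiling.
    return rng() * ceiling
```

In `chat_with_retry`, the `delay = ...` line could become `delay = backoff_delay(attempt, e.retry_after)`. Passing a fixed `rng` keeps the function deterministic in tests.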

Tracking usage

Every response includes a usage object with token counts. For longer-lived integrations you usually want to sum these to measure cost.

python
total_prompt = 0
total_completion = 0

def track(result):
    global total_prompt, total_completion
    usage = result["usage"]
    total_prompt += usage["prompt_tokens"]
    total_completion += usage["completion_tokens"]
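
Module-level globals work for a script, but they are awkward to reset and are not safe when several threads issue requests at once. As a sketch of one alternative, the counters can live in a small lock-protected object. The `UsageTracker` class is a hypothetical helper; it reads only the `usage` fields shown above.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class UsageTracker:
    """Accumulates token counts from chat results; safe to share across threads."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def track(self, result: dict) -> None:
        # Tolerate a missing usage object rather than failing the caller.
        usage = result.get("usage") or {}
        with self._lock:
            self.prompt_tokens += usage.get("prompt_tokens", 0)
            self.completion_tokens += usage.get("completion_tokens", 0)
            self.requests += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
```

A typical call site would be `tracker.track(chat_with_retry(model, messages))`, with one `tracker` instance per process or per feature you want to meter.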

For authoritative usage across your whole tenant, query the accounting API directly:

bash
curl "$SCAIGRID_BASE_URL/v1/accounting/usage/summary?period=day" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"

See Accounting and Budgets.

Using the request ID for support

Every response has a request ID in the X-Scaigrid-Request-Id header (and in meta.request_id in the body). If something goes wrong and you need to contact support, that ID lets us look up the full request trace in seconds.

python
resp = httpx.post(...)
print("Request ID:", resp.headers.get("X-Scaigrid-Request-Id"))

typescript
const resp = await fetch(...);
console.log("Request ID:", resp.headers.get("X-Scaigrid-Request-Id"));

Log this on every request in production. When a user reports an issue, the request ID is the fastest path to answers.
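
As a sketch of what "log this on every request" might look like, the helper below reads the ID from the `X-Scaigrid-Request-Id` header and falls back to `meta.request_id` in the body, the two locations named above. The function names and the `scaigrid` logger name are hypothetical.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("scaigrid")  # hypothetical logger name

REQUEST_ID_HEADER = "X-Scaigrid-Request-Id"


def request_id_from(headers: Mapping[str, str], body: Optional[dict] = None) -> Optional[str]:
    """Return the request ID from the header, else from meta.request_id, else None."""
    # Header names are case-insensitive on the wire, but plain dicts are not.
    lowered = {name.lower(): value for name, value in headers.items()}
    rid = lowered.get(REQUEST_ID_HEADER.lower())
    if rid is None and isinstance(body, dict):
        rid = (body.get("meta") or {}).get("request_id")
    return rid


def log_request(headers: Mapping[str, str], body: Optional[dict] = None) -> Optional[str]:
    """Log the request ID at INFO level and return it for further use."""
    rid = request_id_from(headers, body)
    logger.info("scaigrid request_id=%s", rid)
    return rid
```

With httpx this could be called as `log_request(resp.headers, resp.json())` right after each request, so the ID is recorded even when the call later raises.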

What's next

Chat Completions — streaming, tool calls, multimodal.
Models and Routing — how to pick the right model for a task.
Errors — full error code reference.