Your First Integration
The quickstart showed you how to fire a single request. This guide walks through a real integration: structured error handling, retries, usage tracking, and request IDs for support.
We'll build a small "title generator": given a short conversation snippet, it produces a title of at most six words. This is the kind of feature you'd put behind a user-facing chat app.
Setup
| export SCAIGRID_API_KEY=sgk_your_key_here
export SCAIGRID_BASE_URL=https://scaigrid.scailabs.ai
|
| # pip install httpx
import os
import httpx
API_KEY = os.environ["SCAIGRID_API_KEY"]
BASE_URL = os.environ.get("SCAIGRID_BASE_URL", "https://scaigrid.scailabs.ai")
|
| // npm install
const API_KEY = process.env.SCAIGRID_API_KEY!;
const BASE_URL = process.env.SCAIGRID_BASE_URL ?? "https://scaigrid.scailabs.ai";
|
A real request with error handling
| import httpx

class ScaiGridError(Exception):
    def __init__(self, code: str, message: str, retry_after: float | None = None):
        self.code = code
        self.message = message
        self.retry_after = retry_after
        super().__init__(f"{code}: {message}")

def chat(model: str, messages: list[dict], **params) -> dict:
    resp = httpx.post(
        f"{BASE_URL}/v1/inference/chat",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": messages, **params},
        timeout=60.0,
    )
    body = resp.json()
    if body.get("status") == "error":
        err = body["error"]
        raise ScaiGridError(
            code=err["code"],
            message=err["message"],
            retry_after=err.get("retry_after"),
        )
    return body["data"]

result = chat(
    model="scailabs/poolnoodle-omni",
    messages=[
        {"role": "system", "content": "You generate conversation titles. Reply with ONLY the title, max 6 words."},
        {"role": "user", "content": "Hey, what's a good recipe for carbonara?"},
    ],
    max_tokens=20,
    temperature=0.3,
)
print(result["choices"][0]["message"]["content"])
|
| interface ChatResult {
  choices: { message: { role: string; content: string } }[];
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

interface ScaiGridError {
  code: string;
  message: string;
  retry_after?: number;
}

async function chat(
  model: string,
  messages: Array<{ role: string; content: string }>,
  params: Record<string, unknown> = {},
): Promise<ChatResult> {
  const resp = await fetch(`${BASE_URL}/v1/inference/chat`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages, ...params }),
  });
  const body = await resp.json();
  if (body.status === "error") {
    const err = body.error as ScaiGridError;
    const e = new Error(`${err.code}: ${err.message}`);
    (e as any).code = err.code;
    (e as any).retryAfter = err.retry_after;
    throw e;
  }
  return body.data as ChatResult;
}

const result = await chat(
  "scailabs/poolnoodle-omni",
  [
    { role: "system", content: "You generate conversation titles. Reply with ONLY the title, max 6 words." },
    { role: "user", content: "Hey, what's a good recipe for carbonara?" },
  ],
  { max_tokens: 20, temperature: 0.3 },
);
console.log(result.choices[0].message.content);
|
Expected output: something like Classic Carbonara Recipe Request.
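Because the output is only "something like" the expected title, a user-facing app may want to normalize whatever comes back before displaying it. The sketch below is one way to do that; `clean_title`, `MAX_WORDS`, and the "Untitled" fallback are hypothetical names chosen for illustration, not part of ScaiGrid.

```python
MAX_WORDS = 6  # mirrors the "max 6 words" instruction in the system prompt


def clean_title(raw: str, max_words: int = MAX_WORDS) -> str:
    """Normalize a model reply into a short, single-line title.

    Hypothetical helper: the fallback value and trimming rules are assumptions.
    """
    # Keep only the first non-empty line in case the model adds commentary.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return "Untitled"
    # Drop wrapping quotes and trailing sentence punctuation.
    title = lines[0].strip("\"'“”‘’ ").rstrip(".!")
    # Enforce the word limit on the client side as well.
    words = title.split()
    return " ".join(words[:max_words]) or "Untitled"
```

With the Python client above, this would be applied as `clean_title(result["choices"][0]["message"]["content"])`.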
Retries
ScaiGrid maps upstream failures to specific error codes. Some are retryable, some are not.
| Code | HTTP | Retry? |
| --- | --- | --- |
| BACKEND_RATE_LIMITED | 429 | Yes, honor retry_after |
| BACKEND_TIMEOUT | 504 | Yes, with exponential backoff |
| BACKEND_ERROR | 502 | Yes, once or twice |
| BUDGET_EXCEEDED | 429 | No, admin intervention needed |
| MODEL_UNAVAILABLE | 503 | Sometimes, the model might come back |
| UPSTREAM_SHAPE_MISMATCH | 502 | No, gateway integration bug |
| AUTH_TOKEN_INVALID | 401 | No, fix your credentials |
| AUTHZ_PERMISSION_DENIED | 403 | No, not a transient problem |
A minimal retry helper:
| import time

RETRYABLE = {"BACKEND_RATE_LIMITED", "BACKEND_TIMEOUT", "BACKEND_ERROR", "MODEL_UNAVAILABLE"}

def chat_with_retry(model: str, messages: list, max_attempts: int = 3, **params) -> dict:
    for attempt in range(max_attempts):
        try:
            return chat(model, messages, **params)
        except ScaiGridError as e:
            if e.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            delay = e.retry_after if e.retry_after else (2 ** attempt)
            time.sleep(delay)
    raise RuntimeError("unreachable")
|
| const RETRYABLE = new Set([
  "BACKEND_RATE_LIMITED", "BACKEND_TIMEOUT", "BACKEND_ERROR", "MODEL_UNAVAILABLE",
]);

async function chatWithRetry(
  model: string,
  messages: any[],
  params: Record<string, unknown> = {},
  maxAttempts = 3,
): Promise<ChatResult> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await chat(model, messages, params);
    } catch (e: any) {
      if (!RETRYABLE.has(e.code) || attempt === maxAttempts - 1) throw e;
      const delay = (e.retryAfter ?? Math.pow(2, attempt)) * 1000;
      await new Promise(r => setTimeout(r, delay));
    }
  }
  throw new Error("unreachable");
}
|
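The helpers above sleep for exactly 2^attempt seconds when no retry_after is given. When many clients fail at the same moment, a common refinement is to cap the delay and add random jitter so the retries do not all land together. The following is a minimal sketch of that idea; `backoff_delay`, `BASE_DELAY`, and `MAX_DELAY` are assumed names and values, not ScaiGrid recommendations.

```python
import random
from typing import Optional

# Assumed tuning values; adjust for your own latency budget.
BASE_DELAY = 1.0   # seconds, delay ceiling for the first retry
MAX_DELAY = 30.0   # seconds, upper bound on any computed delay


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Return how many seconds to sleep before retrying (attempt is 0-based)."""
    if retry_after is not None and retry_after > 0:
        # The server said when to come back; honor that value as-is.
        return float(retry_after)
    # Exponential ceiling: 1s, 2s, 4s, ... capped at MAX_DELAY.
    ceiling = min(MAX_DELAY, BASE_DELAY * (2 ** attempt))
    # "Full jitter": pick uniformly between 0 and the ceiling.
    pick = rng.uniform if rng is not None else random.uniform
    return pick(0.0, ceiling)
```

In `chat_with_retry`, the delay computation could then become `time.sleep(backoff_delay(attempt, e.retry_after))`. The optional `rng` argument exists only so tests can pass a seeded generator.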
Tracking usage
Every response includes a usage object with token counts. For longer-lived integrations you usually want to sum these to measure cost.
| total_prompt = 0
total_completion = 0

def track(result):
    global total_prompt, total_completion
    usage = result["usage"]
    total_prompt += usage["prompt_tokens"]
    total_completion += usage["completion_tokens"]
|
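Module-level globals are fine for a script, but a service that handles requests on several threads may prefer an object that owns the counters. The sketch below shows one possible shape; `UsageTracker` and its `label` argument are hypothetical and only rely on the `usage` fields shown above.

```python
import threading
from collections import defaultdict


class UsageTracker:
    """Accumulates token counts per label (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._prompt = defaultdict(int)
        self._completion = defaultdict(int)

    def track(self, result: dict, label: str = "default") -> None:
        # Tolerate a missing usage object rather than raising mid-request.
        usage = result.get("usage") or {}
        with self._lock:
            self._prompt[label] += usage.get("prompt_tokens", 0)
            self._completion[label] += usage.get("completion_tokens", 0)

    def by_label(self, label: str) -> tuple:
        """Return (prompt_tokens, completion_tokens) recorded for one label."""
        with self._lock:
            return (self._prompt.get(label, 0), self._completion.get(label, 0))

    def totals(self) -> dict:
        """Return token counts summed over all labels."""
        with self._lock:
            return {
                "prompt_tokens": sum(self._prompt.values()),
                "completion_tokens": sum(self._completion.values()),
            }
```

A label such as the feature name ("titles") makes it easier to see which part of the app is spending tokens. These are client-side counts only.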
For authoritative usage across your whole tenant, query the accounting API directly:
| curl "$SCAIGRID_BASE_URL/v1/accounting/usage/summary?period=day" \
    -H "Authorization: Bearer $SCAIGRID_API_KEY"
|
See Accounting and Budgets.
Using the request ID for support
Every response has a request ID in the X-Scaigrid-Request-Id header (and in meta.request_id in the body). If something goes wrong and you need to contact support, that ID lets us look up the full request trace in seconds.
| resp = httpx.post(...)
print("Request ID:", resp.headers.get("X-Scaigrid-Request-Id"))
|
| const resp = await fetch(...);
console.log("Request ID:", resp.headers.get("X-Scaigrid-Request-Id"));
|
Log this on every request in production. When a user reports an issue, the request ID is the fastest path to answers.
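One way to make that logging routine is a small helper that reads the header and falls back to meta.request_id in the body. This is a sketch under assumptions: the logger name and the function names are hypothetical, and the helper only relies on the header and body field named above.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("scaigrid.client")  # hypothetical logger name

REQUEST_ID_HEADER = "X-Scaigrid-Request-Id"


def extract_request_id(
    headers: Mapping[str, str], body: Optional[dict] = None
) -> Optional[str]:
    """Return the request ID from the headers, else from meta.request_id."""
    # Header names are case-insensitive, so compare lowercased names
    # in case a plain dict is passed instead of an HTTP client's header object.
    wanted = REQUEST_ID_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    meta = (body or {}).get("meta") or {}
    return meta.get("request_id")


def log_request_id(
    headers: Mapping[str, str], body: Optional[dict] = None
) -> Optional[str]:
    """Log the request ID at INFO level and return it for error reports."""
    request_id = extract_request_id(headers, body)
    logger.info("scaigrid request_id=%s", request_id)
    return request_id
```

In the `chat` function above, this would be called as `log_request_id(resp.headers, body)` right after parsing the response.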
What's next