Models and Routing
ScaiGrid separates three concerns:
- Frontend models — the stable public names your application asks for.
- Backend models — the actual upstream deployments that answer requests.
- Routing policies — the rules that map one to the other.
Your code only ever names a frontend model. Operators reshape what's behind that name — swapping providers, adjusting weights, configuring failover — without touching your code.
Frontend models
A frontend model is an abstract identity. It has a slug (scailabs/poolnoodle-omni), a display name, a modality (chat, embedding, image, audio), capabilities, and optional defaults (context window, max output tokens, system prompt template, pricing). It may carry metadata (persona assignment, compliance flags).
Think of it as the menu item your application orders from. The kitchen decides how to prepare it.
List frontend models:
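A minimal sketch of how the listing call might be built. The host, the API key, the bearer-token header, and the `/v1/models` path are all assumptions made for illustration; none of them is confirmed on this page, so check the Models and Routing Reference for the real endpoint. The request is only constructed here, not sent.

```python
import urllib.request

BASE_URL = "https://scaigrid.example.com"  # hypothetical host
API_KEY = "YOUR_API_KEY"                   # placeholder credential
LIST_MODELS_PATH = "/v1/models"            # ASSUMPTION: path not confirmed by this page


def build_list_models_request() -> urllib.request.Request:
    """Build (but do not send) a GET request for the frontend model list."""
    return urllib.request.Request(
        BASE_URL + LIST_MODELS_PATH,
        headers={"Authorization": f"Bearer {API_KEY}"},  # assumed auth scheme
        method="GET",
    )


request = build_list_models_request()
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it once the real host and path are filled in.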
Create one:
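A request body for creating a frontend model could be assembled roughly as follows. The attributes come from the description in this section (slug, display name, modality, capabilities, optional defaults), but every JSON key name, the example slug, and the capability name are hypothetical.

```python
import json

# Modalities named in this section.
MODALITIES = {"chat", "embedding", "image", "audio"}


def frontend_model_payload(slug, display_name, modality, capabilities, defaults=None):
    """Assemble a JSON-ready body. All key names here are hypothetical."""
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality: {modality!r}")
    payload = {
        "slug": slug,
        "display_name": display_name,
        "modality": modality,
        "capabilities": list(capabilities),
    }
    if defaults:
        payload["defaults"] = dict(defaults)
    return payload


body = frontend_model_payload(
    slug="tenant/acme/support-chat",       # illustrative tenant-scoped slug
    display_name="Support Chat",
    modality="chat",
    capabilities=["streaming"],            # illustrative capability name
    defaults={"max_output_tokens": 1024},  # illustrative default
)
print(json.dumps(body, indent=2))
```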
Slug conventions:
- openai/gpt-4o: platform-level, OpenAI-provided
- scailabs/poolnoodle-omni: platform-level, ScaiLabs-provided
- partner/{partner_slug}/...: partner-scoped
- tenant/{tenant_slug}/...: tenant-scoped
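The prefix conventions lend themselves to a mechanical check. The helper below is a hypothetical client-side sketch, not part of ScaiGrid. It assumes that any slug not starting with `partner/` or `tenant/` (followed by at least two more segments) is platform-level.

```python
def slug_scope(slug: str) -> str:
    """Classify a slug as partner-, tenant-, or platform-scoped by its prefix."""
    parts = slug.split("/")
    # partner/{partner_slug}/... and tenant/{tenant_slug}/... need 3+ segments.
    if parts[0] in ("partner", "tenant") and len(parts) >= 3:
        return parts[0]
    return "platform"  # ASSUMPTION: everything else is platform-level


for example in ("openai/gpt-4o", "partner/acme/chat", "tenant/acme/chat"):
    print(example, "->", slug_scope(example))
```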
Backend models
A backend model is a concrete, routable endpoint. It has a URI (openai:gpt-4o or scaiinfer:node-eu-01/llama-3.3), a provider type, credentials (stored encrypted), capabilities, and health status.
A backend is specific: it points at one provider, one deployment, one region. When ScaiGrid calls it, it knows exactly where the request goes.
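Both example URIs have the shape `<scheme>:<target>`. A small parser for that shape might look like the sketch below. Reading the part before the first colon as the provider scheme is an inference from the two examples, not a documented grammar.

```python
from typing import NamedTuple


class BackendURI(NamedTuple):
    scheme: str  # part before the first colon, e.g. "openai"
    target: str  # everything after it, e.g. "gpt-4o"


def parse_backend_uri(uri: str) -> BackendURI:
    """Split a backend URI at its first colon, rejecting malformed input."""
    scheme, sep, target = uri.partition(":")
    if not sep or not scheme or not target:
        raise ValueError(f"expected '<scheme>:<target>', got {uri!r}")
    return BackendURI(scheme, target)


print(parse_backend_uri("openai:gpt-4o"))
print(parse_backend_uri("scaiinfer:node-eu-01/llama-3.3"))
```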
List backends:
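The listing endpoint and its response format are not shown on this page. Assuming the decoded response is a list of records with `uri` and `health` keys (both hypothetical), a quick health summary could be computed like this:

```python
from collections import Counter

# Hypothetical decoded listing; the real response shape is not shown here.
backends = [
    {"uri": "openai:gpt-4o", "health": "healthy"},
    {"uri": "scaiinfer:node-eu-01/llama-3.3", "health": "degraded"},
]


def health_summary(items):
    """Count backends per health status."""
    return Counter(item["health"] for item in items)


print(dict(health_summary(backends)))
```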
Register a backend:
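A registration body might be put together as in the sketch below. The attributes (URI, provider type, credentials, capabilities) and the set of provider types come from this section; the JSON key names, the `UPSTREAM_API_KEY` environment variable, and the capability name are assumptions.

```python
import os

# Provider types documented in this section.
PROVIDER_TYPES = {
    "openai", "anthropic", "azure", "mistral",
    "qwen", "google", "scaiinfer", "custom",
}


def backend_payload(uri, provider_type, api_key, capabilities):
    """Assemble a JSON-ready registration body. Key names are hypothetical."""
    if provider_type not in PROVIDER_TYPES:
        raise ValueError(f"unsupported provider type: {provider_type!r}")
    return {
        "uri": uri,
        "provider_type": provider_type,
        "credentials": {"api_key": api_key},
        "capabilities": list(capabilities),
    }


payload = backend_payload(
    uri="openai:gpt-4o",
    provider_type="openai",
    api_key=os.environ.get("UPSTREAM_API_KEY", ""),  # hypothetical variable name
    capabilities=["chat"],                           # illustrative capability
)
print(sorted(payload))
```

Reading the key from the environment keeps the secret out of source control.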
Supported provider types: openai, anthropic, azure, mistral, qwen, google, scaiinfer (our distributed inference cluster), custom (any OpenAI-protocol-compatible endpoint).
Providers
A provider is a group of backends sharing configuration — an OpenAI account, an Azure subscription, a ScaiInfer cluster. Providers have discovery endpoints (POST /v1/providers/{id}/discover) that list what models are available from that provider, so you can add backends without hand-typing identifiers.
Routing policies
A routing policy decides, for each request to a frontend model, which backend to call. Policies are named and reusable — multiple frontend models can share the same policy.
Simplest case: a frontend model maps 1:1 to a backend. No policy needed — the mapping itself carries weight 100, priority 1.
Common multi-backend patterns:
Weighted round-robin. Two backends, 70/30 split.
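To make the 70/30 split concrete, here is a sketch of smooth weighted round-robin, the interleaving scheme popularized by nginx. This page does not say which algorithm ScaiGrid uses internally, so treat it as an illustration of the idea only. The backend names are placeholders.

```python
def smooth_weighted_round_robin(weights, n):
    """Return n picks; each backend appears in proportion to its weight."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every backend earns its weight, the leader is chosen,
        # then the leader pays back the total.
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


picks = smooth_weighted_round_robin({"backend-a": 70, "backend-b": 30}, 10)
print(picks.count("backend-a"), picks.count("backend-b"))
```

Over any 10 consecutive requests this yields 7 picks for `backend-a` and 3 for `backend-b`, spread out rather than sent in bursts.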
Primary + failover. Use backend A unless it's unhealthy, then backend B.
A higher priority number means a later fallback, so priority 1 is tried first. Within a priority tier, traffic splits by weight.
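The priority and weight rules can be sketched as a selection function: take the lowest-numbered tier that still has a healthy backend, then choose within that tier by weight. The `Route` type and its field names are hypothetical, and a random weighted draw stands in for whatever splitting ScaiGrid actually performs.

```python
import random
from dataclasses import dataclass


@dataclass
class Route:
    backend: str
    priority: int  # lower number = tried earlier
    weight: int
    healthy: bool = True


def pick_backend(routes, rng):
    """Pick from the best healthy tier, weighted; None if nothing is healthy."""
    candidates = [r for r in routes if r.healthy]
    if not candidates:
        return None
    best = min(r.priority for r in candidates)
    tier = [r for r in candidates if r.priority == best]
    return rng.choices(tier, weights=[r.weight for r in tier], k=1)[0].backend


routes = [Route("backend-a", priority=1, weight=100),
          Route("backend-b", priority=2, weight=100)]
rng = random.Random(0)
print(pick_backend(routes, rng))   # primary is healthy
routes[0].healthy = False
print(pick_backend(routes, rng))   # falls over to the next tier
```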
Map a frontend to backends:
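A mapping body could carry one entry per backend, each with a weight and a priority. The defaults below mirror the 1:1 case described earlier in this section (weight 100, priority 1); the key names and overall shape are hypothetical.

```python
def route_entry(backend_uri, weight=100, priority=1):
    """One backend entry; defaults match the simple 1:1 mapping."""
    return {"backend": backend_uri, "weight": weight, "priority": priority}


def mapping_payload(frontend_slug, entries):
    """Assemble a JSON-ready mapping body. Key names are hypothetical."""
    return {"frontend_model": frontend_slug, "backends": list(entries)}


mapping = mapping_payload(
    "scailabs/poolnoodle-omni",
    [
        route_entry("scaiinfer:node-eu-01/llama-3.3"),  # primary
        route_entry("openai:gpt-4o", priority=2),       # failover
    ],
)
print(mapping)
```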
Model access control
Frontend models are listed per-tenant, but not every tenant should see every model. Use model access policies to scope visibility:
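As a sketch, a deny entry for one tenant and one model, together with a default-allow evaluation, might look like this. The key names (`tenant_id`, `model`, `effect`) are assumptions, and the lookup runs locally purely to illustrate the rule; the real entry would be submitted to the model-access endpoint.

```python
def deny_entry(tenant_id, model_slug):
    """Body for an explicit deny. Key names are hypothetical."""
    return {"tenant_id": tenant_id, "model": model_slug, "effect": "deny"}


def is_allowed(entries, tenant_id, model_slug):
    """Apply the first matching explicit entry; otherwise allow implicitly."""
    for entry in entries:
        if entry["tenant_id"] == tenant_id and entry["model"] == model_slug:
            return entry["effect"] == "allow"
    return True  # no explicit entry: implicitly allowed


entries = [deny_entry("tenant_acme", "openai/gpt-4o")]
print(is_allowed(entries, "tenant_acme", "openai/gpt-4o"))
print(is_allowed(entries, "tenant_acme", "scailabs/poolnoodle-omni"))
```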
This disables openai/gpt-4o for tenant_acme. Without an explicit entry, models are implicitly allowed.
Model groups
To grant or deny access in bulk, group models:
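A group body and a matching access entry might be built as below. Only the `model_group_id` field name appears in this section; the other keys, the group name, and the `grp_123` identifier are hypothetical.

```python
def model_group_payload(name, model_slugs):
    """Body for creating a group; duplicates are dropped and order is stable."""
    return {"name": name, "models": sorted(set(model_slugs))}


def group_access_entry(tenant_id, model_group_id, effect):
    """Access entry that targets a whole group instead of a single model."""
    if effect not in ("allow", "deny"):
        raise ValueError(f"effect must be 'allow' or 'deny', got {effect!r}")
    return {"tenant_id": tenant_id, "model_group_id": model_group_id, "effect": effect}


group = model_group_payload("external-llms", ["openai/gpt-4o", "openai/gpt-4o"])
entry = group_access_entry("tenant_acme", "grp_123", "deny")  # hypothetical group id
print(group)
print(entry)
```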
Then grant or deny the whole group at once via /v1/model-access with model_group_id.
Health and circuit breaking
Each backend has a health status: healthy, degraded, unhealthy, unavailable. ScaiGrid tracks failures per-backend and opens a circuit breaker after repeated errors. An unhealthy backend is skipped until its circuit closes (on successful probe requests).
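The breaker behaviour can be illustrated with a deliberately small model: it opens after a run of consecutive failures and closes again when a probe succeeds. The threshold of 3 is an assumed value, and real breakers usually add timers and a half-open state, which this sketch leaves out.

```python
class CircuitBreaker:
    """Toy per-backend breaker: opens on repeated errors, closes on success."""

    def __init__(self, failure_threshold=3):  # ASSUMPTION: threshold not documented
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.is_open = False

    def record_failure(self):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.is_open = True  # backend is skipped from now on

    def record_success(self):
        # A successful request or probe resets the count and closes the circuit.
        self.consecutive_failures = 0
        self.is_open = False

    def allows_traffic(self):
        return not self.is_open


breaker = CircuitBreaker()
for _ in range(3):
    breaker.record_failure()
print(breaker.allows_traffic())  # circuit is open, backend skipped
breaker.record_success()         # probe succeeded
print(breaker.allows_traffic())
```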
Health checks: GET /v1/backends/{backend_id}/health.
Provider discovery
Rather than typing out every model from a provider, ask ScaiGrid to discover them:
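A sketch of building the discovery call. The `POST /v1/providers/{id}/discover` path comes from this page; the host, the bearer-token header, and the `prov_123` identifier are assumptions. The request is constructed but not sent.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://scaigrid.example.com"  # hypothetical host
API_KEY = "YOUR_API_KEY"                   # placeholder credential


def build_discover_request(provider_id: str) -> urllib.request.Request:
    """Build (but do not send) the discovery request for one provider."""
    url = f"{BASE_URL}/v1/providers/{quote(provider_id, safe='')}/discover"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {API_KEY}"},  # assumed auth scheme
        method="POST",
    )


discover = build_discover_request("prov_123")  # hypothetical provider id
print(discover.get_method(), discover.full_url)
```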
Returns a list of available upstream models. You can selectively configure some as backends:
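Selecting a subset of the discovered models could be as simple as the filter below. It assumes discovery returns plain upstream model names and that backend URIs follow the `<provider>:<model>` shape seen in `openai:gpt-4o`; both are assumptions, as is the sample list.

```python
def select_backend_uris(provider_type, discovered, wanted):
    """Keep only the wanted upstream models and turn them into backend URIs."""
    wanted = set(wanted)
    return [f"{provider_type}:{name}" for name in discovered if name in wanted]


discovered_models = ["gpt-4o", "gpt-4o-mini"]  # illustrative discovery result
uris = select_backend_uris("openai", discovered_models, ["gpt-4o"])
print(uris)
```

Each resulting URI would then be registered as a backend in the usual way.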
What's next
- Chat Completions — calling models from your app.
- Models and Routing Reference — full endpoint list.
- Rate Limiting — per-key/user/tenant limits.