Your central AI routing layer
Every AI-powered application needs to talk to a model somewhere. ScaiGrid is the layer between your applications and those models - whether they run on your own GPUs, on ScaiLabs’ infrastructure, or at external providers.
Instead of wiring directly to a provider, you connect to ScaiGrid. Define frontend models that your applications see, and ScaiGrid handles the backend: routing, load balancing, failover, token accounting, and cost optimisation - all transparently.
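The frontend/backend split can be pictured with a small sketch. Everything below - the table layout, field names, and cost-based scoring - is hypothetical illustration, not ScaiGrid's actual configuration format:

```python
# Illustrative sketch of frontend-to-backend model routing.
# All names and fields are hypothetical, not ScaiGrid's real API.

# One frontend model, several interchangeable backends.
ROUTING_TABLE = {
    "chat-default": [  # the name your applications see
        {"backend": "llama-70b@scaiinfer-eu1", "cost_per_1k": 0.40, "healthy": True},
        {"backend": "gpt-4o@external", "cost_per_1k": 2.50, "healthy": True},
    ],
}

def pick_backend(frontend_model: str) -> str:
    """Route to the cheapest healthy backend for a frontend model."""
    candidates = [b for b in ROUTING_TABLE[frontend_model] if b["healthy"]]
    if not candidates:
        raise RuntimeError(f"no healthy backend for {frontend_model}")
    return min(candidates, key=lambda b: b["cost_per_1k"])["backend"]
```

The key idea is that applications only ever name the frontend model; which backend actually serves the request can change without touching application code.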
Key capabilities
REST API
Full access to every capability: managed chat, asset management, personas, and configuration.
gRPC API
High-speed, low-latency access for performance-critical workloads. Same capabilities as the REST API, with reduced overhead.
OpenAI-compatible
Drop-in replacement for the OpenAI API. Point your existing client at ScaiGrid's base URL - no other code changes.
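In practice the switch looks like the sketch below, using only the standard library. The base URL and API key are placeholders - substitute your own deployment's values:

```python
import json
import urllib.request

BASE_URL = "https://scaigrid.example.com/v1"  # hypothetical endpoint

def build_chat_request(model: str, messages: list, api_key: str):
    """Build the same request an OpenAI client would send, aimed at ScaiGrid."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# Sending it (requires a live endpoint):
#   req = build_chat_request("chat-default",
#                            [{"role": "user", "content": "Hello"}],
#                            api_key="sk-...")
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)["choices"][0]["message"]["content"]
```

An official OpenAI SDK works the same way: override its base URL to the ScaiGrid endpoint and keep the rest of the integration untouched.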
WebSocket
Real-time streaming for chat completions, agent responses, and live data.
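Streamed responses arrive as incremental delta frames that the client reassembles. The frame shape below is an assumption modelled on the common OpenAI-style format, not ScaiGrid's documented wire protocol:

```python
import json

def accumulate_stream(frames: list) -> str:
    """Reassemble a streamed chat completion from delta frames.

    Assumes each frame is a JSON object shaped like
    {"choices": [{"delta": {"content": ...}}]} - an illustrative
    convention, not ScaiGrid's documented wire format.
    """
    parts = []
    for frame in frames:
        delta = json.loads(frame)["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```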
Token accounting
Per-user, per-tenant, per-model usage tracking. Real-time dashboards and billing API.
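The accounting dimensions can be sketched as a toy ledger. This shows only the (tenant, user, model) bookkeeping idea; ScaiGrid's actual billing API and storage are not shown:

```python
from collections import defaultdict

class UsageLedger:
    """Toy per-tenant / per-user / per-model token ledger (illustrative only)."""

    def __init__(self):
        self._totals = defaultdict(int)

    def record(self, tenant: str, user: str, model: str, tokens: int):
        """Accumulate token usage along all three accounting dimensions."""
        self._totals[(tenant, user, model)] += tokens

    def usage(self, tenant: str) -> dict:
        """Total tokens per (user, model) within one tenant."""
        return {(u, m): n for (t, u, m), n in self._totals.items() if t == tenant}
```

Keeping the tenant as the leading key is what makes per-tenant isolation and billing straightforward.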
Model management
Frontend/backend model abstraction, version pinning, A/B testing, and fallback chains.
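A fallback chain is conceptually simple: try each backend in order until one succeeds. The callables below are hypothetical stand-ins for configured backend models:

```python
def call_with_fallback(chain: list, request):
    """Try each backend in order; fall back to the next on failure.

    `chain` is a list of callables - hypothetical stand-ins for
    ScaiGrid's configured backend models.
    """
    errors = []
    for backend in chain:
        try:
            return backend(request)
        except Exception as exc:  # a real router would match specific errors
            errors.append(exc)
    raise RuntimeError(f"all {len(chain)} backends failed: {errors}")
```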
Module system
ScaiMatrix
Vector store and semantic search. Powers RAG for any application connected to ScaiGrid.
ScaiCore Runtime
Agent deployment and lifecycle management. Exposes agents as models through ScaiGrid’s API.
ScaiMind
Training and fine-tuning orchestration. Launch jobs, monitor progress, auto-register results.
Custom modules
Build your own modules using ScaiGrid’s module API. Extend capabilities without forking the platform.
Architecture
ScaiGrid in practice
Enterprise AI gateway
Route all AI usage through a single point with consistent authentication, accounting, and model governance.
AI-as-a-Service
Service providers use ScaiGrid to offer AI to customers with per-tenant isolation and billing.
Multi-model orchestration
Route different tasks to different models based on complexity, cost, and latency requirements.
Sovereign AI
Run on your own infrastructure with ScaiInfer nodes. No data leaves your premises.
ScaiInfer - Inference Compute
ScaiInfer provides the GPU-powered compute backbone for model inference. Nodes register with ScaiGrid and handle the actual execution - loading models, processing requests, returning results.
Flexible deployment: ScaiLabs cloud, partner data centres, or on-premises. ScaiGrid routes transparently.
- GPU-optimised inference with automatic scaling
- Health monitoring and capacity reporting
- Multiple concurrent models per node
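The health and capacity reporting above might look something like the heartbeat sketch below. The field names and JSON shape are illustrative - ScaiInfer's real registration protocol is not documented here:

```python
import json

def heartbeat(node_id: str, loaded_models: list, gpu_free_gb: float) -> str:
    """Build an illustrative node heartbeat report (hypothetical fields)."""
    return json.dumps({
        "node": node_id,
        "loaded_models": loaded_models,  # multiple concurrent models per node
        "gpu_free_gb": gpu_free_gb,      # capacity reporting
        "healthy": gpu_free_gb > 0,
    })
```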
ScaiAtlas - Model Registry
The central catalogue for every AI model. Full versioning, metadata, compatibility tracking, and deployment coordination to ScaiInfer nodes.
- Model storage with version history
- Rich metadata: architecture, capabilities, hardware requirements
- Deployment coordination and discovery
ScaiMind - Training & Fine-Tuning
Orchestrates model training and fine-tuning across GPU infrastructure. LoRA, QLoRA, and full fine-tuning with the entire lifecycle managed: dataset prep, scheduling, monitoring, evaluation, and auto-registration in ScaiAtlas.
From training to production
Train
ScaiMind orchestrates training on GPU nodes.
Register
Models catalogued in ScaiAtlas with full metadata.
Deploy
ScaiAtlas coordinates deployment to ScaiInfer nodes.
Serve
ScaiGrid routes requests to optimal nodes.
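The four steps above can be strung together as a sketch. Every function here is a hypothetical stub standing in for the corresponding module - none of these are real ScaiLabs API calls:

```python
# Hypothetical stubs for the train -> register -> deploy -> serve flow.

def train(dataset: str) -> dict:
    """ScaiMind: run a (stub) fine-tuning job, return an artifact."""
    return {"model": f"ft-{dataset}"}

def register(artifact: dict) -> str:
    """ScaiAtlas: catalogue the model and return its versioned id."""
    return f"{artifact['model']}:v1"

def deploy(model_id: str) -> str:
    """ScaiAtlas -> ScaiInfer: place the model on a node."""
    return f"{model_id}@node-1"

def serve(placement: str, prompt: str) -> str:
    """ScaiGrid: route a request to the deployed model."""
    return f"[{placement}] echo: {prompt}"

model_id = register(train("support-tickets"))
placement = deploy(model_id)
```

The point of the pipeline is that each stage's output is the next stage's input, so a model trained by ScaiMind is servable through ScaiGrid without manual hand-offs.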