gRPC API
ScaiGrid exposes a gRPC API alongside its HTTP surface. It is used mainly for internal integrations (ScaiInfer nodes reporting heartbeats, ScaiMind coordinating training jobs, and high-throughput streaming) and is also available to operators who need tighter integration than HTTP provides.
Default port: 50051
Protocol: gRPC over HTTP/2, protobuf serialization
When to use gRPC
- High-throughput streaming. Multi-GB inference outputs, long-lived connections, low per-call overhead.
- Internal ScaiLabs components. ScaiInfer nodes, ScaiMind cluster nodes, worker agents — all connect via gRPC.
- Low-latency RPC. Binary protobuf and persistent connections carry less per-call overhead than the HTTP API on hot paths.
For typical application integration, use the HTTP API. gRPC adds setup overhead that isn't justified for most use cases.
Services exposed
ScaiInfer bridge (scaiinfer.v1)
Inference nodes register and stream to the gateway via gRPC:
- InferenceService.StreamInference: low-latency streaming LLM calls
- InferenceService.ListModels: query a node for its loaded models
- HeartbeatService.Heartbeat: node health and capacity reports
Used by the scaiinfer backend dispatcher. Not typically called directly by application code.
ScaiMind coordination (scaimind.v1)
MindCoordinator → gateway, and gateway → cluster nodes:
- CoordinatorService.SubmitJob: create a training job
- CoordinatorService.StreamMetrics: stream training metrics in real time
- ClusterService.RegisterNode: node registration
- ClusterService.AllocateGPUs: GPU scheduling
These operations are also available over REST at /v1/modules/scaimind/*. The REST layer is a thin bridge; the gRPC API is the authoritative one.
ScaiCore runtime (scaicore.v1)
Core runtime internals:
- CoreRuntimeService.InvokeCore, PassivateCore, RestoreCore: instance lifecycle
- CoreRuntimeService.StreamEvents: real-time event stream from a running core
Exposed for observability tools that need richer-than-REST event feeds.
Authentication
gRPC auth uses a service token passed as metadata.
Service tokens are issued by ScaiKey and scoped to specific services. They're not interchangeable with ScaiGrid API keys.
Inside a cluster, mTLS is the recommended additional layer. See Deployment for mTLS setup.
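As a rough illustration of passing the token as metadata, the snippet below builds the metadata list that would accompany each call. The metadata key (`authorization`) and the `Bearer` prefix are assumptions made for this sketch, since the exact key ScaiGrid expects is not stated here; the helper name `service_token_metadata` is also hypothetical.

```python
from typing import List, Tuple

# ASSUMPTION: the metadata key and the "Bearer" scheme are placeholders for
# illustration; use whatever key your ScaiGrid deployment documents.
TOKEN_METADATA_KEY = "authorization"


def service_token_metadata(token: str) -> List[Tuple[str, str]]:
    """Return gRPC call metadata carrying a ScaiKey-issued service token."""
    if not token:
        raise ValueError("service token must not be empty")
    # gRPC metadata is a sequence of (key, value) pairs with lowercase keys.
    return [(TOKEN_METADATA_KEY, f"Bearer {token}")]


if __name__ == "__main__":
    # "example-token" is a dummy value, not a real credential.
    print(service_token_metadata("example-token"))
```

With the Python grpcio library, a list like this is supplied through the `metadata=` keyword argument of a stub method call.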
Proto files
The proto definitions live in the ScaiGrid source tree at proto/. For external integrations, we publish generated stubs for common languages at a dedicated package registry (ask your ScaiGrid support contact for access).
Service signatures are versioned — scaiinfer.v1, scaimind.v1. Backwards-incompatible changes introduce a new version (v2); v1 stays supported until all clients migrate.
Example: a minimal client
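A minimal client might look like the sketch below. It uses grpcio's generic `unary_unary` call so that it does not depend on generated stubs; with the published stubs you would call the typed stub method instead. Several things here are assumptions rather than documented facts: the `authorization` metadata key, the full method path `/scaiinfer.v1.InferenceService/ListModels` (derived from the package and service names listed earlier), that `ListModels` is a unary call accepting an empty request, and the helper names themselves. The response is left as raw protobuf bytes.

```python
from typing import List, Tuple

# ASSUMPTION: hypothetical metadata key, used only for this sketch.
TOKEN_METADATA_KEY = "authorization"


def method_path(package: str, service: str, method: str) -> str:
    """Build a fully qualified gRPC method path: /package.Service/Method."""
    return f"/{package}.{service}/{method}"


def list_models_raw(target: str, token: str, timeout_s: float = 5.0) -> bytes:
    """Call ListModels with an empty request and return the raw response bytes."""
    # Requires the grpcio package. Imported here so the helper above can be
    # used (and tested) without grpcio installed.
    import grpc

    metadata: List[Tuple[str, str]] = [(TOKEN_METADATA_KEY, f"Bearer {token}")]
    path = method_path("scaiinfer.v1", "InferenceService", "ListModels")

    # Insecure channel: suitable for local experiments only.
    with grpc.insecure_channel(target) as channel:
        # No serializers are given, so request and response are raw bytes.
        call = channel.unary_unary(path)
        # An empty byte string encodes a protobuf message with no fields set.
        return call(b"", metadata=metadata, timeout=timeout_s)
```

Calling `list_models_raw("localhost:50051", token)` against the default port would return the serialized response, or raise `grpc.RpcError` if the call fails.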
For production, use TLS (grpc.secure_channel) and proper error handling.
Standard gRPC errors
ScaiGrid maps its error codes to standard gRPC status codes:
| gRPC Status | ScaiGrid codes |
|---|---|
| UNAUTHENTICATED | Auth errors |
| PERMISSION_DENIED | Permission/budget errors |
| NOT_FOUND | 404 codes |
| INVALID_ARGUMENT | Validation errors |
| RESOURCE_EXHAUSTED | Rate-limited / quota-exhausted |
| UNAVAILABLE | Backend unavailable, service draining |
| DEADLINE_EXCEEDED | Timeout |
| INTERNAL | Unexpected server error |
The original ScaiGrid error code is attached both as metadata (scaigrid-error-code) and in the status message, so clients can handle it programmatically.
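One way a client could act on this mapping is sketched below: it looks up the `scaigrid-error-code` entry in a call's metadata and decides whether a retry is worthwhile. The retry policy (retry only `UNAVAILABLE` and `DEADLINE_EXCEEDED`) is a client-side choice assumed for this example, not something ScaiGrid prescribes, and the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Metadata key named on this page.
ERROR_CODE_KEY = "scaigrid-error-code"

# ASSUMPTION: a conservative client-side retry policy chosen for this sketch.
RETRYABLE_STATUSES = frozenset({"UNAVAILABLE", "DEADLINE_EXCEEDED"})


def is_retryable(status_name: str) -> bool:
    """Return True if the gRPC status name is worth retrying under this policy."""
    return status_name in RETRYABLE_STATUSES


def scaigrid_error_code(
    metadata: Optional[Iterable[Tuple[str, str]]],
) -> Optional[str]:
    """Extract the ScaiGrid error code from (key, value) metadata pairs, if present."""
    for key, value in metadata or ():
        if key.lower() == ERROR_CODE_KEY:
            return value
    return None


def describe_failure(
    status_name: str, metadata: Optional[Iterable[Tuple[str, str]]]
) -> str:
    """Summarize a failed call: status, ScaiGrid code, and suggested action."""
    code = scaigrid_error_code(metadata) or "unknown"
    action = "retry with backoff" if is_retryable(status_name) else "do not retry"
    return f"{status_name} (ScaiGrid code: {code}): {action}"


if __name__ == "__main__":
    # "backend_down" is a made-up value, not a documented ScaiGrid code.
    print(describe_failure("UNAVAILABLE", [(ERROR_CODE_KEY, "backend_down")]))
```

With grpcio, a `grpc.RpcError` raised by a client call exposes `code()` and `trailing_metadata()`, so `describe_failure(err.code().name, err.trailing_metadata())` would feed this helper. Whether ScaiGrid sends the code as trailing metadata specifically is an assumption.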
Not everything is on gRPC
Not every REST endpoint has a gRPC equivalent. gRPC coverage is focused on:
- Inference (chat, embeddings, streaming)
- ScaiInfer / ScaiMind / ScaiCore internal coordination
- Event streaming
Admin operations, accounting queries, user management — these stay on HTTP.
Related
- MCP Server — another binary-transport protocol for agent integration
- Inference Reference — HTTP equivalents
- Internal proto specs in the ScaiGrid source tree