gRPC API

ScaiGrid exposes a gRPC API alongside its HTTP surface. It serves internal integrations (ScaiInfer nodes reporting heartbeats, ScaiMind coordinating training jobs, high-throughput streaming) and is available to operators who need tighter integration than HTTP provides.

Default port: 50051
Protocol: gRPC over HTTP/2, protobuf serialization

When to use gRPC

High-throughput streaming. Multi-GB inference outputs, long-lived connections, low per-call overhead.
Internal ScaiLabs components. ScaiInfer nodes, ScaiMind cluster nodes, worker agents — all connect via gRPC.
Low-latency RPC. Binary protobuf and persistent connections beat the HTTP API on hot paths.

For typical application integration, use the HTTP API. gRPC adds setup overhead that isn't justified for most use cases.

Services exposed

ScaiInfer bridge (`scaiinfer.v1`)

Inference nodes register and stream to the gateway via gRPC:

InferenceService.StreamInference — low-latency streaming LLM calls
InferenceService.ListModels — query a node for its loaded models
HeartbeatService.Heartbeat — node health and capacity reports

Used by the scaiinfer backend dispatcher. Not typically called directly by application code.
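
On the wire, gRPC addresses each RPC by an HTTP/2 path of the form `/<package>.<Service>/<Method>`, which is what tends to show up in proxy logs and interceptors. The sketch below builds those paths for the ScaiInfer bridge methods listed above. It assumes the proto package name is exactly `scaiinfer.v1` (taken from the heading, not checked against the proto files), and the helper names are illustrative only.

```python
from typing import Dict, List

# Service and method names are copied from the list above. The package label
# comes from the section heading; that it matches the real proto `package`
# declaration is an assumption.
SERVICES: Dict[str, Dict[str, List[str]]] = {
    "scaiinfer.v1": {
        "InferenceService": ["StreamInference", "ListModels"],
        "HeartbeatService": ["Heartbeat"],
    },
}


def full_method(package: str, service: str, method: str) -> str:
    """Return the gRPC request path: /<package>.<Service>/<Method>."""
    return f"/{package}.{service}/{method}"


def all_methods(services: Dict[str, Dict[str, List[str]]] = SERVICES) -> List[str]:
    """Flatten the nested mapping into a list of request paths."""
    return [
        full_method(package, service, method)
        for package, by_service in services.items()
        for service, methods in by_service.items()
        for method in methods
    ]


if __name__ == "__main__":
    for path in all_methods():
        print(path)
```

The same path format applies to the `scaimind.v1` and `scaicore.v1` services, so the mapping can be extended with their entries.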

ScaiMind coordination (`scaimind.v1`)

Calls flow from MindCoordinator to the gateway, and from the gateway to cluster nodes:

CoordinatorService.SubmitJob — create a training job
CoordinatorService.StreamMetrics — stream training metrics in real time
ClusterService.RegisterNode — node registration
ClusterService.AllocateGPUs — GPU scheduling

These operations are also available via REST at /v1/modules/scaimind/*. The REST layer is a thin bridge; the gRPC API is the authoritative one.

ScaiCore runtime (`scaicore.v1`)

Core runtime internals:

CoreRuntimeService.InvokeCore, PassivateCore, RestoreCore — instance lifecycle
CoreRuntimeService.StreamEvents — real-time event stream from a running core

Exposed for observability tools that need richer-than-REST event feeds.

Authentication

gRPC auth uses a service token passed as metadata:

`authorization: Bearer <service-token>`

Service tokens are issued by ScaiKey and scoped to specific services. They're not interchangeable with ScaiGrid API keys.

Inside a cluster, mTLS is the recommended additional layer. See Deployment for mTLS setup.
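
A minimal sketch of how a client might assemble the call metadata described above. The `authorization` key and the `Bearer` scheme come from this page, and `x-request-id` appears in the client example further down. The function name and the validation rules are illustrative assumptions. The only gRPC fact relied on is that metadata keys must be lowercase and drawn from `a-z`, `0-9`, `-`, `_` and `.`.

```python
import re
from typing import List, Optional, Tuple

# gRPC metadata keys are lowercase ASCII: letters, digits, "-", "_", ".".
_KEY_RE = re.compile(r"^[a-z0-9._-]+$")


def build_metadata(service_token: str, request_id: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (key, value) pairs suitable for the `metadata=` argument of a stub call.

    Hypothetical helper: it only formats and sanity-checks values. Issuing
    and scoping the service token is done by ScaiKey, not here.
    """
    if not service_token or any(ch.isspace() for ch in service_token):
        raise ValueError("service token must be non-empty and contain no whitespace")

    pairs = [("authorization", f"Bearer {service_token}")]
    if request_id is not None:
        pairs.append(("x-request-id", request_id))

    for key, _value in pairs:
        if not _KEY_RE.match(key):
            raise ValueError(f"invalid metadata key: {key!r}")
    return pairs


if __name__ == "__main__":
    print(build_metadata("example-token", request_id="req_abc"))
```

Keeping the token out of the call sites and in one helper makes it harder to accidentally send a ScaiGrid API key where a service token is expected.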

Proto files

The proto definitions live in the ScaiGrid source tree at proto/. For external integrations, we publish generated stubs for common languages at a dedicated package registry (ask your ScaiGrid support contact for access).

Service packages are versioned (scaiinfer.v1, scaimind.v1, scaicore.v1). A backwards-incompatible change introduces a new version (v2); v1 stays supported until all clients have migrated.
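
Because the version is part of the package name, a client can check before connecting whether the server offers the version it was built against. The sketch below parses names such as `scaiinfer.v1` and compares them with a list of packages the server is known to offer. How that list is obtained is outside the sketch, and the function names are hypothetical.

```python
import re
from typing import Iterable, Tuple

# Matches "<name>.v<major>", e.g. "scaiinfer.v1".
_PACKAGE_RE = re.compile(r"^([a-z][a-z0-9_]*)\.v([1-9][0-9]*)$")


def parse_package(package: str) -> Tuple[str, int]:
    """Split a versioned package name into (service family, major version)."""
    match = _PACKAGE_RE.match(package)
    if match is None:
        raise ValueError(f"not a versioned package name: {package!r}")
    return match.group(1), int(match.group(2))


def is_compatible(client_package: str, server_packages: Iterable[str]) -> bool:
    """True if the server offers the same family at the same major version."""
    offered = {parse_package(name) for name in server_packages}
    return parse_package(client_package) in offered


if __name__ == "__main__":
    print(parse_package("scaiinfer.v1"))
    print(is_compatible("scaimind.v1", ["scaimind.v1", "scaimind.v2"]))
```

During a migration window the server would list both `v1` and `v2`, so existing `v1` clients keep passing this check until `v1` is retired.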

Example: a minimal client

python
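import asyncio
import os

import grpc
from scaiinfer.v1 import scaiinfer_pb2 as pb
from scaiinfer.v1.scaiinfer_pb2_grpc import InferenceServiceStub

SERVICE_TOKEN = os.environ["SERVICE_TOKEN"]  # assumption: token supplied via the environment

async def main() -> None:
    # Plaintext channel, for local testing only; closed when the block exits.
    async with grpc.aio.insecure_channel("scaigrid.scailabs.ai:50051") as channel:
        stub = InferenceServiceStub(channel)
        metadata = [
            ("authorization", f"Bearer {SERVICE_TOKEN}"),
            ("x-request-id", "req_abc"),
        ]
        req = pb.InferenceRequest(
            model="scailabs/poolnoodle-omni",
            # ... remaining request fields, elided here
        )
        # Server-streaming call: `async for` has to run inside a coroutine.
        async for chunk in stub.StreamInference(req, metadata=metadata):
            print(chunk.delta.content, end="")

asyncio.run(main())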

For production, use TLS (grpc.aio.secure_channel with grpc.ssl_channel_credentials()) and catch grpc.aio.AioRpcError around the call.

Standard gRPC errors

ScaiGrid maps its error codes to standard gRPC status codes:

| gRPC status | ScaiGrid codes |
| --- | --- |
| `UNAUTHENTICATED` | Auth errors |
| `PERMISSION_DENIED` | Permission/budget errors |
| `NOT_FOUND` | 404 codes |
| `INVALID_ARGUMENT` | Validation errors |
| `RESOURCE_EXHAUSTED` | Rate-limited / quota-exhausted |
| `UNAVAILABLE` | Backend unavailable, service draining |
| `DEADLINE_EXCEEDED` | Timeout |
| `INTERNAL` | Unexpected server error |
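
One way a client might act on these statuses is to retry only the ones that usually signal a transient condition, with capped exponential backoff. The sketch below is a client-side convention, not documented ScaiGrid policy. Which statuses count as retryable, and the default delays, are assumptions to tune per deployment.

```python
import random
from typing import Iterator

# Assumption: statuses treated as transient. Auth, permission and validation
# errors are left out because repeating the same call would fail the same way.
RETRYABLE = frozenset({"UNAVAILABLE", "RESOURCE_EXHAUSTED", "DEADLINE_EXCEEDED"})


def is_retryable(status_name: str) -> bool:
    """Decide by gRPC status name, e.g. "UNAVAILABLE"."""
    return status_name in RETRYABLE


def backoff_delays(attempts: int, base: float = 0.2, cap: float = 5.0,
                   jitter: bool = True) -> Iterator[float]:
    """Yield one delay in seconds per retry: base * 2**n, capped at `cap`.

    With jitter enabled, each delay is drawn uniformly from [0, capped delay]
    so that many clients do not retry in lockstep.
    """
    for n in range(attempts):
        delay = min(cap, base * (2 ** n))
        yield random.uniform(0.0, delay) if jitter else delay


if __name__ == "__main__":
    print(is_retryable("UNAVAILABLE"), is_retryable("INVALID_ARGUMENT"))
    print(list(backoff_delays(5, base=0.5, cap=3.0, jitter=False)))
```

With the Python grpc library, the status name would typically be read as `err.code().name` from a caught `grpc.RpcError`.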

The original ScaiGrid error code is attached as metadata under the key scaigrid-error-code, alongside the error message, so clients can handle it programmatically.
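
A minimal sketch of reading that key from a sequence of metadata pairs. The key name comes from the text above. The helper name is hypothetical, and the error-code value in the usage example is made up for illustration.

```python
from typing import Iterable, Optional, Tuple, Union

ERROR_CODE_KEY = "scaigrid-error-code"  # metadata key named in the text above

MetadataPairs = Iterable[Tuple[str, Union[str, bytes]]]


def scaigrid_error_code(metadata: Optional[MetadataPairs]) -> Optional[str]:
    """Return the ScaiGrid error code from (key, value) pairs, or None if absent."""
    if metadata is None:
        return None
    for key, value in metadata:
        if key.lower() == ERROR_CODE_KEY:
            # Values are normally str; decode defensively if bytes are passed in.
            return value.decode("utf-8") if isinstance(value, bytes) else value
    return None


if __name__ == "__main__":
    # "example_code" is a placeholder, not a real ScaiGrid error code.
    pairs = [("content-type", "application/grpc"), ("scaigrid-error-code", "example_code")]
    print(scaigrid_error_code(pairs))
```

With the Python grpc library, the pairs would typically come from `err.trailing_metadata()` on a caught `grpc.RpcError`. That ScaiGrid sends the key in trailing rather than initial metadata is an assumption.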

Not everything is on gRPC

Not every REST endpoint has a gRPC equivalent. gRPC coverage is focused on:

Inference (chat, embeddings, streaming)
ScaiInfer / ScaiMind / ScaiCore internal coordination
Event streaming

Admin operations, accounting queries, user management — these stay on HTTP.

See also

MCP Server: another binary-transport protocol for agent integration
Inference Reference: HTTP equivalents
Internal proto specs in the ScaiGrid source tree