Troubleshooting
A short list of things that go wrong and how to fix them. If none of these match, take the request ID from the response envelope and grep the ScaiGrid logs for it.
Desktop client connects, immediately closes with 4001
Authentication failed. The JWT in the Authorization header is missing, malformed, or expired.
- Confirm the client is sending `Authorization: Bearer <jwt>` on the WebSocket upgrade.
- Confirm the JWT's `aud` matches the ScaiKey audience the deployment trusts.
- The JWT must include a `groups` claim so the server can resolve permissions.
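Before you dig into server logs, you can rule out the claim-level causes by inspecting the token locally. The sketch below is a minimal Python example that uses only the standard library. It decodes the JWT payload without verifying the signature, so use it for debugging only. It flags an expired `exp`, an `aud` that doesn't match, and a missing `groups` claim. The function names and the expected-audience argument are illustrative and are not part of ScaiGrid.

```python
import base64
import json
import time
from typing import List, Optional


def decode_jwt_payload(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying the signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected 3 dot-separated segments")
    segment = parts[1] + "=" * (-len(parts[1]) % 4)  # restore base64url padding
    claims = json.loads(base64.urlsafe_b64decode(segment))
    if not isinstance(claims, dict):
        raise ValueError("payload is not a JSON object")
    return claims


def diagnose_4001(token: str, expected_aud: str, now: Optional[float] = None) -> List[str]:
    """List likely reasons the server would reject this token with close code 4001."""
    try:
        claims = decode_jwt_payload(token)
    except ValueError as exc:  # also covers binascii.Error and JSONDecodeError
        return [f"malformed: {exc}"]
    problems = []
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if isinstance(exp, (int, float)) and exp <= now:
        problems.append("expired")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # RFC 7519: string or array
    if expected_aud not in audiences:
        problems.append(f"aud mismatch: {aud!r}")
    if "groups" not in claims:
        problems.append("missing groups claim")
    return problems
```

Pass the audience string your deployment trusts. An empty result means the claims look right, so the problem is more likely the header itself or the signature.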
Desktop client closes with 4002 right after the handshake
Either the first frame was not `scailink/session_init`, or its params didn't validate.
- The very first frame after the WebSocket opens must be `session_init`, not a heartbeat.
- `device_name` is required; `platform.os` is required.
- A malformed `capabilities` block fails validation; send empty arrays if there's nothing to register.
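Here is a minimal Python sketch of a well-formed first frame. It assumes a JSON-RPC 2.0 envelope. The capability keys (`tools`, `resources`, `prompts`) are hypothetical placeholders. Only `scailink/session_init`, `device_name`, `platform.os`, `capabilities`, and `audit_detail_level` come from this page, so check the key names against your ScaiLink version.

```python
import json
from typing import Optional

# Hypothetical capability keys; substitute the ones your ScaiLink version expects.
CAPABILITY_KEYS = ("tools", "resources", "prompts")


def build_session_init(device_name: str, os_name: str, request_id: int = 1,
                       capabilities: Optional[dict] = None,
                       audit_detail_level: Optional[str] = None) -> str:
    """Build the first frame a desktop client sends after the WebSocket opens."""
    if not device_name:
        raise ValueError("device_name is required")
    if not os_name:
        raise ValueError("platform.os is required")
    # Start from empty arrays so "nothing to register" still validates.
    caps = {key: [] for key in CAPABILITY_KEYS}
    for key, value in (capabilities or {}).items():
        if key not in caps or not isinstance(value, list):
            raise ValueError(f"capabilities.{key} must be one of {CAPABILITY_KEYS} and a list")
        caps[key] = value
    params = {
        "device_name": device_name,
        "platform": {"os": os_name},
        "capabilities": caps,
    }
    if audit_detail_level is not None:
        params["audit_detail_level"] = audit_detail_level
    frame = {"jsonrpc": "2.0", "id": request_id,
             "method": "scailink/session_init", "params": params}
    return json.dumps(frame)
```

Send the returned string as the first text frame, before any heartbeat.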
`tools/list` returned 0: server registered but its capabilities list is empty
Discovery succeeded at the transport level but the server returned no tools.
- Try `POST /remote-servers/{id}/refresh` to re-run discovery.
- Check that the upstream server actually advertises tools at the URL you registered. Sometimes the MCP path is `/mcp/v1` rather than `/mcp`.
- Some servers gate tool discovery on the credential's scope: the wrong scope yields an empty list rather than a 401.
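If you'd rather script the refresh than click through the dashboard, here is a minimal standard-library sketch. The base URL is a placeholder. The sketch also assumes the REST API accepts the same `Authorization: Bearer <jwt>` header as the WebSocket upgrade. `build_refresh_request` and `refresh` are illustrative helper names.

```python
import urllib.parse
import urllib.request


def build_refresh_request(base_url: str, server_id: str, token: str) -> urllib.request.Request:
    """Build POST /remote-servers/{id}/refresh, which re-runs tool discovery."""
    sid = urllib.parse.quote(server_id, safe="")  # keep the id a single path segment
    url = f"{base_url.rstrip('/')}/remote-servers/{sid}/refresh"
    return urllib.request.Request(
        url,
        data=b"",  # no payload needed for a refresh
        method="POST",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )


def refresh(base_url: str, server_id: str, token: str, timeout: float = 30.0) -> int:
    """Send the refresh and return the HTTP status code."""
    req = build_refresh_request(base_url, server_id, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

`urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, so a failing refresh surfaces as an exception rather than as a returned status code.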
Status flipped to error after working fine
Three consecutive health checks failed.
- Check `last_health_status` on the detail view; the error class name says what went wrong (e.g. `RemoteClientError`, `TimeoutError`).
- Test the endpoint from outside ScaiGrid with the same credentials.
- Fix the upstream and call `POST /remote-servers/{id}/refresh`. A successful refresh resets `consecutive_failures` to 0 and `status` to `active`.
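The rule above is a small state machine. The Python sketch below models it so you can reason about when a server will flip. It is an illustration, not ScaiGrid's code. Two details are assumptions: the `"ok"` value for a passing check, and the choice that a passing check resets the counter without clearing `status='error'`. This page only says that a successful refresh clears it.

```python
from dataclasses import dataclass
from typing import Optional

FAILURE_THRESHOLD = 3  # three consecutive failed health checks flip status to "error"


@dataclass
class RemoteServerHealth:
    """Illustrative model of the documented health-check behaviour."""
    status: str = "active"
    consecutive_failures: int = 0
    last_health_status: str = "ok"  # hypothetical value for a passing check

    def record_check(self, error_class: Optional[str]) -> None:
        """Record one health check; error_class is None when the check passed."""
        if error_class is None:
            # Assumption: a pass breaks the streak but does not reactivate the server.
            self.consecutive_failures = 0
            self.last_health_status = "ok"
            return
        self.consecutive_failures += 1
        self.last_health_status = error_class
        if self.consecutive_failures >= FAILURE_THRESHOLD:
            self.status = "error"

    def refresh_succeeded(self) -> None:
        """A successful POST .../refresh resets the counter and reactivates the server."""
        self.consecutive_failures = 0
        self.status = "active"
```

Because the failures must be consecutive, an occasional timeout between successful checks never trips the threshold.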
`SCAILINK_REGISTRY_DISABLED` on every registry call
Registry calls return 503 because the deployment doesn't have an encryption KEK (key-encryption key) configured.
- Set `encryption_local_kek` in the platform settings (production wires this through ScaiVault).
- Until the KEK is set, the cloud-registry feature is intentionally off: credentials would have nowhere safe to live.
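A local or development deployment needs 256 bits of random key material for `encryption_local_kek`. Below is a minimal Python sketch that uses the standard `secrets` module. The base64 encoding is an assumption about what the setting accepts, so check your deployment's settings reference. In production, let ScaiVault supply the KEK instead.

```python
import base64
import secrets


def generate_local_kek(num_bytes: int = 32) -> str:
    """Return fresh random key material, base64-encoded (encoding is an assumption)."""
    if num_bytes < 32:
        raise ValueError("use at least 32 bytes (256 bits) of key material")
    # secrets uses the OS CSPRNG, which is suitable for key generation.
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    print(generate_local_kek())
```

Treat the output as a secret. Anyone holding the KEK can unwrap the stored credentials.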
Tenant hit the registration cap
`SCAILINK_REMOTE_LIMIT_EXCEEDED` with HTTP 429: the tenant already has 100 registered servers, which is the cap.
- Delete unused servers from the dashboard. Deletion cascades through the server's credentials and capabilities.
- If the cap is genuinely too low, it is set by `MAX_SERVERS_PER_TENANT` in code and can be raised per deployment.
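The cap behaves like a simple count check at registration time. The Python sketch below models it for illustration only. The exception class and function names are hypothetical. The default of 100 is inferred from the error description above, not read from `MAX_SERVERS_PER_TENANT` itself.

```python
from http import HTTPStatus

MAX_SERVERS_PER_TENANT = 100  # inferred default; the real value lives in ScaiLink's code


class RemoteLimitExceeded(Exception):
    """Hypothetical stand-in for the SCAILINK_REMOTE_LIMIT_EXCEEDED error."""
    code = "SCAILINK_REMOTE_LIMIT_EXCEEDED"
    http_status = HTTPStatus.TOO_MANY_REQUESTS  # 429


def check_registration_allowed(current_count: int, limit: int = MAX_SERVERS_PER_TENANT) -> int:
    """Return how many more servers the tenant may register, or raise at the cap."""
    if current_count >= limit:
        raise RemoteLimitExceeded(f"tenant has {current_count} of {limit} servers")
    return limit - current_count
```

The check counts registrations, not activity. Idle or errored servers still occupy slots until they are deleted.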
Repeat tool calls are still slow
The session pool should make repeat invocations skip the handshake. If they don't:
- The 5-minute idle TTL has elapsed between calls — the warm session was closed.
- The previous call returned an RPC error; ScaiLink closes the cached session on any error so the next call gets a fresh handshake.
- You're hitting different workers across calls. Each worker keeps its own pool, so with `uvicorn --workers N` the chance that a repeat call lands on the worker holding the warm session is roughly 1/N. Multi-worker coordination is parked for v1.2.
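All three causes follow from how a per-process pool with an idle TTL behaves. The Python sketch below is a simplified model, not ScaiLink's implementation. `SessionPool`, the injected `connect` and `clock` callables, and the `evict` hook are hypothetical. A real pool would also close stale sessions rather than just dropping them.

```python
import time
from typing import Any, Callable, Dict, Tuple

IDLE_TTL_SECONDS = 300.0  # the documented 5-minute idle TTL


class SessionPool:
    """Per-process cache of warm upstream sessions, keyed by server id."""

    def __init__(self, connect: Callable[[str], Any],
                 clock: Callable[[], float] = time.monotonic):
        self._connect = connect  # performs the full handshake
        self._clock = clock
        self._sessions: Dict[str, Tuple[Any, float]] = {}  # id -> (session, last_used)

    def get(self, server_id: str) -> Any:
        now = self._clock()
        entry = self._sessions.get(server_id)
        if entry is not None and now - entry[1] <= IDLE_TTL_SECONDS:
            session = entry[0]  # warm hit: no handshake
        else:
            session = self._connect(server_id)  # missing or idle too long: handshake again
        self._sessions[server_id] = (session, now)  # refresh the idle timer
        return session

    def evict(self, server_id: str) -> None:
        """Drop the cached session, e.g. after an RPC error or a credential rotation."""
        self._sessions.pop(server_id, None)
```

Each worker process would hold its own `SessionPool`. A call routed to a different worker therefore misses, even when another worker has a warm session.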
Credential PUT returns 200 but outbound calls still get 401
The rotation went through but the upstream still rejects.
- Confirm you rotated the right field. `PUT /remote-servers/{id}/credentials/authorization` rotates the `authorization` field; the upstream may need `x-api-key` or another field instead.
- Force a refresh after the rotation so the cached session in the pool gets evicted.
- Confirm the new token has the same scopes the registered tools need.
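You can script the rotate-then-refresh sequence with the standard library. The two endpoints are the ones named above. Several details are assumptions: the JSON body shape (`{"value": ...}`), the base URL, and Bearer-token auth. `rotation_requests` and `rotate_and_refresh` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def _url(base_url: str, *segments: str) -> str:
    # Quote each segment so ids and field names stay single path segments.
    path = "/".join(urllib.parse.quote(s, safe="") for s in segments)
    return f"{base_url.rstrip('/')}/{path}"


def rotation_requests(base_url: str, server_id: str, field: str,
                      new_secret: str, token: str) -> List[urllib.request.Request]:
    """Build the PUT that rotates one credential field, then the refresh POST."""
    auth = {"Authorization": f"Bearer {token}"}  # assumed auth scheme
    put = urllib.request.Request(
        _url(base_url, "remote-servers", server_id, "credentials", field),
        data=json.dumps({"value": new_secret}).encode("utf-8"),  # hypothetical body shape
        method="PUT",
        headers={**auth, "Content-Type": "application/json"},
    )
    refresh = urllib.request.Request(
        _url(base_url, "remote-servers", server_id, "refresh"),
        data=b"",
        method="POST",
        headers=auth,
    )
    return [put, refresh]


def rotate_and_refresh(base_url: str, server_id: str, field: str,
                       new_secret: str, token: str, timeout: float = 30.0) -> None:
    for req in rotation_requests(base_url, server_id, field, new_secret, token):
        # urlopen raises HTTPError on 4xx/5xx, so a failed PUT stops before the refresh.
        with urllib.request.urlopen(req, timeout=timeout):
            pass
```

Sending the refresh only after the PUT succeeds means the pool never re-handshakes with the old credential.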
Audit log doesn't have the call I just made
- The detail level on `session_init` was set to `off`, so only the skeleton is recorded. Reconnect with `audit_detail_level: "metadata"` to get target names and arguments.
- The call failed at the routing layer (wrong tool name, no connected device). These failures surface in logs but not always in the per-user audit endpoint; use the tenant-wide `GET /audit`.
A user with a custom role can't see registered servers
Most likely the role is missing `scailink:remote.use`.
- The list endpoint requires `scailink:remote.use` even for personal servers the user registered themselves.
- Confirm the user's effective permissions via `GET /iam/users/{id}/permissions`.
- The catch-all that auto-grants admin roles does not apply to `tenant_user` or `tenant_viewer`.
"Tool not found" when invoking a registered server's tool
- The server might be in `status='error'`; its tools are hidden from the aggregated catalog. Refresh it.
- The namespaced name is wrong. Personal servers are `remote.{user_id}.{slug}.{tool_name}`; tenant-shared servers are `remote.tenant.{slug}.{tool_name}`. The slug includes a 6-char hash, so copy the name from the detail view rather than constructing it.
- Capability rows older than 24 hours get evicted on refresh. If a tool used to exist but no longer does, the upstream removed it.
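If a copied name still fails, split it and compare each part with the detail view. This Python parser is a hypothetical helper that assumes neither `user_id` nor the slug contains a dot. It is meant for checking a name, not for constructing one.

```python
from typing import NamedTuple


class ToolName(NamedTuple):
    scope: str  # "personal" or "tenant"
    owner: str  # user id for personal servers, literally "tenant" for shared ones
    slug: str   # server slug, including its 6-char hash
    tool: str   # upstream tool name (may itself contain dots)


def parse_tool_name(name: str) -> ToolName:
    """Split remote.{user_id}.{slug}.{tool_name} or remote.tenant.{slug}.{tool_name}."""
    parts = name.split(".", 3)  # everything after the third dot is the tool name
    if len(parts) != 4 or parts[0] != "remote" or not all(parts[1:]):
        raise ValueError(f"not a namespaced remote tool name: {name!r}")
    _, owner, slug, tool = parts
    scope = "tenant" if owner == "tenant" else "personal"
    return ToolName(scope, owner, slug, tool)
```

A mismatch in `scope` or `owner` usually means you copied the personal name of a server that is actually tenant-shared, or vice versa.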
Conversations never see a desktop client's tools
- Confirm the desktop session is actually `active` (`GET /sessions`).
- Confirm the capabilities are registered (`GET /capabilities`).
- The agent surface (ScaiCore, ScaiBot, an external MCP consumer) needs to be wired through ScaiMCP, which is where ScaiLink's catalogs surface to agents.