Troubleshooting

A short list of things that go wrong and how to fix them. If none of these match, check the request id in the response envelope and grep the ScaiGrid logs.

Desktop client connects, then immediately closes with 4001

Authentication failed. The JWT in the Authorization header is missing, malformed, or expired.

  • Confirm the client is sending Authorization: Bearer <jwt> on the WebSocket upgrade.
  • Confirm the JWT's aud matches the ScaiKey audience the deployment trusts.
  • The JWT must include a groups claim so the server can resolve permissions.
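
The three checks above can be run locally before reconnecting. The following is a minimal sketch using only the Python standard library; it decodes the payload without verifying the signature, so it is a diagnostic aid rather than a validator. The function names and the `expected_aud` parameter are hypothetical and not part of ScaiGrid or ScaiKey.

```python
import base64
import json
import time
from typing import List, Optional


def decode_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed JWT: expected three dot-separated segments")
    segment = parts[1]
    # base64url segments omit padding; restore it before decoding.
    segment += "=" * (-len(segment) % 4)
    claims = json.loads(base64.urlsafe_b64decode(segment))
    if not isinstance(claims, dict):
        raise ValueError("malformed JWT: payload is not a JSON object")
    return claims


def inspect_jwt(token: str, expected_aud: str, now: Optional[float] = None) -> List[str]:
    """Return the likely reasons a server would reject this token (hypothetical helper)."""
    now = time.time() if now is None else now
    try:
        claims = decode_payload(token)
    except ValueError as exc:  # also covers base64 and JSON decoding errors
        return [str(exc)]
    problems = []
    exp = claims.get("exp")
    if exp is not None and exp <= now:
        problems.append("token expired")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_aud not in audiences:
        problems.append("aud does not match expected audience")
    if "groups" not in claims:
        problems.append("groups claim missing")
    return problems
```

An empty list means none of the three documented causes were found in the token itself; in that case check that the header is actually present on the WebSocket upgrade request.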

Desktop client closes with 4002 right after the handshake

Either the first frame was not scailink/session_init, or its params didn't validate.

  • The very first frame after the WebSocket opens must be session_init — not a heartbeat.
  • device_name is required; platform.os is required.
  • A malformed capabilities block fails validation; send empty arrays if there's nothing to register.
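
A client-side pre-flight check can catch these problems before the server closes the socket. This is a sketch under assumptions: it treats the frame as a JSON object with `method` and `params` keys, and it treats `capabilities` as an object whose values are all arrays. Neither shape is confirmed by this page, and `check_first_frame` is a hypothetical helper.

```python
import json
from typing import List

SESSION_INIT = "scailink/session_init"


def check_first_frame(raw: str) -> List[str]:
    """Return validation problems for the first frame (assumed JSON shape)."""
    try:
        frame = json.loads(raw)
    except json.JSONDecodeError:
        return ["frame is not valid JSON"]
    if not isinstance(frame, dict):
        return ["frame is not a JSON object"]

    problems = []
    if frame.get("method") != SESSION_INIT:
        problems.append(f"first frame must be {SESSION_INIT}")

    params = frame.get("params")
    if not isinstance(params, dict):
        return problems + ["params missing or not an object"]

    if not params.get("device_name"):
        problems.append("device_name is required")

    platform = params.get("platform")
    if not isinstance(platform, dict) or not platform.get("os"):
        problems.append("platform.os is required")

    # Assumption: capabilities is an object of arrays; empty arrays are valid.
    caps = params.get("capabilities", {})
    if not isinstance(caps, dict) or not all(isinstance(v, list) for v in caps.values()):
        problems.append("capabilities must be an object of arrays")
    return problems
```

Running this on the serialized frame just before sending it makes the 4002 causes visible on the client, where they are easier to debug than a bare close code.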

Tools/list returned 0: server registered but capabilities is empty

Discovery succeeded at the transport level but the server returned no tools.

  • Try POST /remote-servers/{id}/refresh to re-run discovery.
  • Check the upstream server actually advertises tools at the URL you registered. Sometimes the MCP path is /mcp/v1 rather than /mcp.
  • Some servers gate tool discovery on the credential's scope — wrong scope means an empty list rather than a 401.

Status flipped to error after working fine

Three consecutive health checks failed.

  • Check last_health_status on the detail view — the error class name says what went wrong (e.g. RemoteClientError, TimeoutError).
  • Test the endpoint from outside ScaiGrid with the same credentials.
  • Fix the upstream and call POST /remote-servers/{id}/refresh. A successful refresh resets consecutive_failures to 0 and status to active.
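
The status transitions described above can be modelled as a small state machine. This sketch is illustrative only and is not ScaiGrid's implementation; the field names `status`, `consecutive_failures`, and `last_health_status` come from the text, while the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

FAILURE_THRESHOLD = 3  # "three consecutive health checks", per the text above


@dataclass
class ServerHealth:
    status: str = "active"
    consecutive_failures: int = 0
    last_health_status: Optional[str] = None

    def record_failure(self, error: Exception) -> None:
        """Count a failed health check and store the error class name."""
        self.consecutive_failures += 1
        self.last_health_status = type(error).__name__
        if self.consecutive_failures >= FAILURE_THRESHOLD:
            self.status = "error"

    def record_successful_refresh(self) -> None:
        """A successful refresh resets the counter and reactivates the server."""
        self.consecutive_failures = 0
        self.status = "active"


health = ServerHealth()
for _ in range(3):
    health.record_failure(TimeoutError("upstream did not answer"))
# After the third consecutive failure the server is marked as errored.
assert health.status == "error"
```

The point of the model is that one or two transient failures never change the status, so a server that flips to error has been failing for at least three checks in a row.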

Cloud-registry requests return 503. The deployment doesn't have an encryption KEK (key-encryption key) configured.

  • Set encryption_local_kek in the platform settings (production wires this through ScaiVault).
  • Until the KEK is set, the cloud-registry feature is intentionally off — credentials would have nowhere safe to live.

Tenant hit the registration cap

The request fails with SCAILINK_REMOTE_LIMIT_EXCEEDED and HTTP 429. The tenant already has 100 registered servers.

  • Delete unused servers from the dashboard. Deletion cascades through the server's credentials and capabilities.
  • If the cap is genuinely too low, the limit is MAX_SERVERS_PER_TENANT in code and can be bumped per deployment.

Repeat tool calls are still slow

The session pool should make repeat invocations skip the handshake. If they don't:

  • The 5-minute idle TTL has elapsed between calls — the warm session was closed.
  • The previous call returned an RPC error; ScaiLink closes the cached session on any error so the next call gets a fresh handshake.
  • You're hitting different workers across calls. Each worker keeps its own pool, so with uvicorn --workers N the chance of landing on the worker that holds the warm session is roughly 1/N. Multi-worker coordination is parked for v1.2.
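
To make the three causes concrete, here is a minimal sketch of a per-process session pool with an idle TTL and eviction on error. It is not ScaiLink's code; the class, its methods, and the `connect` callable are hypothetical, and only the 5-minute TTL and the evict-on-error behaviour are taken from the text above.

```python
import time
from typing import Any, Callable, Dict, Tuple

IDLE_TTL_SECONDS = 300  # the 5-minute idle TTL described above


class SessionPool:
    """One pool per worker process; nothing here is shared across workers."""

    def __init__(self, connect: Callable[[str], Any],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect  # performs the (slow) handshake
        self._clock = clock
        self._sessions: Dict[str, Tuple[Any, float]] = {}  # id -> (session, last_used)
        self.handshakes = 0

    def acquire(self, server_id: str) -> Any:
        """Return a warm session if one is fresh, otherwise handshake again."""
        now = self._clock()
        entry = self._sessions.get(server_id)
        if entry is not None and now - entry[1] < IDLE_TTL_SECONDS:
            session = entry[0]
        else:
            session = self._connect(server_id)
            self.handshakes += 1
        self._sessions[server_id] = (session, now)
        return session

    def evict(self, server_id: str) -> None:
        """Drop the cached session, e.g. after an RPC error."""
        self._sessions.pop(server_id, None)
```

Because each worker would own a separate instance of such a pool, a request routed to a different worker always pays for a fresh handshake, which is why slow repeat calls are more visible as the worker count grows.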

Credential PUT returns 200 but outbound calls still get 401

The rotation went through but the upstream still rejects.

  • Confirm you rotated the right field. PUT /remote-servers/{id}/credentials/authorization rotates the authorization field; you may need x-api-key or another.
  • Force a refresh after the rotation so the cached session in the pool gets evicted.
  • Confirm the new token has the same scopes the registered tools need.

Audit log doesn't have the call I just made

  • Detail level on the session_init was set to off — only the skeleton is recorded. Reconnect with audit_detail_level: "metadata" to get target names and arguments.
  • The call failed at the routing layer (wrong tool name, no connected device) — these surface in logs but not always in the per-user audit endpoint. Use the tenant-wide GET /audit.

A user with a custom role can't see registered servers

Most likely the role is missing scailink:remote.use.

  • The list endpoint requires scailink:remote.use even for personal servers the user themselves registered.
  • Confirm the user's effective permissions via GET /iam/users/{id}/permissions.
  • The catch-all that auto-grants admin roles does not apply to tenant_user / tenant_viewer.

"Tool not found" when invoking a registered server's tool

  • The server might be in status='error' — its tools are hidden from the aggregated catalog. Refresh it.
  • The namespaced name is wrong. Personal servers are remote.{user_id}.{slug}.{tool_name}, tenant-shared are remote.tenant.{slug}.{tool_name}. The slug includes a 6-char hash; copy from the detail view rather than constructing it.
  • Capability rows older than 24 hours get evicted on refresh. If a tool used to exist but no longer does, the upstream removed it.
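
When a name is rejected, splitting it into its parts shows quickly whether the scope, slug, or tool segment is the one that's off. The parser below is a sketch based only on the two patterns given above; it assumes the first three segments never contain dots, and `parse_remote_tool_name` is a hypothetical helper, not a ScaiLink API.

```python
from typing import NamedTuple, Optional


class RemoteToolName(NamedTuple):
    scope: str               # "tenant" or "personal"
    user_id: Optional[str]   # None for tenant-shared servers
    slug: str
    tool_name: str


def parse_remote_tool_name(name: str) -> RemoteToolName:
    """Split remote.{user_id|tenant}.{slug}.{tool_name} into its parts."""
    # maxsplit=3 keeps any dots inside tool_name intact (assumption: the
    # first three segments never contain dots).
    parts = name.split(".", 3)
    if len(parts) != 4 or parts[0] != "remote" or not all(parts):
        raise ValueError(f"not a namespaced remote tool name: {name!r}")
    _, owner, slug, tool_name = parts
    if owner == "tenant":
        return RemoteToolName("tenant", None, slug, tool_name)
    return RemoteToolName("personal", owner, slug, tool_name)
```

Compare the parsed slug against the one shown in the detail view; a mismatch in the hash suffix is easy to miss by eye.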

Conversations never see a desktop client's tools

  • Confirm the desktop session is actually active via GET /sessions.
  • Confirm the capabilities are registered — GET /capabilities.
  • The agent surface (ScaiCore, ScaiBot, an external MCP consumer) needs to be wired through ScaiMCP, which is where ScaiLink's catalogs surface to agents.
Updated 2026-05-18 15:01:29, rev 12