Troubleshooting

A short list of things that go wrong with the ScaiGrid wrapper and how to fix them. For language-level problems (compiler errors, IR validation, DSL semantics), see /docs/scaicore — those are not in this module.

parse-bundle rejects my file

  • "File is neither a valid .scaicore-ir bundle nor JSON" — binary bundles must start with the SCIR magic bytes; JSON dumps must be valid JSON at the top level. If you exported the bundle via the standalone runtime, double-check you copied the binary file (not its hex dump).
  • "Bundle exceeds 10 MB limit" — the wrapper caps at 10 MB. Trim embedded assets (avatar, vendored files) and re-export.
  • "Failed to parse bundle: ..." — IR-level deserialization failure. Re-export from a current compiler; the wrapper uses ScaiCore's IRSerializer.deserialize.
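
The first two rejections can be caught locally before uploading. The following is a minimal pre-flight sketch, not the wrapper's own validation. It assumes that "10 MB" means 10 * 1024 * 1024 bytes and that the SCIR magic is the four ASCII bytes `SCIR` at offset 0; `preflight_bundle` is a hypothetical helper name.

```python
import json

MAX_BUNDLE_BYTES = 10 * 1024 * 1024  # assumption: the 10 MB cap is binary megabytes
SCIR_MAGIC = b"SCIR"                 # assumption: ASCII magic at the start of the file


def preflight_bundle(data: bytes) -> str:
    """Return 'binary' or 'json' if the bundle looks acceptable, else raise ValueError."""
    if len(data) > MAX_BUNDLE_BYTES:
        raise ValueError("bundle exceeds 10 MB limit")
    if data.startswith(SCIR_MAGIC):
        return "binary"
    try:
        json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # A hex dump of a binary bundle typically ends up here.
        raise ValueError("neither a SCIR bundle nor valid JSON") from None
    return "json"
```

A check like this only mirrors the size and format messages; an IR-level deserialization failure can still occur on the server.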

Core won't start: INVALID_SOURCE_IR

The Core's persisted IR didn't pass validate_source on start. Usually one of:

  • The IR was edited by hand into an inconsistent shape.
  • A migration changed the IR schema and your bundle is from an old compiler.

Re-export the bundle, parse it via /parse-bundle, then PUT /cores/{id} with the new source.

Core stuck in error state

POST /cores/{id}/start transitions the Core to error if security validation, environment resolution, or IR validation throws. Note the request id from the error response, then grep the ScaiGrid logs for core.env_decrypt_failed, core.skills_resolve_error, or any explicitly raised exception.
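
If the logs have been exported to a file, the same search can be scripted. This sketch assumes plain-text log lines that contain the request id verbatim; the real log format may differ, and `triage_start_failure` is a hypothetical helper.

```python
# Event names are the ones quoted in this section; the line format is an assumption.
KNOWN_START_FAILURES = ("core.env_decrypt_failed", "core.skills_resolve_error")


def triage_start_failure(log_lines, request_id):
    """Split the lines for one request into known start failures and everything else."""
    known, other = [], []
    for line in log_lines:
        if request_id not in line:
            continue
        if any(event in line for event in KNOWN_START_FAILURES):
            known.append(line)
        else:
            # May hold the message of an explicitly raised exception.
            other.append(line)
    return known, other
```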

The legal transition out of error is back to starting: fix the root cause, then retry by calling /start again. You cannot delete a Core in the error state without resolving the row first.

Start succeeds but the Core never does anything

Check the runtime mode; each mode waits for a different trigger:

  • event_driven: the Core idles until you send it events via /cores/{id}/events.
  • interactive: the Core needs an incoming chat request through /v1/inference/chat, which requires the Core to be published.
  • model: same as interactive.
  • api: the Core waits for /cores/{id}/api/invoke/{path}, which currently returns NOT_IMPLEMENTED, so api Cores can't be triggered externally yet.
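
The mode-to-trigger relationship described above can be written as a small lookup table, which is handy in a diagnostic script. This is an illustrative sketch; `how_to_trigger` is a hypothetical helper and not part of the wrapper.

```python
# Mode names and paths come from this section; the table itself is illustrative.
TRIGGER_BY_MODE = {
    "event_driven": "send events via /cores/{id}/events",
    "interactive": "chat request through /v1/inference/chat (Core must be published)",
    "model": "chat request through /v1/inference/chat (Core must be published)",
    "api": None,  # /cores/{id}/api/invoke/{path} is NOT_IMPLEMENTED today
}


def how_to_trigger(mode: str) -> str:
    """Describe what an idle Core in the given runtime mode is waiting for."""
    if mode not in TRIGGER_BY_MODE:
        raise ValueError(f"unknown runtime mode: {mode}")
    trigger = TRIGGER_BY_MODE[mode]
    if trigger is None:
        return "cannot be triggered externally yet"
    return trigger
```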

Checkpoints never appear

  • The IR may not contain any checkpoint blocks. Run the bundle locally first; if no checkpoints fire there either, the program never pauses.
  • checkpoint_mode: routed drops checkpoints with assignee_type = unrouted. Switch to auto while you debug, or set an explicit assignee in the program.
  • The 5-minute expire_checkpoints cron may have auto-cancelled rows with stale expires_at. Check /checkpoints/all for expired / cancelled rows.
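
Once you have the rows from /checkpoints/all, a quick summary shows whether the cron has been cancelling them. The sketch assumes the response can be reduced to a list of dicts with a `status` field whose values include `expired` and `cancelled`; the actual response envelope is not specified here.

```python
from collections import Counter


def summarize_checkpoints(rows):
    """Count rows by status and collect the ones that were expired or cancelled."""
    by_status = Counter(row.get("status", "unknown") for row in rows)
    dropped = [row for row in rows if row.get("status") in ("expired", "cancelled")]
    return by_status, dropped
```

If `dropped` is non-empty, compare each row's expires_at against the time you expected someone to act on it.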

Checkpoint resolution silently doesn't resume the program

  • Confirm the row's status is now resolved and resolution is non-null.
  • The wrapper writes the row; the runtime is responsible for picking up the resolution and resuming. If the engine has been unregistered (stop then delete then re-create), the resume target is gone — the resolution is recorded but no execution resumes.
  • Re-delivery of a ScaiQueue completion finds the row already resolved and exits cleanly — this is by design, not a bug.
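
The re-delivery behaviour in the last bullet is the usual idempotent-handler pattern. The sketch below illustrates the pattern only and is not the wrapper's code; the row is modelled as a plain dict and `apply_resolution` is a hypothetical name.

```python
def apply_resolution(row: dict, resolution: dict) -> bool:
    """Record a resolution once; return False when the row was already resolved."""
    if row.get("status") == "resolved":
        # A re-delivered completion lands here and changes nothing.
        return False
    row["status"] = "resolved"
    row["resolution"] = resolution
    return True
```

Calling it twice with the same row returns True and then False, and the first resolution is left untouched.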

Notifier didn't fire on checkpoint create

  • The wrapper logs scaicore_checkpoint_notify_failed if the notifier threw. Check the request id.
  • If the assignee resolves to a non-email value (group / role with no email mapping), no email is sent. The row still exists, and the admin UI's checkpoint queue is the fallback discovery path.
  • A notifier is only invoked if one is configured at module init. In dev / test, the wrapper runs notifier-less and notification_sent stays false.

Avatar upload returns INVALID_FILE_TYPE

The wrapper validates two things: the declared content-type header (must be one of image/png, image/jpeg, image/gif, image/webp, image/svg+xml) and the magic bytes of the uploaded file (except SVG, which is text-based). A PNG renamed to .jpg with content-type: image/jpeg will pass content-type but fail magic-byte validation.
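
You can reproduce both checks locally before uploading. This is a sketch of the two-step idea using the standard file signatures for each format, not the wrapper's implementation; `sniff_image_type` and `validate_avatar` are hypothetical names.

```python
ALLOWED_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp", "image/svg+xml"}


def sniff_image_type(data: bytes):
    """Guess the image type from well-known leading bytes, or return None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_avatar(content_type: str, data: bytes) -> bool:
    """Check the declared content type, then the magic bytes (SVG is exempt)."""
    if content_type not in ALLOWED_TYPES:
        return False
    if content_type == "image/svg+xml":
        return True  # text-based, so there are no magic bytes to compare
    return sniff_image_type(data) == content_type
```

With this sketch, PNG bytes declared as image/jpeg fail the second check, which matches the renamed-file case described above.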

Avatar upload returns STORAGE_ERROR

S3 (Garage in production) returned a non-2xx response. The usual causes are a misconfigured S3_BUCKET / S3_ENDPOINT_URL environment variable in the ScaiGrid deployment or transient unavailability of the storage cluster. The avatar is not stored; retry once.

Publish errors with "Core is already published as a model"

Run /unpublish first if you want a fresh slug or different group membership. To update the existing published row (name, description, persona, avatar), the publishing service exposes sync() internally — there is no HTTP endpoint for this today; we trigger sync from edit paths.

Published model isn't visible to clients

Most likely cause: you didn't pass group_ids (or the caller wasn't permitted to add to the groups you specified, so they were silently dropped). The model exists in the catalogue but isn't in any group; group-restricted clients won't see it. Add it manually:

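```bash
# Add the published Core to a model group; send the body as JSON, since curl -d defaults to form encoding.
curl -X POST "$SCAIGRID_HOST/v1/models/groups/$GROUP_ID/members" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "slugs": ["scaicore/your-tenant/your-core"] }'
```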

Delegation expired mid-execution

DELEGATION_EXPIRED (403) is raised on the next start() (or anywhere identity is re-resolved). The Core stays in its current state — you need to either extend the delegation (PUT /cores/{id}/delegation with a new expires_at) or revoke and re-issue.
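
Extending the delegation is a single PUT. The sketch below only builds the request so you can inspect it before sending. The bearer-token header is borrowed from the curl example elsewhere on this page, and the ISO 8601 timestamp format and the absence of any path prefix before /cores are assumptions; `build_extend_delegation_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_extend_delegation_request(host, api_key, core_id, expires_at):
    """Build (but do not send) a PUT /cores/{id}/delegation request."""
    body = json.dumps({"expires_at": expires_at}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/cores/{core_id}/delegation",  # assumption: no extra prefix
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Pass the result to `urllib.request.urlopen` to send it, then call /start again.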

Endpoints that return 501 NOT_IMPLEMENTED

The wrapper deliberately ships several routes as stubs so frontends can pre-wire them. The ones that currently return 501:

  • GET /cores/{id}/logs/stream
  • POST /cores/{id}/debug/breakpoint
  • POST /cores/{id}/debug/step
  • POST /cores/{id}/debug/inspect
  • * /cores/{id}/api/invoke/{path} (any HTTP method)

And these return empty placeholders (200 OK with empty data):

  • GET /cores/{id}/logs returns { "logs": [] }
  • GET /cores/{id}/events/history returns { "events": [] }
  • GET /plugins/available returns { "items": [] }

If you're relying on any of these, check the changelog before deploying.
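
A client can guard against these routes with a small lookup. The route lists below are copied from this section and will go stale as stubs are implemented; treating `*` as "any method" and `{path}` as "one or more segments" is an interpretation, and `stub_status` is a hypothetical helper.

```python
import re

NOT_IMPLEMENTED = [
    ("GET", "/cores/{id}/logs/stream"),
    ("POST", "/cores/{id}/debug/breakpoint"),
    ("POST", "/cores/{id}/debug/step"),
    ("POST", "/cores/{id}/debug/inspect"),
    ("*", "/cores/{id}/api/invoke/{path}"),
]
EMPTY_PLACEHOLDER = [
    ("GET", "/cores/{id}/logs"),
    ("GET", "/cores/{id}/events/history"),
    ("GET", "/plugins/available"),
]


def _template_to_regex(template):
    """Turn a route template into a regex; {path} may span several segments."""
    parts = []
    for segment in template.strip("/").split("/"):
        if segment == "{path}":
            parts.append(".+")
        elif segment.startswith("{") and segment.endswith("}"):
            parts.append("[^/]+")
        else:
            parts.append(re.escape(segment))
    return re.compile("^/" + "/".join(parts) + "$")


def stub_status(method, path):
    """Return 'not_implemented', 'empty_placeholder', or None for other routes."""
    for label, routes in (("not_implemented", NOT_IMPLEMENTED),
                          ("empty_placeholder", EMPTY_PLACEHOLDER)):
        for route_method, template in routes:
            if route_method in ("*", method.upper()) and _template_to_regex(template).match(path):
                return label
    return None
```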

Updated 2026-05-18 15:01:29, rev 11