Troubleshooting
A short list of things that go wrong with the ScaiGrid wrapper and how to fix them. For language-level problems (compiler errors, IR validation, DSL semantics), see /docs/scaicore — those are not in this module.
parse-bundle rejects my file
- "File is neither a valid .scaicore-ir bundle nor JSON": binary bundles must start with the SCIR magic bytes, and JSON dumps must be valid JSON at the top level. If you exported the bundle via the standalone runtime, double-check that you copied the binary file (not its hex dump).
- "Bundle exceeds 10 MB limit": the wrapper caps bundles at 10 MB. Trim embedded assets (avatar, vendored files) and re-export.
- "Failed to parse bundle: ...": an IR-level deserialization failure. Re-export from a current compiler; the wrapper uses ScaiCore's IRSerializer.deserialize.
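To catch the first two errors before uploading, you can run a local pre-check that mirrors them. This is a minimal sketch that assumes two details not confirmed here: the magic is the four ASCII bytes SCIR at offset 0, and the 10 MB cap means 10 * 1024 * 1024 bytes. precheck_bundle is a hypothetical helper. It cannot catch the third error, because that one needs IRSerializer.deserialize.

```python
import json

# Assumptions (not confirmed by this page): the binary magic is b"SCIR" at
# offset 0, and "10 MB" means 10 * 1024 * 1024 bytes.
SCIR_MAGIC = b"SCIR"
MAX_BUNDLE_BYTES = 10 * 1024 * 1024


def precheck_bundle(data: bytes) -> str:
    """Return "binary" or "json", or raise ValueError with a message that
    mirrors the matching /parse-bundle error."""
    if len(data) > MAX_BUNDLE_BYTES:
        raise ValueError("Bundle exceeds 10 MB limit")
    if data.startswith(SCIR_MAGIC):
        return "binary"
    try:
        json.loads(data)  # accepts bytes; bad JSON or bad encoding raises ValueError
    except ValueError:
        raise ValueError("File is neither a valid .scaicore-ir bundle nor JSON") from None
    return "json"
```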
Core won't start: INVALID_SOURCE_IR
The Core's persisted IR didn't pass validate_source on start. Usually one of:
- The IR was edited by hand into an inconsistent shape.
- A migration changed the IR schema and your bundle is from an old compiler.
Re-export the bundle, parse it via /parse-bundle, then PUT /cores/{id} with the new source.
Core stuck in error state
POST /cores/{id}/start transitions the Core to error if security validation, environment resolution, or IR validation throws. Take the request id from the error response, then grep the ScaiGrid logs for core.env_decrypt_failed, core.skills_resolve_error, or any explicitly raised exception.
The legal transition out of error is back to starting (i.e. retry by calling /start again after fixing the root cause). You cannot delete an error Core without resolving the row first.
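If grepping isn't convenient, you can script the same search. This sketch assumes your ScaiGrid logs are shipped as JSON lines with request_id and event fields. Those field names and start_failures are hypothetical; the two event names come from this page. Explicitly raised exceptions usually appear as tracebacks rather than named events, so check those by hand.

```python
import json
from typing import Iterable, Iterator

# Event names from this page; the JSON-lines format with "request_id" and
# "event" keys is an assumption about how your logs are shipped.
START_FAILURE_EVENTS = {"core.env_decrypt_failed", "core.skills_resolve_error"}


def start_failures(lines: Iterable[str], request_id: str) -> Iterator[dict]:
    """Yield records for one request whose event is a known start failure."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as plain tracebacks
        if not isinstance(record, dict):
            continue
        if record.get("request_id") == request_id and record.get("event") in START_FAILURE_EVENTS:
            yield record
```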
Start succeeds but the Core never does anything
Check the runtime mode:
- event_driven: the Core idles until you send it events via /cores/{id}/events.
- interactive: the Core needs an incoming chat request through /v1/inference/chat, which requires the Core to be published.
- model: same as interactive.
- api: the Core waits for /cores/{id}/api/invoke/{path}, which currently returns NOT_IMPLEMENTED, so api Cores can't be triggered externally yet.
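Here are the mode-to-trigger rules above, written as a lookup you could reuse in a client or a health check. trigger_for is a hypothetical helper. The strings are path templates copied from this page, not complete requests.

```python
# Trigger paths copied from this page. The mode strings are spelled as on
# this page; whether the API reports them exactly this way is an assumption.
TRIGGERS = {
    "event_driven": "/cores/{id}/events",
    "interactive": "/v1/inference/chat",  # Core must be published
    "model": "/v1/inference/chat",        # same as interactive
}


def trigger_for(mode: str) -> str:
    """Return the path template that wakes a Core in the given runtime mode."""
    if mode == "api":
        # /cores/{id}/api/invoke/{path} currently returns 501 NOT_IMPLEMENTED.
        raise NotImplementedError("api Cores can't be triggered externally yet")
    try:
        return TRIGGERS[mode]
    except KeyError:
        raise ValueError(f"unknown runtime mode: {mode!r}") from None
```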
Checkpoints never appear
- The IR may not contain any checkpoint blocks. Run the bundle locally first; if no checkpoints fire there either, the program never pauses.
- checkpoint_mode: routed drops checkpoints with assignee_type = unrouted. Switch to auto while you debug, or set an explicit assignee in the program.
- The 5-minute expire_checkpoints cron may have auto-cancelled rows with a stale expires_at. Check /checkpoints/all for expired or cancelled rows.
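When /checkpoints/all returns many rows, a small triage pass can separate rows the cron already removed from rows it is about to catch. This is a sketch under stated assumptions: rows are dicts with a status string and an ISO-8601 expires_at (or None), and naive timestamps are UTC. The response shape is not documented here, and triage_checkpoints is hypothetical.

```python
from datetime import datetime, timezone

# Assumption: rows are dicts with "status" and "expires_at" (ISO-8601 or None).
# Field names follow this page; the exact response shape is not documented.
GONE_STATUSES = {"expired", "cancelled"}


def _parse_ts(value: str) -> datetime:
    ts = datetime.fromisoformat(value)
    # Treat naive timestamps as UTC (an assumption) so comparisons don't raise.
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def triage_checkpoints(rows, now=None):
    """Return (gone, stale): expired/cancelled rows, and unresolved rows past expires_at."""
    now = now or datetime.now(timezone.utc)
    gone, stale = [], []
    for row in rows:
        status = row.get("status")
        if status in GONE_STATUSES:
            gone.append(row)
        elif status != "resolved" and row.get("expires_at"):
            if _parse_ts(row["expires_at"]) <= now:
                stale.append(row)  # candidate for the next expire_checkpoints run
    return gone, stale
```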
Checkpoint resolution silently doesn't resume the program
- Confirm that the row's status is now resolved and that resolution is non-null.
- The wrapper writes the row; the runtime is responsible for picking up the resolution and resuming. If the engine has been unregistered (stop, then delete, then re-create), the resume target is gone: the resolution is recorded but no execution resumes.
- Re-delivery of a ScaiQueue completion finds the row already resolved and exits cleanly. This is by design, not a bug.
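The re-delivery behaviour is an idempotency check on the row. Below is a minimal sketch of that rule. The CheckpointRow type and both helpers are hypothetical, and only the status and resolution fields come from this page. It illustrates the behaviour described above, not the wrapper's actual code.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical row model; only "status" and "resolution" come from this page.
@dataclass
class CheckpointRow:
    status: str
    resolution: Optional[Any] = None


def apply_resolution(row: CheckpointRow, resolution: Any) -> bool:
    """Record a resolution once. Returns True only if this call changed the row.

    A re-delivered completion sees the row already resolved and returns False
    without touching it (the "exits cleanly" case).
    """
    if row.status == "resolved":
        return False
    row.status = "resolved"
    row.resolution = resolution
    return True


def looks_resolved(row: CheckpointRow) -> bool:
    # The first thing to confirm when the program does not resume.
    return row.status == "resolved" and row.resolution is not None
```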
Notifier didn't fire on checkpoint create
- The wrapper logs scaicore_checkpoint_notify_failed if the notifier threw. Check the request id.
- If the assignee resolves to a non-email value (a group or role with no email mapping), no email is sent. The row still exists, and the admin UI's checkpoint queue is the fallback discovery path.
- A notifier is only invoked if one is configured at module init. In dev / test, the wrapper runs without a notifier and notification_sent stays false.
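The three cases above reduce to a small decision. This sketch is illustrative only. notify_checkpoint, its arguments, the logger name, and the "@" test for whether an assignee is an email address are all hypothetical. Only the log event name and the notification_sent outcome come from this page.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("scaicore.checkpoints")  # hypothetical logger name


def notify_checkpoint(assignee: str, request_id: str,
                      notifier: Optional[Callable[[str], None]]) -> bool:
    """Return the value notification_sent would end up with (illustrative)."""
    if notifier is None:
        return False  # dev / test: no notifier configured at module init
    if "@" not in assignee:
        return False  # group / role with no email mapping: no email, row still exists
    try:
        notifier(assignee)
    except Exception:
        # Event name from this page; the message format is an assumption.
        log.exception("scaicore_checkpoint_notify_failed request_id=%s", request_id)
        return False
    return True
```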
Avatar upload returns INVALID_FILE_TYPE
The wrapper validates two things: the declared content-type header (must be one of image/png, image/jpeg, image/gif, image/webp, image/svg+xml) and the magic bytes of the uploaded file (except SVG, which is text-based). A PNG renamed to .jpg and sent with content-type: image/jpeg will pass the content-type check but fail magic-byte validation.
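To catch a mismatched file before uploading, you can run the same two checks client-side. This sketch assumes the wrapper compares against the standard file signatures: the PNG header, JPEG's FF D8 FF, GIF87a/GIF89a, and a RIFF container tagged WEBP. check_avatar is a hypothetical helper.

```python
# Standard signatures for the raster types listed above. That the wrapper uses
# exactly these is an assumption. SVG is skipped because it is text-based.
SIGNATURES = {
    "image/png": lambda b: b.startswith(b"\x89PNG\r\n\x1a\n"),
    "image/jpeg": lambda b: b.startswith(b"\xff\xd8\xff"),
    "image/gif": lambda b: b[:6] in (b"GIF87a", b"GIF89a"),
    "image/webp": lambda b: b[:4] == b"RIFF" and b[8:12] == b"WEBP",
}
ALLOWED = set(SIGNATURES) | {"image/svg+xml"}


def check_avatar(content_type: str, data: bytes) -> bool:
    """Return True if the declared type is allowed and the leading bytes agree."""
    if content_type not in ALLOWED:
        return False
    if content_type == "image/svg+xml":
        return True  # no magic-byte check for SVG
    return SIGNATURES[content_type](data)
```

Running a renamed PNG through check_avatar("image/jpeg", data) returns False, which matches the failure described above.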
Avatar upload returns STORAGE_ERROR
S3 (Garage in production) returned a non-2xx response. The most common causes are a misconfigured S3_BUCKET / S3_ENDPOINT_URL env var in the ScaiGrid deployment, or transient unavailability of the storage cluster. The avatar is not stored; retry once.
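Here is a minimal retry-once wrapper for avatar uploads. The StorageError class stands in for however your HTTP client surfaces a STORAGE_ERROR response. That class and the one-second delay are both hypothetical. If the retry also fails, suspect the S3_BUCKET / S3_ENDPOINT_URL configuration rather than retrying further.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class StorageError(Exception):
    """Hypothetical: raise this from your client on a STORAGE_ERROR response."""


def upload_with_one_retry(upload: Callable[[], T], delay_s: float = 1.0) -> T:
    try:
        return upload()
    except StorageError:
        time.sleep(delay_s)  # give a transient storage outage a moment
        return upload()      # a second failure propagates: likely misconfiguration
```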
Publish errors with "Core is already published as a model"
Run /unpublish first if you want a fresh slug or different group membership. To update the existing published row (name, description, persona, avatar), the publishing service exposes sync() internally, but there is no HTTP endpoint for it today; sync is triggered from edit paths.
Published model isn't visible to clients
Most likely cause: you didn't pass group_ids, or the caller wasn't permitted to add the model to the groups you specified, so those groups were silently dropped. The model exists in the catalogue but isn't in any group, so group-restricted clients won't see it. Add it to the relevant groups manually.
Delegation expired mid-execution
DELEGATION_EXPIRED (403) is raised on the next start() (or anywhere identity is re-resolved). The Core stays in its current state. To recover, either extend the delegation (PUT /cores/{id}/delegation with a new expires_at) or revoke it and re-issue it.
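A pre-flight check can keep you from hitting DELEGATION_EXPIRED on start(). This sketch assumes expires_at is a timezone-aware ISO-8601 timestamp and that PUT /cores/{id}/delegation accepts a JSON body of the form {"expires_at": ...}. The body shape, the helper names, and the 30-day default are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def delegation_expired(expires_at: str, now: datetime) -> bool:
    """True if the delegation has already lapsed (expects timezone-aware values)."""
    return datetime.fromisoformat(expires_at) <= now


def extension_body(now: datetime, days: int = 30) -> dict:
    """Hypothetical body for PUT /cores/{id}/delegation with a new expires_at."""
    return {"expires_at": (now + timedelta(days=days)).isoformat()}
```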
Endpoints that return 501 NOT_IMPLEMENTED
The wrapper deliberately ships several routes as stubs so frontends can pre-wire them. The ones that currently return 501:
- GET /cores/{id}/logs/stream
- POST /cores/{id}/debug/breakpoint
- POST /cores/{id}/debug/step
- POST /cores/{id}/debug/inspect
- /cores/{id}/api/invoke/{path} (any method)
And these return empty placeholders (200 OK with empty data):
- GET /cores/{id}/logs: { "logs": [] }
- GET /cores/{id}/events/history: { "events": [] }
- GET /plugins/available: { "items": [] }
If you're relying on any of these, check the changelog before deploying.
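If a frontend pre-wires these routes, a small guard can keep it from treating a 501 or an empty placeholder as real data. The route lists are copied from this page and will go stale as endpoints ship. endpoint_status, its return values, and the assumption that {path} can span several segments are all hypothetical.

```python
import re

# Routes copied from this page; refresh from the changelog as endpoints ship.
# "*" means any HTTP method.
NOT_IMPLEMENTED = [
    ("GET", "/cores/{id}/logs/stream"),
    ("POST", "/cores/{id}/debug/breakpoint"),
    ("POST", "/cores/{id}/debug/step"),
    ("POST", "/cores/{id}/debug/inspect"),
    ("*", "/cores/{id}/api/invoke/{path}"),
]
PLACEHOLDERS = [
    ("GET", "/cores/{id}/logs"),
    ("GET", "/cores/{id}/events/history"),
    ("GET", "/plugins/available"),
]


def _compile(template: str):
    """Path template to regex: {path} spans segments (assumed), others match one."""
    out = []
    for part in re.split(r"(\{\w+\})", template):
        if part == "{path}":
            out.append(".+")
        elif part.startswith("{") and part.endswith("}"):
            out.append("[^/]+")
        else:
            out.append(re.escape(part))
    return re.compile("".join(out))


def endpoint_status(method: str, path: str) -> str:
    """Return "not_implemented", "placeholder", or "unlisted" for a request."""
    for m, template in NOT_IMPLEMENTED:
        if m in ("*", method) and _compile(template).fullmatch(path):
            return "not_implemented"
    for m, template in PLACEHOLDERS:
        if m == method and _compile(template).fullmatch(path):
            return "placeholder"
    return "unlisted"
```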