---
summary: Common symptoms in the ScaiCore wrapper and what they usually mean.
title: Troubleshooting
path: modules/scaicore/troubleshooting
status: published
---

A short list of things that go wrong with the ScaiGrid wrapper and how to fix them. For *language*-level problems (compiler errors, IR validation, DSL semantics), see [/docs/scaicore](https://www.scailabs.ai/docs/scaicore) — those are not in this module.

## `parse-bundle` rejects my file

- **"File is neither a valid .scaicore-ir bundle nor JSON"** — binary bundles must start with the `SCIR` magic bytes; JSON dumps must be valid JSON at the top level. If you exported the bundle via the standalone runtime, double-check you copied the binary file (not its hex dump).
- **"Bundle exceeds 10 MB limit"** — the wrapper caps at 10 MB. Trim embedded assets (avatar, vendored files) and re-export.
- **"Failed to parse bundle: ..."** — IR-level deserialization failure. Re-export from a current compiler; the wrapper uses ScaiCore's `IRSerializer.deserialize`.
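The first two rejections can often be caught locally before uploading. The following is a minimal sketch, assuming a binary bundle on disk. `check_bundle` is a hypothetical helper, not part of the wrapper, and reading the 10 MB cap as 10 × 1024 × 1024 bytes is an assumption (the wrapper may count decimal megabytes). It does not cover JSON dumps or IR-level deserialization failures.

```bash
# check_bundle FILE
# Hypothetical pre-flight check for a binary bundle before calling /parse-bundle.
check_bundle() {
  local file="$1"
  local max_bytes=$((10 * 1024 * 1024))  # assumption: "10 MB" means 10 MiB
  local magic size

  [ -f "$file" ] || { echo "not a file: $file"; return 2; }

  # Binary bundles must start with the 4-byte magic "SCIR".
  # tr strips NUL bytes so the shell does not warn about them.
  magic="$(head -c 4 "$file" | tr -d '\0')"
  if [ "$magic" != "SCIR" ]; then
    echo "bad magic: expected SCIR"
    return 1
  fi

  # The wrapper rejects bundles above its size cap.
  size="$(wc -c < "$file" | tr -d ' ')"
  if [ "$size" -gt "$max_bytes" ]; then
    echo "too large: $size bytes"
    return 1
  fi

  echo "ok"
}
```

A hex dump of a bundle starts with the characters `5343...` rather than `SCIR`, so it fails the magic check, which is the copy mistake described in the first bullet.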

## Core won't start — `INVALID_SOURCE_IR`

The Core's persisted IR didn't pass `validate_source` on start. Usually one of:

- The IR was edited by hand into an inconsistent shape.
- A migration changed the IR schema and your bundle is from an old compiler.

Re-export the bundle, parse it via `/parse-bundle`, then `PUT /cores/{id}` with the new `source`.

## Core stuck in `error` state

`POST /cores/{id}/start` transitions the Core to `error` if security validation, environment resolution, or IR validation throws. Take the request id from the error response, then search the ScaiGrid logs for that id alongside `core.env_decrypt_failed`, `core.skills_resolve_error`, or any explicitly raised exception.
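The log search can be sketched as a small filter. This assumes the ScaiGrid logs are available as a plain-text file with one entry per line and the request id present on each line. The log location and format depend on your deployment, and `find_start_failures` is a hypothetical helper name.

```bash
# find_start_failures LOGFILE REQUEST_ID
# Prints the log lines for one request that mention a known start-failure event.
# Returns non-zero if nothing matches.
find_start_failures() {
  local logfile="$1" request_id="$2"
  # -F treats the request id as a literal string, not a regex.
  grep -F -- "$request_id" "$logfile" \
    | grep -E 'core\.env_decrypt_failed|core\.skills_resolve_error'
}
```

If this prints nothing, drop the second `grep` and read every line for the request id, since the failure may be a raised exception without either event name.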

The legal transition out of `error` is back to `starting` (i.e. retry by calling `/start` again after fixing the root cause). You cannot delete an `error` Core without resolving the row first.

## Start succeeds but the Core never does anything

Check the runtime mode. Each mode waits for a different trigger:

- `event_driven`: the Core idles until you send it events via `/cores/{id}/events`.
- `interactive`: the Core needs an incoming chat request through `/v1/inference/chat`, which requires the Core to be published.
- `model`: same trigger as `interactive`.
- `api`: the Core waits for `/cores/{id}/api/invoke/{path}`, which currently returns `NOT_IMPLEMENTED`, so `api` Cores can't be triggered externally yet.

## Checkpoints never appear

- The IR may not contain any checkpoint blocks. Run the bundle locally first; if no checkpoints fire there either, the program never pauses.
- `checkpoint_mode: routed` drops checkpoints with `assignee_type = unrouted`. Switch to `auto` while you debug, or set an explicit assignee in the program.
- The 5-minute `expire_checkpoints` cron may have auto-cancelled rows with stale `expires_at`. Check `/checkpoints/all` for `expired` / `cancelled` rows.

## Checkpoint resolution silently doesn't resume the program

- Confirm the row's status is now `resolved` and `resolution` is non-null.
- The wrapper writes the row; the *runtime* is responsible for picking up the resolution and resuming. If the engine has been unregistered (`stop` then `delete` then re-create), the resume target is gone — the resolution is recorded but no execution resumes.
- Re-delivery of a ScaiQueue completion finds the row already resolved and exits cleanly — this is by design, not a bug.

## Notifier didn't fire on checkpoint create

- The wrapper logs `scaicore_checkpoint_notify_failed` if the notifier threw. Check the request id.
- If the assignee resolves to a non-email value (group / role with no email mapping), no email is sent. The row still exists, and the admin UI's checkpoint queue is the fallback discovery path.
- A notifier is only invoked if one is configured at module init. In dev / test, the wrapper runs notifier-less and `notification_sent` stays `false`.

## Avatar upload returns `INVALID_FILE_TYPE`

The wrapper validates two things: the declared `content-type` header (must be one of `image/png`, `image/jpeg`, `image/gif`, `image/webp`, `image/svg+xml`) **and** the magic bytes of the uploaded file (except SVG, which is text-based). A PNG renamed to `.jpg` with `content-type: image/jpeg` will pass content-type but fail magic-byte validation.
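To see which type a file's bytes actually imply before uploading, a local check along these lines may help. It is a sketch that uses the standard published signatures for PNG, JPEG, GIF, and WebP. The wrapper's exact signature table is not documented here, so treat agreement with this check as likely rather than guaranteed. `sniff_image_type` and `check_avatar` are hypothetical helper names.

```bash
# sniff_image_type FILE
# Prints the MIME type implied by the file's leading bytes, or returns 1.
sniff_image_type() {
  local hex
  # First 12 bytes as one lowercase hex string, e.g. 89504e47...
  hex="$(od -An -tx1 -N12 "$1" | tr -d '[:space:]')"
  case "$hex" in
    89504e470d0a1a0a*)            echo "image/png" ;;
    ffd8ff*)                      echo "image/jpeg" ;;
    474946383761*|474946383961*)  echo "image/gif" ;;   # GIF87a / GIF89a
    52494646????????57454250)     echo "image/webp" ;;  # RIFF <size> WEBP
    *)                            return 1 ;;
  esac
}

# check_avatar FILE DECLARED_TYPE
# Compares the declared content-type with the sniffed one.
check_avatar() {
  local file="$1" declared="$2" actual
  # SVG is text-based, so there are no magic bytes to compare.
  if [ "$declared" = "image/svg+xml" ]; then echo "ok"; return 0; fi
  actual="$(sniff_image_type "$file")" || { echo "unrecognised image data"; return 1; }
  if [ "$actual" != "$declared" ]; then
    echo "mismatch: declared $declared, file is $actual"
    return 1
  fi
  echo "ok"
}
```

For the renamed-PNG case above, `check_avatar avatar.jpg image/jpeg` reports a mismatch, and uploading the same file with `content-type: image/png` is the fix.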

## Avatar upload returns `STORAGE_ERROR`

S3 (Garage in production) returned a non-2xx response. The most common causes are a misconfigured `S3_BUCKET` / `S3_ENDPOINT_URL` env in the ScaiGrid deployment and transient unavailability of the storage cluster. The avatar is not stored; retry once.

## Publish errors with "Core is already published as a model"

Run `/unpublish` first if you want a fresh slug or different group membership. To update the existing published row (name, description, persona, avatar), the publishing service exposes `sync()` internally. There is no HTTP endpoint for this today; sync is triggered from the edit paths.

## Published model isn't visible to clients

Most likely cause: you didn't pass `group_ids` (or the caller wasn't permitted to add to the groups you specified, so they were silently dropped). The model exists in the catalogue but isn't in any group; group-restricted clients won't see it. Add it manually:

```bash
curl -X POST "$SCAIGRID_HOST/v1/models/groups/$GROUP_ID/members" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "slugs": ["scaicore/your-tenant/your-core"] }'
```

## Delegation expired mid-execution

`DELEGATION_EXPIRED` (403) is raised on the next `start()` (or anywhere identity is re-resolved). The Core stays in its current state — you need to either extend the delegation (`PUT /cores/{id}/delegation` with a new `expires_at`) or revoke and re-issue.
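Extending the delegation might look like the sketch below. Several details are assumptions rather than documented behaviour: `SCAICORE_BASE_URL` stands in for whatever prefix the module's `/cores` routes are mounted under, the body is assumed to be JSON with a single `expires_at` field, and the timestamp is assumed to be ISO 8601. `extend_delegation` is a hypothetical helper name.

```bash
# extend_delegation CORE_ID EXPIRES_AT
# Assumptions: SCAICORE_BASE_URL is the route prefix for the module's /cores
# endpoints, and the endpoint accepts a JSON body with an expires_at string.
extend_delegation() {
  local core_id="$1" expires_at="$2"
  curl -sS -X PUT "$SCAICORE_BASE_URL/cores/$core_id/delegation" \
    -H "Authorization: Bearer $SCAIGRID_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{ \"expires_at\": \"$expires_at\" }"
}
```

A call would then be `extend_delegation "$CORE_ID" "2030-01-01T00:00:00Z"`, followed by another `/start` if the expiry was hit on start.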

## Endpoints that return `501 NOT_IMPLEMENTED`

The wrapper deliberately ships several routes as stubs so frontends can pre-wire them. The ones that currently return `501`:

- `GET /cores/{id}/logs/stream`
- `POST /cores/{id}/debug/breakpoint`
- `POST /cores/{id}/debug/step`
- `POST /cores/{id}/debug/inspect`
- `* /cores/{id}/api/invoke/{path}`

And these return empty placeholders (200 OK with empty data):

- `GET /cores/{id}/logs` — `{ "logs": [] }`
- `GET /cores/{id}/events/history` — `{ "events": [] }`
- `GET /plugins/available` — `{ "items": [] }`

If you're relying on any of these, check the changelog before deploying.
