---
summary: Common symptoms and what they usually mean.
title: Troubleshooting
path: troubleshooting
status: published
---

A short list of things that go wrong with ScaiQueue and how to fix them. If none of these match, note the request id in the response envelope and search the audit log at `GET /scopes/{scope_id}/audit?correlation_id=...`.
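
For repeated lookups it can help to build the audit URL in code. A minimal sketch, assuming the request id from the response envelope is the value to pass as `correlation_id`; `audit_url` and `base_url` are hypothetical names, not part of ScaiQueue:

```python
from urllib.parse import quote, urlencode


def audit_url(base_url: str, scope_id: str, correlation_id: str) -> str:
    """Build the audit-log lookup URL for one correlation id.

    base_url is a placeholder for your ScaiGrid API root (assumption).
    """
    path = f"/scopes/{quote(scope_id, safe='')}/audit"
    query = urlencode({"correlation_id": correlation_id})
    return f"{base_url.rstrip('/')}{path}?{query}"
```

Escaping the id through `urlencode` keeps ids with spaces or `&` from breaking the query string.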

## Publish returns `SCAIQUEUE_SCOPE_PAUSED` or `SCAIQUEUE_QUEUE_PAUSED`

Someone paused the scope or queue (or it's `archived` / `draining`). Check state via `GET /scopes/{scope_id}` and `GET /scopes/{scope_id}/queues/{queue_id}`. Resume with `POST /scopes/{scope_id}/resume` or `POST /scopes/{scope_id}/queues/{queue_id}/resume`. Archived scopes must be unarchived first.
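
The check-then-resume steps above can be sketched as a small planner that turns the two `state` values into the resume calls to make. It assumes the state strings are exactly `paused` and `archived`; `resume_calls` is a hypothetical helper, and the unarchive endpoint is not covered on this page, so the sketch stops there:

```python
from typing import List, Tuple


def resume_calls(scope_id: str, queue_id: str,
                 scope_state: str, queue_state: str) -> List[Tuple[str, str]]:
    """Return the (method, path) calls needed to unblock publishing."""
    if scope_state == "archived":
        # Archived scopes must be unarchived first; that call is not documented here.
        raise ValueError("scope is archived: unarchive it before resuming")
    calls: List[Tuple[str, str]] = []
    if scope_state == "paused":
        calls.append(("POST", f"/scopes/{scope_id}/resume"))
    if queue_state == "paused":
        calls.append(("POST", f"/scopes/{scope_id}/queues/{queue_id}/resume"))
    # Other states (for example `draining`) are left to the operator.
    return calls
```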

## Publish returns `SCAIQUEUE_QUEUE_FULL`

The queue has a `max_depth` set and has reached it. Options: drain the queue (consume more aggressively), raise `max_depth`, or change `overflow_policy` away from `reject`.

## Publish returns `SCAIQUEUE_IDEMPOTENCY_CONFLICT`

You sent the same `idempotency_key` twice with a different body. Either resend the original body with that key (which returns the original message id) or use a fresh key for the new body.
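
One way to avoid this conflict is to derive the key from the body, so the same key can never accompany a different body. This is a sketch of that technique, not something ScaiQueue requires; `idempotency_key` and the `pub-` prefix are hypothetical:

```python
import hashlib
import json


def idempotency_key(queue_id: str, payload: dict) -> str:
    """Derive a stable key from the queue id and a canonical JSON form of the body."""
    # sort_keys and fixed separators make the encoding independent of dict order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{queue_id}\n{canonical}".encode("utf-8")).hexdigest()
    return f"pub-{digest[:32]}"
```

The trade-off: two intentionally identical messages get the same key and are treated as one publish. If you need both delivered, mix a business identifier (an order id, for example) into the payload before hashing.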

## Claim returns an empty list when you know there's work

A few causes, in priority order:

- **Queue is paused.** `GET /scopes/{scope_id}/queues/{queue_id}` and check `state`.
- **Scope is paused or archived.** Checking only the queue isn't enough.
- **Message is not yet visible.** `delay_until` or `visible_at` is set in the future.
- **Redis is unhealthy.** Claim falls back to the DB path, which is slower; check ScaiGrid logs for Redis errors.
- **Another consumer claimed it first.** Claims are atomic; in a competing-consumer queue, only one wins. Increase `batch_size` if you have many consumers and small messages.
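
The first three checks can be sketched as a function over values you have already fetched. It assumes the state strings are `paused` and `archived` and that `visible_at` has been parsed into a timezone-aware `datetime`; `first_blocker` is a hypothetical helper. Redis health and competing consumers are not modeled:

```python
from datetime import datetime, timezone
from typing import Optional


def first_blocker(queue_state: str, scope_state: str,
                  visible_at: Optional[datetime],
                  now: Optional[datetime] = None) -> Optional[str]:
    """Return the first reason a claim would come back empty, or None."""
    now = now or datetime.now(timezone.utc)
    if queue_state == "paused":
        return "queue is paused"
    if scope_state in ("paused", "archived"):
        return f"scope is {scope_state}"
    if visible_at is not None and visible_at > now:
        return "message is not yet visible"
    return None
```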

## Messages keep being reclaimed by other consumers

Your consumer isn't calling `complete` or `fail` (or `extend`) before `visibility_timeout_s` expires. The `visibility_timeout_enforcer` runs every second and flips abandoned claims back to pending. Either:

- Raise `visibility_timeout_s` on publish (or on the claim call) to cover the long tail of your processing time, or
- Call `POST /scopes/{scope_id}/messages/{msg_id}/extend` periodically from long-running workers.
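
The second option can be sketched as a heartbeat thread that keeps extending the claim while the work runs. Here `extend` is a hypothetical callable that wraps your HTTP call to the `extend` endpoint, and `interval_s` is an assumed parameter that should sit well below `visibility_timeout_s`:

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_heartbeat(work: Callable[[], T],
                       extend: Callable[[], None],
                       interval_s: float) -> T:
    """Run work() while a background thread calls extend() every interval_s seconds."""
    stop = threading.Event()

    def beat() -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            extend()

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        return work()
    finally:
        # Stop the heartbeat whether work() returned or raised.
        stop.set()
        thread.join()
```

If `extend()` raises, the heartbeat thread ends and the claim will eventually time out, so a real worker would probably log that failure.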

## Messages land in `_dead_letter`

A message lands in dead-letter after it has failed `max_retries` times (default 3). Find it via `GET /scopes/{scope_id}/queues/<dead-letter-queue-id>/messages` and inspect `failure_reason`. After fixing the root cause, you can republish the body to the original queue manually.
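
When many messages are dead-lettered at once, grouping them by `failure_reason` shows which root cause to fix first. A minimal sketch, assuming the listing has already been fetched and decoded into a list of dicts that each carry a `failure_reason` field (the response shape is an assumption); `failure_summary` is a hypothetical helper:

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def failure_summary(messages: Iterable[Mapping]) -> List[Tuple[str, int]]:
    """Count dead-lettered messages per failure_reason, most frequent first."""
    counts = Counter(m.get("failure_reason") or "unknown" for m in messages)
    return counts.most_common()
```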

## Routing rule never fires

In priority order:

- **Rule is disabled.** `GET /scopes/{scope_id}/routing-rules/{rule_id}` — check `enabled`.
- **Trigger mismatch.** Make sure the rule's trigger event matches what you expect (default rules fire on `message_published`).
- **Conditions don't match.** Run `POST /scopes/{scope_id}/routing-rules/test` with a realistic test message and see which rules match.
- **A higher-priority rule wins first.** Rules are evaluated in priority-ascending order and first-match wins. Lower the rule's `priority` value so it is evaluated earlier, or change conditions on the rules ahead of it.
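
The priority-ascending, first-match behaviour can be simulated locally to see why a rule is shadowed. This is an illustrative model, not the routing engine: the `Rule` class and `first_match` are hypothetical, trigger events are not modeled, and ties are broken by list order, which is an assumption:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Rule:
    name: str
    priority: int
    enabled: bool
    matches: Callable[[dict], bool]  # stands in for the rule's conditions


def first_match(rules: Iterable[Rule], message: dict) -> Optional[Rule]:
    """Return the first enabled rule that matches, lowest priority value first."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.enabled and rule.matches(message):
            return rule
    return None
```

With a catch-all rule at priority 10 and a specific rule at priority 20, the specific rule never fires; moving it to priority 5 puts it ahead of the catch-all.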

## Routing loop / "circuit_breaker" audit entries

The routing engine refuses to apply more than 5 hops per message and writes a `routing.circuit_breaker` audit entry. Inspect your rule graph for cycles — typically a rule routes back into a queue that's a source for another rule.
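
To look for cycles offline, model the rules as a graph from source queue to target queues and run a depth-first search. A sketch under that assumption; the `edges` mapping is something you would build from your own rule list, and `find_cycle` is a hypothetical helper:

```python
from typing import Dict, List, Optional


def find_cycle(edges: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of queue ids (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in edges.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: that closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(edges):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None
```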

## Stream never completes

The stream is stuck `open`. Causes:

- The producer never published a final chunk (`stream_final=1`).
- `expected_chunks` was set but not all sequences arrived.
- `timeout_seconds` (default 300) elapsed. The stream is technically expired, but assembly still works on whatever chunks arrived.

Either cancel via `POST /scopes/{scope_id}/streams/{stream_id}/cancel` or fetch what you have via `GET /scopes/{scope_id}/streams/{stream_id}/assembled`.
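
Before choosing between cancel and partial assembly, it helps to know what is missing. A minimal sketch of the first two causes, assuming chunk sequence numbers are zero-based (an assumption) and that you already know which sequences arrived; `stuck_reasons` is a hypothetical helper:

```python
from typing import Iterable, List, Optional


def stuck_reasons(received: Iterable[int],
                  expected_chunks: Optional[int],
                  saw_final: bool) -> List[str]:
    """List the reasons a stream would still be `open`."""
    reasons: List[str] = []
    if not saw_final:
        reasons.append("no final chunk (stream_final=1) received")
    if expected_chunks is not None:
        # Assumes sequences run from 0 to expected_chunks - 1.
        missing = sorted(set(range(expected_chunks)) - set(received))
        if missing:
            reasons.append(f"missing sequences: {missing}")
    return reasons
```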

## API key rejected after rotation

Old keys remain valid for `grace_period_seconds` after rotation (default 300). After that, they are revoked. If a consumer fails right after a rotation, check whether it picked up the new key.
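
The grace-period arithmetic is easy to check by hand or in code when deciding whether a failure lines up with a rotation. A sketch, assuming you know the rotation time; `old_key_valid` is a hypothetical helper, and treating the exact boundary instant as already revoked is an assumption:

```python
from datetime import datetime, timedelta


def old_key_valid(rotated_at: datetime, now: datetime,
                  grace_period_seconds: int = 300) -> bool:
    """True while the pre-rotation key is still inside its grace period."""
    return now < rotated_at + timedelta(seconds=grace_period_seconds)
```

A consumer that starts failing about five minutes after a rotation (with the default of 300 seconds) most likely never picked up the new key.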

## "Unauthorized" / `PERMISSION_DENIED` from a tenant-admin

Every ScaiQueue endpoint requires the caller to have a `tenant_id` on their token. A `super_admin` without a tenant context (cross-tenant operator) is forbidden. Either re-issue the token scoped to a tenant, or operate via that tenant's admin user.

## System agents say `status: idle` but timeouts aren't being enforced

`idle` is the resting state between runs. Check `last_run_at` and `total_runs`: if they're not advancing, the arq worker isn't running. Check that ScaiGrid is running in worker mode (`SCAIGRID_MODE=worker`).
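
To tell "idle between runs" from "not running at all", compare two status snapshots taken a few seconds apart. A sketch, assuming each snapshot is a dict with `last_run_at` and `total_runs` fields; how you fetch the snapshot is not covered on this page, and `agent_is_advancing` is a hypothetical helper:

```python
from typing import Mapping


def agent_is_advancing(before: Mapping, after: Mapping) -> bool:
    """True if the agent ran at least once between the two snapshots."""
    runs_before = before.get("total_runs") or 0
    runs_after = after.get("total_runs") or 0
    return (runs_after > runs_before
            or after.get("last_run_at") != before.get("last_run_at"))
```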