--- summary: "How publish, claim, complete, and reclaim fit together \u2014 MariaDB rows,\ \ Redis sorted sets, system agents." title: Architecture path: concepts/architecture status: published --- ScaiQueue is a thin product layer over MariaDB and Redis. Messages are durable rows; the per-queue order and the claim locks live in Redis; system agents enforce visibility timeouts, retries, archival, and dead-lettering on a cron schedule. ## Components A producer publishes over HTTP. ScaiGrid's `MessageService` writes the row, updates Redis, and (optionally) invokes the routing engine. A consumer claims via HTTP; on completion ScaiGrid emits an event on its internal event bus. System agents do the housekeeping out-of-band. ```mermaid flowchart LR P[Producer] C[Consumer] subgraph SG[ScaiGrid FastAPI process] MS[MessageService
- row in MariaDB
- ZADD pending ZSET
- RoutingEngine.eval] CL[POST claim
- ZPOPMIN pending
- SET NX EX claim lock
- UPDATE state=claimed] CO[POST complete / fail /
release / extend] SA[System agents arq cron
- visibility_timeout
- expiry_enforcer
- priority_aging
- dead_letter_monitor
- message_archiver] end P -- POST publish --> MS C -- POST claim --> CL CL --> C C -- complete/fail/release/extend --> CO ``` There is no separate ScaiQueue daemon. The HTTP endpoints, the routing engine, and the cron-driven system agents all run inside ScaiGrid's existing FastAPI and arq processes. ## Lifecycle of one message 1. **Publish.** `MessageService.publish` validates scope and queue state (`paused`, `archived`, or `draining` reject), writes the message row with `state=pending`, computes a sort score based on the queue's ordering mode, and `ZADD`s the id into the per-queue pending sorted set. The routing engine then evaluates `message_published`-trigger rules; if one matches it may route, transform, escalate, or archive. 2. **Claim.** A consumer calls claim. The service `ZPOPMIN`s the lowest-score id, takes a Redis `SET NX EX` lock keyed by the message id with `visibility_timeout_s` TTL, then updates the row to `state=claimed`, sets `claimed_by_*`, increments `attempts`. The queue's `depth_pending` decrements and `depth_claimed` increments. 3. **Complete or fail.** Complete moves the row to `completed`, releases the Redis lock, decrements `depth_claimed`, and emits a `scaiqueue.message.completed` event on ScaiGrid's event bus with the original `correlation_id`. Fail records `failure_reason`; if `attempts < max_retries` the message returns to pending with a backoff, otherwise it is moved to the scope's `_dead_letter` queue. 4. **Reclaim on crash.** If the consumer never calls complete or fail, the claim lock expires. The `visibility_timeout_enforcer` cron job (runs every second) flips the row back to `pending` and re-adds it to the pending sorted set. ## Ordering modes The score in the Redis sorted set depends on the queue's `ordering`: - `fifo` — `created_at` as epoch seconds. Lowest is oldest, so `ZPOPMIN` returns the oldest first. - `priority` — `-(tier*1000 + score)` then `created_at`. Higher tier-plus-score wins; ties broken by age. - `deadline` — `deadline` as epoch seconds. Earliest deadline wins. The `consumer_mode` field is independent: `competing` (default — one consumer claims each message), `broadcast` (every subscriber sees a copy), or other modes that may exist on system queues. ## State - **Scopes, queues, messages, routing rules, schemas, HITL patterns, ACLs, API keys, audit log, system-agent registry** — all in MariaDB tables prefixed `mod_scaiqueue_`. - **Pending order, claim locks, visibility timeouts, subscription dedup sets** — Redis keys under `sq:{tenant}:{scope}:...`. - **Archived messages** — moved from `mod_scaiqueue_messages` to `mod_scaiqueue_message_archive` by the `message_archiver` agent for terminal messages older than one hour. If Redis is unavailable, `MessageService` falls back to a DB-only claim path that scans pending rows in `created_at` order. Slower; still correct. ## System agents These run inside ScaiGrid's arq workers on a cron schedule. You don't start them; they appear in `GET /system-agents`. | Agent | Schedule | Role | |---|---|---| | `visibility_timeout_enforcer` | every second | Reclaims expired claims, returns messages to pending. | | `expiry_enforcer` | every minute | Marks TTL- and deadline-expired pending messages as terminal. | | `message_archiver` | every 5 minutes | Moves terminal messages older than 1 hour to the archive table. | | `dead_letter_monitor` | every minute | Emits structured warnings when `_dead_letter` queues grow. | | `priority_aging_worker` | every 5 minutes | Bumps `priority_score` on long-pending messages so low-priority work doesn't starve. | ## How it differs from rolling your own Building this on raw Redis Streams or a hosted broker gives you a topic and a delivery guarantee. ScaiQueue adds the typed payload, the routing engine, the HITL spec, the dead-letter queue, the audit log, and the system agents — all wired into ScaiGrid's auth, accounting, and admin UI. If you only need fire-and-forget pub/sub, you don't need ScaiQueue. If you need durable, retryable, observable work distribution with optional human steps, you do.