---
summary: "How publish, claim, complete, and reclaim fit together \u2014 MariaDB rows,\
  \ Redis sorted sets, system agents."
title: Architecture
path: concepts/architecture
status: published
---

ScaiQueue is a thin product layer over MariaDB and Redis. Messages are durable rows; the per-queue order and the claim locks live in Redis; system agents enforce visibility timeouts, retries, archival, and dead-lettering on a cron schedule.

## Components

A producer publishes over HTTP. ScaiGrid's `MessageService` writes the row, updates Redis, and (optionally) invokes the routing engine. A consumer claims via HTTP; on completion ScaiGrid emits an event on its internal event bus. System agents do the housekeeping out-of-band.

```mermaid
flowchart LR
    P[Producer]
    C[Consumer]
    subgraph SG[ScaiGrid FastAPI process]
        MS[MessageService<br/>- row in MariaDB<br/>- ZADD pending ZSET<br/>- RoutingEngine.eval]
        CL[POST claim<br/>- ZPOPMIN pending<br/>- SET NX EX claim lock<br/>- UPDATE state=claimed]
        CO[POST complete / fail /<br/>release / extend]
        SA[System agents arq cron<br/>- visibility_timeout_enforcer<br/>- expiry_enforcer<br/>- priority_aging_worker<br/>- dead_letter_monitor<br/>- message_archiver]
    end
    P -- POST publish --> MS
    C -- POST claim --> CL
    CL --> C
    C -- complete/fail/release/extend --> CO
```

There is no separate ScaiQueue daemon. The HTTP endpoints, the routing engine, and the cron-driven system agents all run inside ScaiGrid's existing FastAPI and arq processes.

## Lifecycle of one message

1. **Publish.** `MessageService.publish` validates the scope and the queue state (a `paused`, `archived`, or `draining` queue rejects the publish), writes the message row with `state=pending`, computes a sort score based on the queue's ordering mode, and `ZADD`s the id into the per-queue pending sorted set. The routing engine then evaluates rules with the `message_published` trigger; if one matches, it may route, transform, escalate, or archive the message.
2. **Claim.** A consumer calls claim. The service `ZPOPMIN`s the lowest-score id, takes a Redis `SET NX EX` lock keyed by the message id with a TTL of `visibility_timeout_s`, then updates the row: it sets `state=claimed`, fills the `claimed_by_*` fields, and increments `attempts`. The queue's `depth_pending` decrements and `depth_claimed` increments.
3. **Complete or fail.** Complete moves the row to `completed`, releases the Redis lock, decrements `depth_claimed`, and emits a `scaiqueue.message.completed` event on ScaiGrid's event bus with the original `correlation_id`. Fail records `failure_reason`; if `attempts < max_retries` the message returns to pending with a backoff, otherwise it is moved to the scope's `_dead_letter` queue.
4. **Reclaim on crash.** If the consumer never calls complete or fail, the claim lock expires. The `visibility_timeout_enforcer` cron job (runs every second) flips the row back to `pending` and re-adds it to the pending sorted set.
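
The four steps above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not ScaiQueue's code: a `dict` stands in for the MariaDB rows, a heap for the Redis pending sorted set, and a `lock_expires_at` field for the `SET NX EX` lock TTL. The class name `QueueModel`, the `dead_lettered` state name, and the omission of retry backoff are all simplifications made for this sketch.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    id: str
    created_at: float
    state: str = "pending"
    attempts: int = 0
    lock_expires_at: Optional[float] = None  # stand-in for the Redis lock TTL


class QueueModel:
    """Hypothetical in-memory model of one fifo queue."""

    def __init__(self, visibility_timeout_s, max_retries):
        self.visibility_timeout_s = visibility_timeout_s
        self.max_retries = max_retries
        self.rows = {}         # message id -> Message (the "durable rows")
        self.pending = []      # heap of (score, id), the "pending sorted set"
        self.dead_letter = []  # ids moved to the dead-letter queue

    def publish(self, msg_id, now):
        self.rows[msg_id] = Message(id=msg_id, created_at=now)
        heapq.heappush(self.pending, (now, msg_id))  # fifo score = created_at

    def claim(self, now):
        if not self.pending:
            return None
        _, msg_id = heapq.heappop(self.pending)  # analogue of ZPOPMIN
        msg = self.rows[msg_id]
        msg.state = "claimed"
        msg.attempts += 1
        msg.lock_expires_at = now + self.visibility_timeout_s
        return msg

    def complete(self, msg_id):
        msg = self.rows[msg_id]
        msg.state = "completed"
        msg.lock_expires_at = None  # release the lock

    def fail(self, msg_id):
        msg = self.rows[msg_id]
        msg.lock_expires_at = None
        if msg.attempts < self.max_retries:
            msg.state = "pending"  # retry; backoff is omitted in this sketch
            heapq.heappush(self.pending, (msg.created_at, msg_id))
        else:
            msg.state = "dead_lettered"  # state name is an assumption
            self.dead_letter.append(msg_id)

    def reclaim_expired(self, now):
        """What the visibility-timeout pass does: return expired claims to pending."""
        reclaimed = []
        for msg in self.rows.values():
            if msg.state == "claimed" and msg.lock_expires_at <= now:
                msg.state = "pending"
                msg.lock_expires_at = None
                heapq.heappush(self.pending, (msg.created_at, msg.id))
                reclaimed.append(msg.id)
        return reclaimed
```

Passing `now` explicitly keeps the model deterministic: a consumer that claims at `now=10` with a 30-second visibility timeout is reclaimed by the first `reclaim_expired` call at or after `now=40`.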

## Ordering modes

The score in the Redis sorted set depends on the queue's `ordering`:

- `fifo` — `created_at` as epoch seconds. Lowest is oldest, so `ZPOPMIN` returns the oldest first.
- `priority` — `-(tier*1000 + score)` then `created_at`. Higher tier-plus-score wins; ties broken by age.
- `deadline` — `deadline` as epoch seconds. Earliest deadline wins.
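
As a rough illustration of the three modes, the function below builds a sort key that orders lowest-first, the way `ZPOPMIN` does. It returns tuples for readability; a Redis sorted set stores a single numeric score per member, and how the two-part `priority` key is packed into one score is not specified here, so treat the tuple as a stand-in. The function name `sort_key` and its parameter names are hypothetical.

```python
def sort_key(ordering, *, created_at, tier=0, score=0, deadline=None):
    """Return a tuple that sorts lowest-first for the given ordering mode."""
    if ordering == "fifo":
        return (created_at,)  # oldest first
    if ordering == "priority":
        # Negated so a higher tier-plus-score sorts first; age breaks ties.
        return (-(tier * 1000 + score), created_at)
    if ordering == "deadline":
        if deadline is None:
            raise ValueError("deadline ordering needs a deadline")
        return (deadline,)  # earliest deadline first
    raise ValueError(f"unknown ordering: {ordering!r}")


# Three priority messages: "b" has the higher tier; "a" and "c" tie on
# tier and score, so the older one ("c") comes out first.
messages = {
    "a": sort_key("priority", tier=1, score=5, created_at=100),
    "b": sort_key("priority", tier=2, score=0, created_at=200),
    "c": sort_key("priority", tier=1, score=5, created_at=50),
}
claim_order = sorted(messages, key=messages.get)
```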

The `consumer_mode` field is independent of `ordering`: `competing` (the default; one consumer claims each message), `broadcast` (every subscriber sees a copy), or other modes that may exist on system queues.

## State

- **Scopes, queues, messages, routing rules, schemas, HITL patterns, ACLs, API keys, audit log, system-agent registry** — all in MariaDB tables prefixed `mod_scaiqueue_`.
- **Pending order, claim locks, visibility timeouts, subscription dedup sets** — Redis keys under `sq:{tenant}:{scope}:...`.
- **Archived messages** — moved from `mod_scaiqueue_messages` to `mod_scaiqueue_message_archive` by the `message_archiver` agent for terminal messages older than one hour.

If Redis is unavailable, `MessageService` falls back to a DB-only claim path that scans pending rows in `created_at` order. Slower; still correct.
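
A DB-only claim of this kind might look like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for MariaDB so that it runs anywhere, and the `messages` table with its columns is a simplified, hypothetical schema rather than the real `mod_scaiqueue_messages` layout. The conditional `UPDATE ... AND state = 'pending'` is one way to make sure two consumers cannot both claim the same row; the actual fallback may guard against that race differently.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE messages (
    id         TEXT PRIMARY KEY,
    queue_id   TEXT NOT NULL,
    state      TEXT NOT NULL,
    created_at REAL NOT NULL,
    claimed_by TEXT,
    attempts   INTEGER NOT NULL DEFAULT 0
)
"""


def claim_from_db(conn, queue_id, consumer):
    """Claim the oldest pending row of a queue; return its id, or None."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT id FROM messages "
            "WHERE queue_id = ? AND state = 'pending' "
            "ORDER BY created_at LIMIT 1",
            (queue_id,),
        ).fetchone()
        if row is None:
            return None
        # Only succeeds if the row is still pending at update time.
        cur = conn.execute(
            "UPDATE messages "
            "SET state = 'claimed', claimed_by = ?, attempts = attempts + 1 "
            "WHERE id = ? AND state = 'pending'",
            (consumer, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None
```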

## System agents

These run inside ScaiGrid's arq workers on a cron schedule. You don't start them; they appear in `GET /system-agents`.

| Agent | Schedule | Role |
|---|---|---|
| `visibility_timeout_enforcer` | every second | Reclaims expired claims, returns messages to pending. |
| `expiry_enforcer` | every minute | Marks TTL- and deadline-expired pending messages as terminal. |
| `message_archiver` | every 5 minutes | Moves terminal messages older than 1 hour to the archive table. |
| `dead_letter_monitor` | every minute | Emits structured warnings when `_dead_letter` queues grow. |
| `priority_aging_worker` | every 5 minutes | Bumps `priority_score` on long-pending messages so low-priority work doesn't starve. |
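
To make the last row concrete, here is a minimal sketch of what one aging pass could do. The age threshold, the bump size, and the cap are all assumptions chosen for illustration; the real worker's values are not documented here. The cap at 999 is also an assumption: it keeps a bumped score from spilling into the next tier, given that tiers are spaced 1000 apart in the `priority` ordering score.

```python
from dataclasses import dataclass


@dataclass
class PendingMessage:
    id: str
    created_at: float  # epoch seconds
    priority_score: int


def age_priorities(messages, now, *, min_age_s=300.0, bump=1, max_score=999):
    """Bump priority_score on messages pending for at least min_age_s.

    Returns the ids that were bumped. All keyword defaults are hypothetical.
    """
    bumped = []
    for msg in messages:
        is_old = now - msg.created_at >= min_age_s
        if is_old and msg.priority_score < max_score:
            msg.priority_score = min(msg.priority_score + bump, max_score)
            bumped.append(msg.id)
    return bumped


backlog = [
    PendingMessage("old", created_at=0.0, priority_score=5),
    PendingMessage("new", created_at=900.0, priority_score=5),
    PendingMessage("capped", created_at=0.0, priority_score=999),
]
bumped_ids = age_priorities(backlog, now=1000.0)
```

After this pass only `old` is bumped: `new` has been pending for 100 seconds, under the assumed threshold, and `capped` is already at the assumed maximum.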

## How it differs from rolling your own

Building this on raw Redis Streams or a hosted broker gives you a topic and a delivery guarantee. ScaiQueue adds the typed payload, the routing engine, the HITL spec, the dead-letter queue, the audit log, and the system agents — all wired into ScaiGrid's auth, accounting, and admin UI. If you only need fire-and-forget pub/sub, you don't need ScaiQueue. If you need durable, retryable, observable work distribution with optional human steps, you do.
