Checkpoints

A checkpoint is a paused execution inside a Core, persisted to MariaDB while it waits for a human (or another system) to make a decision. When a checkpoint is created is decided at the IR level by the ScaiCore language; see /docs/scaicore. This page covers what ScaiGrid does with the checkpoint once it exists.

The row

Every checkpoint is one row in mod_scaicore_checkpoints. The fields you care about as a wrapper user:

| Field | Purpose |
| --- | --- |
| id | UUID; also the correlation id used by the ScaiQueue HITL bridge. |
| core_id, tenant_id | Owning Core and tenant. |
| execution_id, flow_name, block_index | Where in the program execution suspended. |
| instance_key | For entity-mode Cores, the entity that suspended. |
| checkpoint_type | Kind of decision (approval, choice, freeform, etc.), set by the program. |
| prompt | The question shown to the assignee. |
| options | Optional JSON list of canonical answer choices. |
| assignee_raw | The raw assignee string from the program, e.g. user:alice@acme, group:approvers, role:tenant_admin. |
| assignee_type | Parsed type: user, group, role, delegated_user, or unrouted. |
| assignee_resolved | Parsed structure with the type + value. |
| context | Arbitrary JSON the program attached. Includes resolved_skills when bound. |
| status | pending, resolved, expired, cancelled. |
| priority | low, normal, high, critical. |
| expires_at, expiry_action, escalation_target | Lifecycle controls. |
| notification_sent, reminder_count, reminder_interval_m | Notification bookkeeping. |
| resolution, resolved_by, resolved_at | Populated when a human acts. |

The state blob (the captured execution state the engine needs to resume) is stored in S3, keyed by state_s3_key. The row is the index; S3 holds the bytes.

Assignment

The program writes a raw assignee string. The wrapper parses the <type>:<value> prefix:

  • user:<email-or-id> — one human.
  • group:<group-id> — anyone in the group can act.
  • role:<role> — anyone holding the role can act.
  • (no prefix) — treated as user: with the raw value.

If the Core is delegated to a specific user, ScaiCore programs can also produce delegated_user checkpoints — they route to whatever human the delegation is currently scoped to.

When no assignee can be resolved (typical for offline-debugging Cores), the type is unrouted. Unrouted checkpoints accumulate in the admin UI's checkpoint queue for a tenant admin to claim manually.
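The prefix rules above can be sketched in a few lines of Python. This is only an illustration, not the wrapper's actual parser: the names parse_assignee and Assignee are hypothetical, and treating an empty value after a known prefix as unrouted is an assumption. delegated_user is left out because it comes from the Core's delegation rather than from a prefix.

```python
from dataclasses import dataclass
from typing import Optional

# Prefixes named in the docs; anything else is treated as "no prefix".
KNOWN_PREFIXES = {"user", "group", "role"}


@dataclass(frozen=True)
class Assignee:
    kind: str   # "user", "group", "role", or "unrouted"
    value: str


def parse_assignee(raw: Optional[str]) -> Assignee:
    """Split a raw assignee string of the form <type>:<value>."""
    text = (raw or "").strip()
    if not text:
        # Nothing to resolve: the checkpoint ends up unrouted.
        return Assignee("unrouted", "")
    prefix, sep, rest = text.partition(":")
    if sep and prefix in KNOWN_PREFIXES:
        # Assumption: a known prefix with an empty value cannot be routed.
        return Assignee(prefix, rest) if rest else Assignee("unrouted", "")
    # No recognised prefix: treat the whole string as a user.
    return Assignee("user", text)
```

For example, parse_assignee("group:approvers") yields a group assignee, while a bare "alice@acme" falls through to the user case.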

Notifications

If a notifier is configured (email transport, Slack, etc.), the wrapper sends a notification at create time and sets notification_sent = true. The send_checkpoint_reminders cron runs every 15 minutes and re-pings still-pending checkpoints whose reminder_interval_m is set, incrementing reminder_count each time.

Emails are sent to addresses extracted from the resolved assignee — addresses that look like emails are passed straight through; group / role resolution to email lists is the notifier's responsibility.
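The split between pass-through addresses and notifier-side lookup might look roughly like the sketch below. The function name plan_delivery, the returned dictionary shape, and the loose "looks like an email" pattern are all assumptions; the page does not specify the real heuristic.

```python
import re

# Assumption: "looks like an email" means one "@" with non-empty,
# whitespace-free text on both sides. The real check may be stricter.
EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+")


def plan_delivery(assignee_type: str, value: str) -> dict:
    """Decide whether a resolved assignee can be emailed directly."""
    if EMAIL_LIKE.fullmatch(value):
        # Email-like values are passed straight through.
        return {"to": [value], "notifier_lookup": None}
    # Everything else (group ids, role names, opaque user ids) is left
    # for the notifier to expand into addresses.
    return {"to": [], "notifier_lookup": (assignee_type, value)}
```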

Expiry

The expire_checkpoints cron runs every 5 minutes and picks up rows where expires_at <= now() and status = pending. The expiry_action decides what happens:

  • cancel — the row is marked cancelled. The program never resumes from this checkpoint.
  • default_option — the row is marked resolved with decision: "default" and auto_expired: true. The runtime resumes as if a human had picked the default.
  • escalate — the row's assignee_raw is replaced with escalation_target, the notifier is asked to escalate, and the row stays pending for the new assignee.
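As a rough illustration of the three actions, the sketch below applies an expiry_action to a single row held as a plain dictionary. The function name expire_checkpoint and the dictionary representation are hypothetical, the notifier call is omitted, and because the page does not say whether expires_at is extended on escalation, the sketch leaves it untouched.

```python
from datetime import datetime, timezone


def expire_checkpoint(row: dict, now: datetime) -> dict:
    """Return an updated copy of the row if it is due for expiry."""
    expires_at = row.get("expires_at")
    if row["status"] != "pending" or expires_at is None or expires_at > now:
        return row  # not selected by the cron's filter

    updated = dict(row)
    action = row["expiry_action"]
    if action == "cancel":
        # The program never resumes from this checkpoint.
        updated["status"] = "cancelled"
    elif action == "default_option":
        # Resolved as if a human had picked the default.
        updated["status"] = "resolved"
        updated["resolution"] = {"decision": "default", "auto_expired": True}
    elif action == "escalate":
        # Reassigned; status stays pending for the new assignee.
        updated["assignee_raw"] = row["escalation_target"]
    else:
        raise ValueError(f"unknown expiry_action: {action!r}")
    return updated
```

A caller would pass a timezone-aware timestamp such as datetime.now(timezone.utc) and persist the returned row.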

Resolution

A human (or any caller with scaicore:checkpoint_resolve) posts to /checkpoints/{id}/resolve:

```json
{
  "decision": "approve",
  "response_data": { "amount_approved": 500 },
  "comment": "Within limits."
}
```

The row's status flips to resolved, the resolution JSON captures the decision + response + comment, and resolved_by / resolved_at get set. The runtime is responsible for picking up the resolution and resuming the suspended execution.
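A client-side call to the resolve endpoint could be built as in the sketch below, using only the Python standard library. The base URL, the bearer-token header, and the helper name build_resolve_request are assumptions for illustration; the page specifies only the path, the required permission, and the body fields.

```python
import json
import urllib.request
from typing import Optional


def build_resolve_request(
    base_url: str,          # hypothetical API root, e.g. "https://grid.example/api"
    checkpoint_id: str,
    token: str,             # assumption: bearer-token authentication
    decision: str,
    response_data: Optional[dict] = None,
    comment: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to /checkpoints/{id}/resolve."""
    body = {"decision": decision}
    if response_data is not None:
        body["response_data"] = response_data
    if comment is not None:
        body["comment"] = comment
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/checkpoints/{checkpoint_id}/resolve",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate from sending makes the request easy to inspect or test.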

A checkpoint can also be cancelled (POST /checkpoints/{id}/cancel) without a decision, or reassigned (POST /checkpoints/{id}/reassign) to a different assignee — which re-runs the assignment parser and re-fires the notifier.

Frozen skill versions

If the Core has bound ScaiSkills, the resolved skill set at checkpoint-creation time is frozen into the row's context.resolved_skills. On resume, the runtime reads those pinned versions back instead of re-resolving — so the resumed execution uses the exact same skill versions it suspended with, even if a new version has been published or yanked since.

Yanked pinned versions are allowed through with a warning (ScaiSkills ERRATA-v0.2 option 2). Missing pinned skill rows are also tolerated; the warning is logged but the resume proceeds.
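The tolerant resume behaviour described above can be sketched as follows. The shape of context.resolved_skills (a list of name/version pairs), the catalog lookup, and the function name skills_for_resume are all assumptions; the point is only that yanked or missing pins produce warnings rather than failures.

```python
import logging

log = logging.getLogger("checkpoint_resume")


def skills_for_resume(pinned: list, catalog: dict) -> tuple:
    """Return the pinned skills unchanged, plus any warnings raised.

    pinned:  entries like {"name": ..., "version": ...} read back from
             context.resolved_skills (assumed shape).
    catalog: maps (name, version) to a row such as {"yanked": bool} for
             skill versions that still exist (assumed shape).
    """
    warnings = []
    for skill in pinned:
        key = (skill["name"], skill["version"])
        row = catalog.get(key)
        if row is None:
            warnings.append(f"pinned skill {key} has no row; resuming anyway")
        elif row.get("yanked"):
            warnings.append(f"pinned skill {key} was yanked; resuming anyway")
    for message in warnings:
        log.warning(message)
    # The pins are never re-resolved: resume uses exactly what was frozen.
    return list(pinned), warnings
```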

ScaiQueue HITL bridge

Programs can publish a checkpoint as a hitl_request message into ScaiQueue (with hitl_message_id recorded on the row). When the ScaiQueue message is completed, an in-process scaiqueue.message.completed event fires. The wrapper's handler — registered by the module — looks the checkpoint up by correlation_id == checkpoint_id and resolves it internally.

The handler is idempotent: a re-delivered completion event finds the checkpoint already resolved and returns cleanly. If the checkpoint was never ours (different correlation_id), the handler exits without touching anything.
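The idempotency property can be illustrated with a small in-memory sketch. The event fields (correlation_id, result), the dictionary store, and the return labels are hypothetical; treating every non-pending status as a no-op is also an assumption, since the page only describes the already-resolved case.

```python
def on_message_completed(event: dict, checkpoints: dict) -> str:
    """Handle a scaiqueue.message.completed event against a checkpoint store.

    checkpoints maps checkpoint id to a mutable row dictionary.
    """
    checkpoint = checkpoints.get(event.get("correlation_id"))
    if checkpoint is None:
        # Not one of our checkpoints: exit without touching anything.
        return "ignored"
    if checkpoint["status"] != "pending":
        # Re-delivered completion (or a row already closed): no-op.
        return "no_op"
    checkpoint["status"] = "resolved"
    checkpoint["resolution"] = event.get("result")
    return "resolved"
```

Calling the handler twice with the same event resolves the row once and leaves it unchanged on the second delivery.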

Audit history

GET /checkpoints/{id}/history returns a simplified event list — created, notification sent, resolved. It is not a full append-only audit log; for that, use ScaiGrid's audit-events pipeline filtered by module=scaicore.

Updated 2026-05-18 15:01:29, rev 11