---
title: Service marked unreachable
path: troubleshooting/service-marked-unreachable
status: published
---

# Service marked unreachable

A registered ScaiLabs service shows `health_status='unreachable'` in `/admin/registry`, and the operations team sees alerts. Use this page to diagnose why ScaiControl can't see its heartbeats.

## How "unreachable" is determined

The `registry_heartbeat_monitor` cron runs every `REGISTRY_HEARTBEAT_MONITOR_INTERVAL` seconds (default 60). For each registered service:

- It looks at `last_heartbeat_at` from `service_registry`.
- Grace period = the service's own `heartbeat_interval_seconds` × `REGISTRY_HEARTBEAT_GRACE_MULTIPLIER` (default 3).
- If `now - last_heartbeat_at > grace`, the `consecutive_misses` counter increments.
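
The miss check in the list above can be sketched in Python. This is a minimal illustration rather than ScaiControl's actual implementation: the function names `grace_seconds` and `is_missed` are hypothetical, and only the formula and the default multiplier of 3 come from this page.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GRACE_MULTIPLIER = 3  # default REGISTRY_HEARTBEAT_GRACE_MULTIPLIER per this page


def grace_seconds(heartbeat_interval_seconds: int,
                  multiplier: int = DEFAULT_GRACE_MULTIPLIER) -> int:
    """Grace period = the service's heartbeat interval times the multiplier."""
    return heartbeat_interval_seconds * multiplier


def is_missed(now: datetime, last_heartbeat_at: datetime, grace: int) -> bool:
    """True when the silence since the last heartbeat exceeds the grace period."""
    return (now - last_heartbeat_at) > timedelta(seconds=grace)


# Example: a service that heartbeats every 30 seconds (illustrative timestamps).
grace = grace_seconds(30)
last = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(is_missed(last + timedelta(seconds=60), last, grace))   # False
print(is_missed(last + timedelta(seconds=120), last, grace))  # True
```

The comparison is strict, matching `now - last_heartbeat_at > grace`: silence of exactly one grace period does not yet count as a miss.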

Thresholds (configurable via env):

| Consecutive misses | Health status |
|---|---|
| 0 | `healthy` |
| `REGISTRY_HEARTBEAT_DEGRADED_THRESHOLD` (default 3) | `degraded` |
| `REGISTRY_HEARTBEAT_UNREACHABLE_THRESHOLD` (default 10) | `unreachable` |
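
As a rough sketch, the table can be read as a threshold lookup. The function name `health_for_misses` is hypothetical. The code assumes each threshold is an inclusive lower bound and that counts below the degraded threshold still report `healthy`; this page does not state either explicitly.

```python
DEGRADED_THRESHOLD = 3      # default REGISTRY_HEARTBEAT_DEGRADED_THRESHOLD
UNREACHABLE_THRESHOLD = 10  # default REGISTRY_HEARTBEAT_UNREACHABLE_THRESHOLD


def health_for_misses(consecutive_misses: int,
                      degraded: int = DEGRADED_THRESHOLD,
                      unreachable: int = UNREACHABLE_THRESHOLD) -> str:
    """Map a miss count to a health status using the thresholds in the table."""
    if consecutive_misses >= unreachable:
        return "unreachable"
    if consecutive_misses >= degraded:
        return "degraded"
    # Assumption: counts below the degraded threshold still report healthy.
    return "healthy"


for misses in (0, 2, 3, 9, 10):
    print(misses, health_for_misses(misses))
```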

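So with defaults: a service heartbeating every 30 seconds has `grace = 90 sec`. Once it goes silent, the first miss is recorded by the first monitor run after those 90 seconds, and each later 60-second monitor cycle adds one more. It therefore hits `unreachable` (10 misses) after roughly 90 + 9 × 60 = 630 seconds, or about 10.5 to 11.5 minutes of silence depending on where the monitor cycle falls.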

## Step 1 — Is the service actually running?

Standard process check — `ps`, `systemctl status`, `kubectl get pods`, whatever your runtime exposes. If the service is down, that's the answer; start it.

## Step 2 — Is it heartbeating?

Look at the most recent heartbeat in ScaiControl:

```sql
SELECT id, slug, name, last_heartbeat_at, consecutive_misses, health_status,
       heartbeat_interval_seconds
FROM service_registry
WHERE slug = '<service-slug>';
```

If `last_heartbeat_at` is very recent but `health_status` is still `unreachable`, the monitor hasn't run yet — wait one cycle.

If `last_heartbeat_at` is stale, the heartbeats are not arriving. Move to Step 3.
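
The decision above can be expressed as a small helper that takes the columns returned by the query. This is a sketch under stated assumptions: `next_step` is a hypothetical name, "very recent" is interpreted as "within the grace period", and a `NULL` `last_heartbeat_at` is treated as stale.

```python
from datetime import datetime, timedelta, timezone


def next_step(last_heartbeat_at, health_status, heartbeat_interval_seconds, now,
              grace_multiplier=3):
    """Suggest what to do next based on one service_registry row."""
    grace = timedelta(seconds=heartbeat_interval_seconds * grace_multiplier)
    # Assumption: "very recent" means within the grace period; NULL counts as stale.
    recent = last_heartbeat_at is not None and (now - last_heartbeat_at) <= grace
    if recent and health_status == "healthy":
        return "ok"
    if recent:
        return "wait one monitor cycle"
    return "heartbeats not arriving: go to Step 3"


# Illustrative values only.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(next_step(now - timedelta(seconds=20), "unreachable", 30, now))
print(next_step(now - timedelta(minutes=30), "unreachable", 30, now))
```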

## Step 3 — Can the service reach ScaiControl?

Heartbeats are `POST /api/v1/registry/heartbeat` with a service token. Test from the service host:

```bash
curl -i -X POST "$SCAICONTROL_URL/api/v1/registry/heartbeat" \
     -H "Authorization: Bearer $SERVICE_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"status":"healthy"}'
```

Expected: `200 {"ok": true}`.

Possible failures:

| HTTP / network | Meaning |
|---|---|
| Network timeout / connection refused | Service can't reach ScaiControl's URL. Check DNS, firewall, ingress rules |
| `401` | Service token invalid, expired, or wrong issuer. The service might be reading stale credentials |
| `403` | Token valid but lacks `registry:manage` scope. Re-issue via ScaiKey |
| `404` | Wrong URL (e.g. missing `/api/v1`) |
| `5xx` | ScaiControl problem — check its logs |
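
If the service's heartbeat client logs status codes, the table can be folded into a lookup for log messages or alerts. The sketch below is illustrative: `explain_heartbeat_result` is a hypothetical helper, and `None` is used here as a stand-in for a timeout or refused connection.

```python
from typing import Optional


def explain_heartbeat_result(status: Optional[int]) -> str:
    """Translate a heartbeat POST outcome into the likely cause from the table.

    Pass None when the request timed out or the connection was refused.
    """
    if status is None:
        return "network: check DNS, firewall, ingress rules"
    if status == 200:
        return "ok"
    if status == 401:
        return "token invalid, expired, or wrong issuer"
    if status == 403:
        return "token lacks registry:manage scope; re-issue via ScaiKey"
    if status == 404:
        return "wrong URL (for example, missing /api/v1)"
    if 500 <= status <= 599:
        return "ScaiControl problem; check its logs"
    return "unexpected status; not covered by the table"


for outcome in (200, 401, 503, None):
    print(outcome, "->", explain_heartbeat_result(outcome))
```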

## Step 4 — Is the heartbeat being recorded?

If the service reports successful heartbeats but `last_heartbeat_at` in ScaiControl is still stale, the requests are most likely reaching a different ScaiControl instance (for example, a load balancer fronting multiple deployments with separate databases), or you are looking at a stale cache. Verify that the service's configured `SCAICONTROL_URL` points at the deployment you are inspecting.

Backend log line:

```
INFO  registry.heartbeat slug=<slug> status=healthy
```

Grep for it; absence at the expected time means the request didn't land.
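
For logs that are already collected as text, the same search can be scripted. The sketch below assumes only the fragment shown above (`registry.heartbeat slug=<slug>`) and nothing about timestamps or other fields; `heartbeat_lines` and the sample slugs are hypothetical.

```python
import re
from typing import Iterable, List


def heartbeat_lines(log_lines: Iterable[str], slug: str) -> List[str]:
    """Return the log lines that record a heartbeat for the given slug."""
    # Require whitespace or end of line after the slug, so that the slug
    # "billing" does not also match "billing-eu".
    pattern = re.compile(
        r"registry\.heartbeat\s+slug=" + re.escape(slug) + r"(?=\s|$)"
    )
    return [line for line in log_lines if pattern.search(line)]


sample = [
    "INFO  registry.heartbeat slug=billing status=healthy",
    "INFO  registry.heartbeat slug=billing-eu status=healthy",
    "INFO  something.else slug=billing",
]
print(heartbeat_lines(sample, "billing"))
```

An empty result for the time window in question points back to Step 3: the request never reached this instance.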

## Step 5 — Is the monitor cron running?

```bash
# The [a] and [h] brackets stop grep from matching its own process.
ps aux | grep -E '[a]rq|[h]eartbeat_monitor'
```

The cron lives inside the arq worker. If the worker is down, nothing re-evaluates the service: `last_heartbeat_at` still updates from the live POSTs once heartbeats resume, but `consecutive_misses` is not reset, so `health_status` can look stuck at `unreachable` until the cron next runs.

Restart the worker; one cycle resets the counter.

## Step 6 — Service is up but ScaiControl is misconfigured

The registered URL may be out of date. `service_registry.base_url` is the address ScaiControl uses to call back into the service, not the address heartbeats come from. If you changed the service's deployment URL without re-registering, downstream provisioning calls will fail, and the service is marked unreachable by ScaiControl's reverse health checks rather than by missed heartbeats.

```sql
SELECT slug, base_url, callback_url FROM service_registry WHERE slug = '<slug>';
```

Update via `PATCH /api/v1/admin/registry/{id}` if wrong.

## Step 7 — Force-reset the status

Once the underlying issue is fixed and heartbeats are flowing, the service moves back to `healthy` automatically on the next successful heartbeat (the heartbeat handler clears `consecutive_misses` and sets `health_status='healthy'` in the same transaction). No manual action required.

If you need to nudge it for testing:

```sql
UPDATE service_registry
SET health_status = 'healthy', consecutive_misses = 0, last_heartbeat_at = NOW()
WHERE slug = '<slug>';
```

This is purely cosmetic. If the underlying issue persists, the monitor starts counting misses again once the grace period elapses, and the status degrades back to `unreachable`.

## "Approved" vs "healthy" — different concepts

Don't conflate them:

- `registration_status` ∈ {`pending`, `approved`, `rejected`} — administrative gate; only `approved` services can heartbeat or be provisioned to.
- `health_status` ∈ {`healthy`, `degraded`, `unreachable`} — operational signal, derived from heartbeats.

A service can be `approved` + `unreachable` (just down right now). It cannot be `pending` + `healthy` — a `pending` service has no token to heartbeat with.

## See also

- [Reference: configuration](../reference/configuration) — heartbeat env vars
- [Reference: state-machines](../reference/state-machines) — registry health transitions
- [Concepts: architecture](../concepts/architecture) — service registry's role
