---
audience: developer
summary: Your client connects, then drops; or messages stop appearing live.
title: WebSocket keeps disconnecting
path: troubleshooting/websocket-disconnects
status: published
---

# WebSocket keeps disconnecting

## Symptom: Connects, drops every few seconds

Causes:

- **The proxy or load balancer idle timeout is too low.** ScaiWave
  pings every 30s, so a middlebox that closes idle connections at
  20s drops the socket between pings. Raise the LB idle timeout or
  nginx `proxy_read_timeout` past 60s (75–90s recommended).
- **Sticky sessions are not configured.** Some LBs round-robin
  upgrade requests, so the WS gets routed to a pod that doesn't
  have the session. Enable session affinity on the LB.
- **Network conditions** (mobile network, VPN). These are harder
  to fix; the client should reconnect automatically, falling back
  to the `/v1/sync` long-poll.
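
As a rough illustration of the first two fixes on an nginx reverse proxy, the fragment below raises the read timeout and pins each client to one backend. The upstream name `scaiwave_api`, the backend addresses, and the use of `ip_hash` for affinity are assumptions made for this sketch; other load balancers expose their own idle-timeout and affinity settings.

```nginx
# Hypothetical upstream: replace the name and addresses with your own.
upstream scaiwave_api {
    ip_hash;                      # one way to get sticky routing (by client IP)
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    # listen / server_name / TLS settings omitted
    location /v1/stream {
        proxy_pass http://scaiwave_api;
        proxy_http_version 1.1;                  # needed for the WebSocket upgrade
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 90s;                  # well above the 30s ping interval
        proxy_send_timeout 90s;
    }
}
```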

## Symptom: Drops without close frame

The connection just goes silent: no close frame, no error. This
usually points to a misconfigured intermediary. To test:

```bash
websocat -v "wss://your-host/v1/stream?token=$TOKEN"
```

Leave it open for at least 60 seconds. If nothing arrives in that
time, something in the path is silently dropping the connection.

## Symptom: Close frame with code 4001

`SW_AUTH_INVALID_TOKEN`: the token is expired or invalid. The web
client should refresh the token and reconnect; if it doesn't,
check the auth refresh path.

## Symptom: Close frame with code 4003

Server-side abort. Look for `ws.disconnect` in the server logs:

- `reason = "ping_timeout"` → the client failed to answer too many
  pings, so the server stopped receiving pongs.
- `reason = "duplicate_connection"` → another client signed in
  with the same token; the older connection is closed.
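
For a custom client, the close codes covered above could be turned into a small dispatch function. This is only a sketch: the function name and the returned action strings are hypothetical, and the source does not say how a client should react to 4003, so the sketch surfaces it for investigation rather than retrying blindly.

```python
from typing import Optional

# Close codes named in this guide; any other value is treated as unknown.
AUTH_INVALID_TOKEN = 4001  # SW_AUTH_INVALID_TOKEN
SERVER_ABORT = 4003


def next_step(close_code: Optional[int]) -> str:
    """Map a WebSocket close code to a suggested client-side next step.

    `None` means the socket died without any close frame.
    The action names are illustrative, not part of any ScaiWave API.
    """
    if close_code is None:
        return "reconnect_with_backoff"        # silent drop, suspect an intermediary
    if close_code == AUTH_INVALID_TOKEN:
        return "refresh_token_then_reconnect"
    if close_code == SERVER_ABORT:
        return "check_ws_disconnect_logs"      # ping_timeout or duplicate_connection
    return "reconnect_with_backoff"            # unknown code: generic retry


for code in (None, 4001, 4003, 1006):
    print(code, "->", next_step(code))
```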

## Symptom: Connects fine but no events arrive

You see the hello frame, but then nothing — even when you know
messages are happening.

- **Wrong tenant scope**: are you signed in as a different tenant
  than the one with traffic? Check the hello frame's `tenant_id`.
- **Rate limit**: the events you send are failing rate-limit
  checks, so they never enter the stream.
- **WS-side bug**: rare, but check the server logs for
  `ws.fanout_failed`. If there are many entries, restart the API
  pod.

## Reconnect strategy

The web client uses:

1. **First disconnect** → reconnect immediately.
2. **Successive failures** → exponential backoff: 1s, 2s, 4s, 8s,
   and so on, doubling up to a 30s cap.
3. **On reconnect** → `/v1/sync?since=<last_stream_position>` to
   bridge the gap, then resume the WS.

If you're writing your own client, copy that pattern. Don't try
to keep the WS open across long network outages; fall back to
`/v1/sync` instead.
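
The delay schedule above can be sketched as a generator. This is a minimal illustration, assuming the doubling continues from 8s to 16s before hitting the 30s cap; the function name and parameters are hypothetical, not taken from the web client.

```python
from itertools import islice
from typing import Iterator


def reconnect_delays(base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield the wait in seconds before each successive reconnect attempt."""
    yield 0.0                      # first disconnect: reconnect immediately
    delay = base
    while True:
        yield min(delay, cap)      # never wait longer than the cap
        delay = min(delay * 2, cap)


# First eight waits: 0, 1, 2, 4, 8, 16, 30, 30 seconds.
print(list(islice(reconnect_delays(), 8)))
```

A client would pull the next delay after each failed attempt and create a fresh generator once a connection succeeds, so the next outage starts again with an immediate retry.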

## Stream-position tracking

Persist `last_stream_position` (the highest `stream_position`
you've processed) somewhere durable on the client. On reconnect,
query `/v1/sync?since=<last_stream_position>` first. Without this,
you miss events that happened during the disconnect.
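
One way to keep the position durable is a small file-backed store with atomic writes. The sketch below makes several assumptions: stream positions are integers, `0` is a reasonable "nothing seen yet" value, and a local JSON file is an acceptable store. The class and helper names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from urllib.parse import urlencode


class StreamPositionStore:
    """Persist the highest processed stream position in a JSON file."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def load(self) -> int:
        """Return the stored position, or 0 if nothing usable is stored."""
        try:
            return int(json.loads(self.path.read_text())["last_stream_position"])
        except (FileNotFoundError, ValueError, KeyError, TypeError):
            return 0

    def advance(self, position: int) -> int:
        """Record `position` if it is newer; never move backwards."""
        current = self.load()
        if position <= current:
            return current
        # Write to a temp file, then rename, so a crash cannot leave a torn file.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent)
        with os.fdopen(fd, "w") as f:
            json.dump({"last_stream_position": position}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
        return position


def sync_url(base_url: str, since: int) -> str:
    """Build the catch-up URL to request before resuming the WS."""
    return f"{base_url}/v1/sync?{urlencode({'since': since})}"
```

On reconnect, a client would call `sync_url(base, store.load())`, process the returned events while calling `store.advance(...)` for each, and only then reopen the WS.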

## Where to look (admin)

- Logs: `ws.connect`, `ws.disconnect`, `ws.fanout_failed`.
- Metric: `scaiwave_ws_connections{tenant}` — should be stable
  during normal operation.
- Metric: `scaiwave_ws_messages_dropped_total`: any non-zero value
  is a problem and worth investigating.
