Search

ScaiWave indexes messages, notes, and the extracted text of uploaded files into a Weaviate vector store. Search is hybrid: BM25 keyword ranking and vector similarity are fused at query time.

What's indexed

Messages (swp.room.message) from every room you can see. Incognito rooms are excluded; their messages carry do_not_index=true.
Notes — body, title, tags, category.
Uploaded file content — PDFs, Word, PowerPoint, Excel, images (OCR), and audio (transcription) are extracted and indexed.

Where to search

Global search: top of the sidebar, or Cmd/Ctrl+K. Searches messages and notes across every workspace you're in.
In a room: the chat header's search icon. Searches just that room's messages.
Notes: the notes panel has its own search scoped to the current workspace by default.
API: POST /v1/search (curated), POST /v1/notes/query.
AI tool: search and search_all plugins. The AI uses these when you ask "find that thing X said about Y".

How hybrid works

For each query:

BM25: documents are scored by keyword match (the classic IR ranking function).
Vector: the query is embedded via ScaiGrid; documents are ranked by cosine similarity against their pre-computed embeddings.
Fusion: scores are blended (default α=0.5; tunable per collection).

The result is a list of hits with a unified score, sorted descending.
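The blending step can be illustrated with a minimal sketch. It assumes min-max normalization of each score set and that α weights the vector side (α=1 is pure vector, α=0 is pure keyword, the convention Weaviate's hybrid search uses); which fusion method ScaiWave actually configures is not stated here, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so BM25 and cosine values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend two score sets (assumed convention: alpha weights the vector side)."""
    bm25_n = min_max_normalize(bm25)
    vector_n = min_max_normalize(vector)
    fused = {}
    for doc_id in set(bm25_n) | set(vector_n):
        # A document missing from one side contributes 0 from that side.
        fused[doc_id] = (
            alpha * vector_n.get(doc_id, 0.0)
            + (1 - alpha) * bm25_n.get(doc_id, 0.0)
        )
    # Unified score, sorted descending.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

With α=0.5 a document that ranks well on both sides beats one that tops only a single ranking; moving α toward 0 or 1 shifts the result toward pure keyword or pure vector ordering.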

When embedding the query fails (rare; usually a backend hiccup), search degrades gracefully to BM25-only — keyword search keeps working.
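The fallback behaviour can be pictured roughly as follows. This is a sketch of the pattern only, not ScaiWave's code: `bm25_search`, `embed`, and `hybrid` are hypothetical callables standing in for the keyword backend, the ScaiGrid embedding call, and the fused query.

```python
import logging
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[str, float]  # (document id, score)

logger = logging.getLogger("search")


def search_with_fallback(
    query: str,
    bm25_search: Callable[[str], List[Hit]],
    embed: Callable[[str], Sequence[float]],
    hybrid: Callable[[str, Sequence[float]], List[Hit]],
) -> List[Hit]:
    """Run hybrid search, degrading to keyword-only if embedding fails."""
    try:
        query_vector = embed(query)
    except Exception as exc:  # assumption: any embedding backend error
        logger.warning("embedding failed, using BM25 only: %s", exc)
        return bm25_search(query)
    return hybrid(query, query_vector)
```

Catching the failure at the embedding step keeps the keyword path independent of the embedding backend, so a backend outage reduces ranking quality rather than breaking search.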

What you can filter on

Filter	Where it works
`tenant_id`	Always applied automatically.
`room_id`	Restricts to one room.
`workspace_ids`	Restricts to a set of workspaces.
`owner_id`	Notes only: restricts to your own notes.
`path_prefix`	Docs only: restricts to documents under the given path prefix.
`file_types`	Files only: restricts to the listed file types, e.g. `["pdf", "docx", …]`.
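As an illustration of how a client might assemble these filters, here is a small sketch. The body shape (`query` and `filters` keys) is an assumption for the example and is not taken from the `/v1/search` schema; only the filter names come from the table above.

```python
from typing import Any, Dict

# Filters a caller may set, taken from the table above.
# tenant_id is left out on purpose: it is applied automatically.
ALLOWED_FILTERS = {
    "room_id",
    "workspace_ids",
    "owner_id",
    "path_prefix",
    "file_types",
}


def build_search_body(query: str, **filters: Any) -> Dict[str, Any]:
    """Build a JSON-serializable body; the key names are assumed, not documented."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    # Drop filters left as None so they do not restrict the search.
    active = {name: value for name, value in filters.items() if value is not None}
    return {"query": query, "filters": active}
```

For example, `build_search_body("quarterly report", room_id="r1", file_types=["pdf"])` yields a body restricted to one room and to PDF content, while passing `tenant_id` raises a `ValueError`.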

Indexing lag

Indexing is asynchronous (via the ARQ worker). A message is searchable within ~2 seconds of being sent. For test loops, wait two seconds after sending before querying.
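Because the ~2 second figure is approximate, a test loop may be more robust if it polls instead of sleeping for a fixed time. A minimal sketch, assuming a hypothetical `run_query` callable that performs the search and returns a list of hits:

```python
import time
from typing import Callable, List, Optional


def wait_until_searchable(
    run_query: Callable[[], List[dict]],
    timeout: float = 5.0,
    interval: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[List[dict]]:
    """Poll run_query until it returns hits or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        hits = run_query()
        if hits:
            return hits
        if time.monotonic() >= deadline:
            # Give up: the message never became searchable in time.
            return None
        sleep(interval)
```

The `timeout` and `interval` defaults are illustrative; the injectable `sleep` lets unit tests run the loop without real delays.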

Anchors and deep links

Search results include a room_id (or note_id) and an event / chunk identifier. The client uses these to deep-link straight to the matching message or section.

Index admin

If the index gets out of sync, admins can re-index a collection via the CLI:

```bash
python scripts/backfill_search_index.py --execute --inline
```

See the CLI reference for the full set of commands.

Where to go next

API: Search (the /v1/search endpoint lives under the AI router).
Reference: CLI — for index admin.