Search
ScaiWave indexes every message, every note, and every uploaded file's extracted text into a Weaviate vector store. Search is hybrid: BM25 keyword ranking combined with vector similarity, with the two fused at query time.
What's indexed
- Messages (`swp.room.message`) from every room you can see. Incognito rooms are excluded; their messages carry `do_not_index=true`.
- Notes: body, title, tags, category.
- Uploaded file content: PDFs, Word, PowerPoint, Excel, images (OCR), and audio (transcription) are extracted and indexed.
Where to search
- Global search: top of the sidebar, or `Cmd/Ctrl+K`. Searches messages and notes across every workspace you're in.
- In a room: the chat header's search icon. Searches just that room's messages.
- Notes: the notes panel has its own search scoped to the current workspace by default.
- API: `POST /v1/search` (curated), `POST /v1/notes/query`.
- AI tool: `search` and `search_all` plugins. The AI uses these when you ask "find that thing X said about Y".
How hybrid works
For each query:
- BM25 scores documents by keyword match (the classic IR algorithm).
- Vector: the query is embedded via ScaiGrid; documents are ranked by cosine similarity against their pre-computed embeddings.
- Fusion: scores are blended (default α=0.5; tunable per collection).
The result is a list of hits with a unified score, sorted descending.
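The blending step can be pictured with a small sketch. It assumes each score set is min-max normalized to [0, 1] before blending and that α weights the vector side (α=1 is vector-only, α=0 is BM25-only), which matches Weaviate's convention for its `alpha` parameter. The exact fusion ScaiWave configures is not stated here, and the function names and sample scores are illustrative.

```python
def normalize(scores):
    """Min-max normalize a {doc_id: score} map to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(bm25_scores, vector_scores, alpha=0.5):
    """Blend keyword and vector scores; alpha weights the vector side."""
    bm25 = normalize(bm25_scores)
    vec = normalize(vector_scores)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * bm25.get(doc_id, 0.0)
        for doc_id in set(bm25) | set(vec)
    }
    # Highest unified score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores only (assumed values, not real output).
bm25_hits = {"msg-1": 4.2, "msg-2": 1.1, "note-7": 2.0}
vector_hits = {"msg-2": 0.91, "note-7": 0.88, "msg-3": 0.40}
ranked = fuse(bm25_hits, vector_hits, alpha=0.5)
```

In this sample, `note-7` ranks first because it scores reasonably on both sides, while hits that match only one side are pulled down by the missing half of the blend.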
When embedding the query fails (rare; usually a backend hiccup), search degrades gracefully to BM25-only — keyword search keeps working.
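The degradation path might look like the sketch below. The `embed`, `hybrid_search`, and `bm25_search` callables are hypothetical stand-ins passed in by the caller, not ScaiWave or ScaiGrid APIs. The point is only that an embedding failure is caught and the keyword path still returns results.

```python
import logging

logger = logging.getLogger("search")


def search_with_fallback(query, embed, hybrid_search, bm25_search):
    """Run hybrid search, falling back to BM25-only if embedding fails.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    try:
        query_vector = embed(query)
    except Exception as exc:  # e.g. the embedding backend is unreachable
        logger.warning("embedding failed, using BM25 only: %s", exc)
        return {"mode": "bm25", "hits": bm25_search(query)}
    return {"mode": "hybrid", "hits": hybrid_search(query, query_vector)}


# Usage with stub backends that simulate an embedding outage.
def failing_embed(query):
    raise TimeoutError("embedding backend timed out")


result = search_with_fallback(
    "quarterly report",
    embed=failing_embed,
    hybrid_search=lambda q, v: ["hybrid-hit"],
    bm25_search=lambda q: ["keyword-hit"],
)
```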
What you can filter on
| Filter | Where it works |
|---|---|
| `tenant_id` | Always applied automatically. |
| `room_id` | Restricts to one room. |
| `workspace_ids` | Restricts to a set of workspaces. |
| `owner_id` | Notes only: your notes. |
| `path_prefix` | Docs only: under a tree. |
| `file_types` | Files only: `["pdf", "docx", …]`. |
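As an illustration, a request body for `POST /v1/search` could be assembled like this. The JSON field names (`query`, `limit`, `filters`) are assumptions made for the sketch; only the filter keys come from the table above, and `tenant_id` is left out because it is applied automatically. Check the API reference for the real schema.

```python
import json

# Filter keys taken from the table; tenant_id is applied server-side.
ALLOWED_FILTERS = {"room_id", "workspace_ids", "owner_id", "path_prefix", "file_types"}


def build_search_body(query, limit=20, **filters):
    """Build a JSON body for POST /v1/search (field names are assumed)."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    # Drop filters left as None so only explicit restrictions are sent.
    active = {key: value for key, value in filters.items() if value is not None}
    return {"query": query, "limit": limit, "filters": active}


body = build_search_body(
    "onboarding checklist",
    workspace_ids=["ws-1", "ws-2"],
    file_types=["pdf", "docx"],
)
payload = json.dumps(body)  # string ready to send as the request body
```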
Indexing lag
Indexing is asynchronous (via the ARQ worker). A message is searchable within ~2 seconds of being sent. For test loops, wait two seconds after sending before querying.
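In a test loop, polling with a deadline is often sturdier than a fixed sleep. The helper below is a sketch: `search_fn` is a hypothetical callable that returns a list of hits, and the timeout and interval values are arbitrary choices built around the ~2 second figure above.

```python
import time


def wait_until_searchable(search_fn, query, timeout=5.0, interval=0.25):
    """Poll search_fn(query) until it returns hits or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        hits = search_fn(query)
        if hits:
            return hits
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{query!r} not searchable after {timeout}s")
        time.sleep(interval)


# Stub that starts returning a hit on the third call, simulating indexing lag.
calls = {"count": 0}


def fake_search(query):
    calls["count"] += 1
    return ["msg-42"] if calls["count"] >= 3 else []


found = wait_until_searchable(fake_search, "hello", timeout=2.0, interval=0.01)
```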
Anchors and deep links
Search results include a `room_id` (or `note_id`) and an event / chunk identifier. The client uses these to deep-link straight to the matching message or section.
Index admin
Admins can re-index a collection via the CLI if something gets out of sync.
See CLI reference for the full surface.