Search

ScaiWave indexes every message, every note, and every uploaded file's extracted text into a Weaviate vector store. Search is hybrid: BM25 keyword ranking combined with vector similarity, with the two fused at query time.

What's indexed#

  • Messages (swp.room.message) from every room you can see. Incognito rooms are excluded; their messages carry do_not_index=true.
  • Notes — body, title, tags, category.
  • Uploaded file content — PDFs, Word, PowerPoint, Excel, images (OCR), and audio (transcription) are extracted and indexed.

Search is available from several entry points:

  • Global search: top of the sidebar, or Cmd/Ctrl+K. Searches messages and notes across every workspace you're in.
  • In a room: the chat header's search icon. Searches just that room's messages.
  • Notes: the notes panel has its own search scoped to the current workspace by default.
  • API: POST /v1/search (curated), POST /v1/notes/query.
  • AI tool: search and search_all plugins. The AI uses these when you ask "find that thing X said about Y".

How hybrid works#

For each query:

  1. Keyword: BM25 scores documents by keyword match (the classic information-retrieval ranking function).
  2. Vector: the query is embedded via ScaiGrid; documents are ranked by cosine similarity against their pre-computed embeddings.
  3. Fusion: scores are blended (default α=0.5; tunable per collection).

The result is a list of hits with a unified score, sorted descending.

When embedding the query fails (rare; usually a backend hiccup), search degrades gracefully to BM25-only — keyword search keeps working.
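
The three steps and the BM25-only fallback can be sketched in a few lines of Python. This is an illustration only, not ScaiWave's or Weaviate's actual fusion code: the min-max normalisation, the convention that α weights the vector side (α=0 is pure BM25, α=1 is pure vector), and the names `normalise` and `fuse_scores` are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def normalise(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores into [0, 1] so the two rankers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as an equally good match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    bm25: Dict[str, float],
    vector: Optional[Dict[str, float]],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend keyword and vector scores; return (doc_id, score) sorted descending.

    Pass vector=None when the query embedding failed; the result is then
    ranked by BM25 alone.
    """
    if vector is None:
        fused = normalise(bm25)
    else:
        kw, vec = normalise(bm25), normalise(vector)
        # A document missing from one ranker contributes 0 on that side.
        fused = {
            doc_id: (1 - alpha) * kw.get(doc_id, 0.0) + alpha * vec.get(doc_id, 0.0)
            for doc_id in set(kw) | set(vec)
        }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

Calling `fuse_scores(bm25_scores, None)` models the degraded path: the caller still gets a ranked list, just without the semantic signal.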

What you can filter on#

| Filter | Where it works |
| --- | --- |
| tenant_id | Always applied automatically. |
| room_id | Restricts to one room. |
| workspace_ids | Restricts to a set of workspaces. |
| owner_id | Notes only. Restricts to your own notes. |
| path_prefix | Docs only. Restricts to documents under a path tree. |
| file_types | Files only. A list of types such as ["pdf", "docx", …]. |
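
As an illustration of how these filters might be passed to POST /v1/search, the sketch below builds the request with Python's standard library. The filter names come from the table above; the `query` and `filters` body fields, the bearer-token header, and the function name `build_search_request` are assumptions, so check the API reference for the actual request schema. tenant_id is left out because it is applied automatically.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_search_request(
    base_url: str,
    token: str,
    query: str,
    room_id: Optional[str] = None,
    workspace_ids: Optional[List[str]] = None,
    file_types: Optional[List[str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/search request."""
    # Only include the filters the caller actually set.
    filters: Dict[str, Any] = {}
    if room_id is not None:
        filters["room_id"] = room_id
    if workspace_ids:
        filters["workspace_ids"] = workspace_ids
    if file_types:
        filters["file_types"] = file_types

    body = {"query": query, "filters": filters}  # hypothetical body shape
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )
```

The returned object can be sent with `urllib.request.urlopen(req)`; keeping construction separate from sending makes the payload easy to inspect in tests.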

Indexing lag#

Indexing is asynchronous (via the ARQ worker). A message is searchable within ~2 seconds of being sent. For test loops, wait two seconds after sending before querying.
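
In a test loop, a fixed sleep can be replaced by a short poll that returns as soon as the message shows up. A minimal sketch: `search` is any callable you supply that runs a query and returns a list of hits (a hypothetical stand-in for your API client), and the timeout and interval defaults are arbitrary choices, not documented values.

```python
import time
from typing import Any, Callable, List


def wait_until_searchable(
    search: Callable[[], List[Any]],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> List[Any]:
    """Call `search` repeatedly until it returns hits or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        hits = search()
        if hits:
            return hits
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no hits after {timeout} seconds")
        time.sleep(interval)
```

Raising on timeout rather than returning an empty list keeps a slow index from turning into a silently passing test.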

Search results include a room_id (or note_id) and an event or chunk identifier. The client uses these to deep-link straight to the matching message or section.

Index admin#

If a collection gets out of sync, admins can re-index it via the CLI:

```bash
python scripts/backfill_search_index.py --execute --inline
```

See the CLI reference for the full set of commands.

Where to go next#

  • API: Search (the /v1/search endpoint lives under the AI router).
  • Reference: CLI — for index admin.
Updated 2026-05-17 13:10:02 (rev 3)