---
audience: everyone
summary: Hybrid BM25 + vector retrieval over messages, notes, and file content.
title: Search
path: concepts/search
status: published
---

# Search

ScaiWave indexes messages, notes, and the extracted text of uploaded
files into a Weaviate vector store (incognito rooms are the exception;
see below). Search is **hybrid**: BM25 keyword ranking combined with
vector similarity, with the two fused at query time.

## What's indexed

- **Messages** (`swp.room.message`) from every room you can see.
  Incognito rooms are excluded; their messages carry `do_not_index=true`.
- **Notes** — body, title, tags, category.
- **Uploaded file content** — PDFs, Word, PowerPoint, Excel, images
  (OCR), and audio (transcription) are extracted and indexed.

## Where to search

- **Global search**: top of the sidebar, or `Cmd/Ctrl+K`. Searches
  messages and notes across every workspace you're in.
- **In a room**: the chat header's search icon. Searches just that
  room's messages.
- **Notes**: the notes panel has its own search scoped to the current
  workspace by default.
- **API**: `POST /v1/search` (curated), `POST /v1/notes/query`.
- **AI tool**: `search` and `search_all` plugins. The AI uses these
  when you ask "find that thing X said about Y".

## How hybrid works

For each query:

1. **BM25** scores documents by keyword match (the classic
   IR algorithm).
2. **Vector**: the query is embedded via ScaiGrid; documents are
   ranked by cosine similarity against their pre-computed
   embeddings.
3. **Fusion**: the keyword and vector scores are blended using a
   weight α (default α=0.5, which weights both sides equally;
   tunable per collection).

The result is a list of hits with a unified `score`, sorted descending.
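
The three steps above can be sketched as a weighted blend of normalized scores. This is a minimal illustration, not ScaiWave's or Weaviate's actual implementation: the min-max normalization, the convention that `alpha` weights the vector side, and the function names `min_max` and `fuse` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend keyword and vector scores (assumption: alpha weights the vector side)."""
    bm25_n, vec_n = min_max(bm25), min_max(vector)
    doc_ids = set(bm25_n) | set(vec_n)
    fused = {
        d: (1 - alpha) * bm25_n.get(d, 0.0) + alpha * vec_n.get(d, 0.0)
        for d in doc_ids
    }
    # Unified score, sorted descending.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative scores only; the document IDs are made up.
hits = fuse(
    bm25={"m1": 7.2, "m2": 3.1, "n1": 0.4},
    vector={"m2": 0.91, "n1": 0.88, "m3": 0.35},
)
```

In this example `m2` ranks first because it scores well on both sides, while `m1` (keyword-only) and `n1` (mostly vector) follow. A document missing from one ranking simply contributes zero from that side.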

When embedding the query fails (rare; usually a backend hiccup),
search degrades gracefully to BM25-only — keyword search keeps
working.
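
The fallback can be pictured roughly as follows. This is a sketch of the pattern only: `EmbeddingError`, `plan_query`, and the returned field names are hypothetical and do not reflect ScaiWave's internal code.

```python
from typing import Callable, List, Optional, Sequence


class EmbeddingError(Exception):
    """Hypothetical error raised when the embedding backend is unavailable."""


def plan_query(text: str, embed: Callable[[str], Sequence[float]]) -> dict:
    """Return query parameters, dropping the vector side if embedding fails."""
    try:
        vector: Optional[List[float]] = list(embed(text))
    except EmbeddingError:
        vector = None  # degrade: keyword search still works
    return {
        "query": text,
        "vector": vector,
        "mode": "hybrid" if vector is not None else "bm25",
    }


def broken_embed(_: str) -> Sequence[float]:
    """Simulates an embedding backend outage."""
    raise EmbeddingError("backend unavailable")


ok = plan_query("quarterly report", lambda _: [0.1, 0.2])
degraded = plan_query("quarterly report", broken_embed)
```

Only the embedding failure is caught here; other exceptions propagate, so a genuine bug is not silently masked as a keyword-only search.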

## What you can filter on

| Filter | Where it works |
|---|---|
| `tenant_id` | Always applied automatically. |
| `room_id` | Restricts to one room. |
| `workspace_ids` | Restricts to a set of workspaces. |
| `owner_id` | Notes only — your notes. |
| `path_prefix` | Docs only: restricts to documents under the given path prefix. |
| `file_types` | Files only — `["pdf", "docx", …]`. |
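
As an illustration of combining filters, the sketch below assembles a request body for `POST /v1/search`. The request schema is not documented on this page, so the `query` field name and the flat placement of the filter keys are assumptions; check the API reference for the real shape. `tenant_id` is left out because it is applied automatically.

```python
import json
from typing import List, Optional


def build_search_body(
    query: str,
    room_id: Optional[str] = None,
    workspace_ids: Optional[List[str]] = None,
    file_types: Optional[List[str]] = None,
) -> dict:
    """Assemble a request body, omitting filters that are not set.

    Assumption: filters sit next to a `query` field at the top level.
    """
    filters = {
        "room_id": room_id,
        "workspace_ids": workspace_ids,
        "file_types": file_types,
    }
    body: dict = {"query": query}
    body.update({k: v for k, v in filters.items() if v is not None})
    return body


# Hypothetical workspace ID, for illustration only.
body = build_search_body(
    "budget forecast",
    workspace_ids=["ws_1"],
    file_types=["pdf", "docx"],
)
print(json.dumps(body, indent=2))
```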

## Indexing lag

Indexing is asynchronous (via the ARQ worker). A message is
searchable within ~2 seconds of being sent. In test loops, wait at
least two seconds after sending before you query.
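
For test code, polling with a timeout is usually more robust than a fixed sleep. The helper below is a sketch under stated assumptions: `wait_for_hit` is a hypothetical name, and `search` stands for whatever callable your test uses to run a query and return a list of hits.

```python
import time
from typing import Callable, List


def wait_for_hit(
    search: Callable[[str], List[dict]],
    query: str,
    timeout: float = 5.0,
    interval: float = 0.25,
) -> List[dict]:
    """Poll `search` until it returns hits or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        hits = search(query)
        if hits:
            return hits
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no hits for {query!r} within {timeout}s")
        time.sleep(interval)


# Stand-in for a real client call: returns nothing until the third attempt.
calls = {"n": 0}


def fake_search(query: str) -> List[dict]:
    calls["n"] += 1
    return [{"room_id": "r1", "score": 0.8}] if calls["n"] >= 3 else []


hits = wait_for_hit(fake_search, "hello", timeout=2.0, interval=0.01)
```

The default timeout is deliberately longer than the ~2 second indexing lag so that a slow worker does not cause a flaky test.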

## Anchors and deep links

Search results include a `room_id` (or `note_id`) and an event /
chunk identifier. The client uses these to deep-link straight to the
matching message or section.
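
A client might resolve a hit to a link target along these lines. The field names `event_id` and `chunk_id` are hypothetical stand-ins for the event / chunk identifier mentioned above, and `link_target` is not part of any ScaiWave client.

```python
from typing import Tuple


def link_target(hit: dict) -> Tuple[str, str, str]:
    """Return (kind, container_id, anchor_id) for a search hit.

    Assumption: message hits carry `event_id`, note hits carry `chunk_id`.
    """
    if "room_id" in hit:
        return ("message", hit["room_id"], hit["event_id"])
    if "note_id" in hit:
        return ("note", hit["note_id"], hit["chunk_id"])
    raise ValueError("hit has neither room_id nor note_id")


message_target = link_target({"room_id": "r1", "event_id": "e42", "score": 0.9})
note_target = link_target({"note_id": "n7", "chunk_id": "c3", "score": 0.6})
```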

## Index admin

If a collection's index gets out of sync, admins can re-index it via
the CLI:

```bash
python scripts/backfill_search_index.py --execute --inline
```

See [CLI reference](/docs/scaiwave/reference/cli) for the full
surface.

## Where to go next

- API: [Search](/docs/scaiwave/reference/api/ai) (the `/v1/search`
  endpoint lives under the AI router).
- Reference: [CLI](/docs/scaiwave/reference/cli) — for index admin.
