---
summary: Hybrid BM25 + semantic search backed by Weaviate, with anchored snippets
  and per-namespace filters for docs.
title: Search
path: concepts/search
status: published
---

# Search

Both regular content and documentation are indexed into [Weaviate](https://weaviate.io)
and searchable through a hybrid (BM25 + vector) query.

## What gets indexed

Two collections:

- **`Content`** — one row per content item × locale. Title, summary, body,
  searchable field values, taxonomy term IDs, status, visibility.
- **`DocPage`** — one row per *chunk*, where docs are split on H1/H2/H3
  headings. Each chunk carries page metadata (namespace, version, path),
  the heading text, an anchor slug for deep linking, and the chunk body.
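
As a rough illustration of heading-based chunking, the sketch below splits a Markdown page on H1/H2/H3 headings and derives an anchor slug for each chunk. The names `Chunk`, `slugify`, and `split_on_headings` are hypothetical, and the slug rules are an assumption; the real indexer may handle edge cases (duplicate headings, non-ASCII text) differently.

```python
import re
from dataclasses import dataclass

HEADING_RE = re.compile(r"^(#{1,3})\s+(.+?)\s*$")  # H1-H3 only; H4+ stays in the body
FENCE = "`" * 3  # code fence marker; headings inside fences are ignored


@dataclass
class Chunk:
    heading: str
    anchor: str
    body: str


def slugify(heading: str) -> str:
    """Lowercase, drop punctuation, join words with hyphens (assumed slug rules)."""
    slug = re.sub(r"[^\w\s-]", "", heading.lower())
    return re.sub(r"[\s_]+", "-", slug).strip("-")


def split_on_headings(markdown: str) -> list[Chunk]:
    """Start a new chunk at every H1/H2/H3 heading outside code fences."""
    chunks: list[Chunk] = []
    heading = ""
    lines: list[str] = []
    in_fence = False

    def flush() -> None:
        body = "\n".join(lines).strip()
        if heading or body:
            chunks.append(Chunk(heading, slugify(heading), body))

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()  # close the previous chunk before starting a new one
            heading, lines = match.group(2), []
        else:
            lines.append(line)
    flush()
    return chunks
```

With this sketch, a page containing `## Tensor parallelism` yields a chunk whose anchor is `tensor-parallelism`.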

## Indexing flow

```
Content/Doc write → event bus → ARQ task → Weaviate upsert
```

The pipeline is asynchronous, so expect a lag of roughly 1–3 seconds between
a write and the item becoming searchable. Two CLIs let you reindex, check
consistency, and reset the index: `python -m scaicms.cli.index_management`
(content) and `python -m scaicms.cli docs-index` (docs).
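
Because of the indexing lag, code that writes and then immediately searches (integration tests, for example) may need to poll. The helper below is a minimal sketch of that pattern; `wait_until_searchable` and its parameters are hypothetical, and the `is_searchable` callback stands in for whatever search call you use.

```python
import time
from typing import Callable


def wait_until_searchable(
    is_searchable: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> bool:
    """Poll until is_searchable() returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if is_searchable():
            return True
        if time.monotonic() >= deadline:
            return False  # give up; the caller decides how to report it
        time.sleep(interval)
```

A timeout well above the typical lag leaves headroom for a busy task queue.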

## Hybrid alpha

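`SEARCH_HYBRID_ALPHA` defaults to `0.7`. In Weaviate's hybrid search,
`alpha=0` is pure keyword (BM25) and `alpha=1` is pure vector search, so the
default leans semantic while still weighting keyword matches. Tune it per
environment.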
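
To build intuition for the setting, the sketch below blends two score sets with a weight `alpha` on the vector side and `1 - alpha` on the BM25 side. It is a simplified illustration using min-max normalized scores, not Weaviate's exact fusion algorithm; `normalize` and `hybrid_rank` are hypothetical names.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores into [0, 1] so the two result sets are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    bm25: dict[str, float],
    vector: dict[str, float],
    alpha: float = 0.7,
) -> list[tuple[str, float]]:
    """Blend scores: alpha weights the vector side, (1 - alpha) the BM25 side."""
    kw, vec = normalize(bm25), normalize(vector)
    blended = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in kw.keys() | vec.keys()
    }
    # Highest blended score first.
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)
```

In this sketch, a document that wins on keywords but loses on similarity ranks first at `alpha=0.0` and drops below a semantically closer document at `alpha=0.7`.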

## Embeddings

When `embedding_provider` is configured, the backend calls the provider for a
vector embedding of each chunk; otherwise search falls back to pure BM25.
Embedding calls cost money, so disable them in dev to avoid unnecessary spend.

## Searching docs

```bash
curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"query":"how do I shard models","namespace":"scaigrid","limit":10}' \
  "$API/api/v1/docs/_search"
```

Hits include `anchor`, so you can deep-link to the matching chunk:
`/docs/scaigrid/models/training#tensor-parallelism`.
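
For callers working in Python, the same request can be sketched with the standard library. The endpoint, headers, and payload mirror the `curl` example above. The `build_search_request` and `deep_link` helpers are hypothetical, and `deep_link` assumes each hit exposes `namespace` and `path` fields alongside `anchor`, which this page does not confirm.

```python
import json
import urllib.request


def build_search_request(
    api: str, token: str, query: str, namespace: str, limit: int = 10
) -> urllib.request.Request:
    """Build the POST request for the docs search endpoint without sending it."""
    payload = {"query": query, "namespace": namespace, "limit": limit}
    return urllib.request.Request(
        f"{api}/api/v1/docs/_search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def deep_link(hit: dict) -> str:
    """Assumes hits expose `namespace` and `path` next to `anchor`."""
    return f"/docs/{hit['namespace']}/{hit['path']}#{hit['anchor']}"
```

Pass the request to `urllib.request.urlopen` to send it, then build a link from each returned hit.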
