---
title: Search
path: api-guides/search
status: published
---

ScaiDrive offers three search modes: **keyword** (BM25 over filenames and content), **semantic** (vector similarity over embeddings), and **hybrid** (a blend of the two). A separate **RAG context** endpoint packages results for an LLM.

**Base path:** `/api/v1/search/`

All search results respect the caller's permissions — you only see files you can read.

## Keyword search

Full-text search over filenames (always) and extracted content (where available).

```bash
curl -G $SCAIDRIVE_URL/api/v1/search \
  -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
  --data-urlencode "q=annual report 2025" \
  --data-urlencode "share_id=shr_01J3H" \
  --data-urlencode "file_type=application/pdf" \
  --data-urlencode "limit=20"
```

```python
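import os
import httpx

# Same environment variables as the curl example above.
url, token = os.environ["SCAIDRIVE_URL"], os.environ["SCAIDRIVE_TOKEN"]
resp = httpx.get(
    f"{url}/api/v1/search",
    headers={"Authorization": f"Bearer {token}"},
    params={"q": "annual report 2025", "share_id": "shr_01J3H", "limit": 20},
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body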
for hit in resp.json()["results"]:
    print(hit["score"], hit["path"])
```

```typescript
const params = new URLSearchParams({
  q: "annual report 2025",
  share_id: "shr_01J3H",
  limit: "20",
});
const resp = await fetch(`${url}/api/v1/search?${params}`, {
  headers: { Authorization: `Bearer ${token}` },
});
for (const hit of (await resp.json()).results) {
  console.log(hit.score, hit.path);
}
```

Response:

```json
{
  "query": "annual report 2025",
  "results": [
    {
      "id": "fil_01J3K",
      "name": "annual-report-2025.pdf",
      "type": "file",
      "share_id": "shr_01J3H",
      "folder_id": "fld_01J3I",
      "path": "/Finance/annual-report-2025.pdf",
      "mime_type": "application/pdf",
      "size": 5820193,
      "modified_at": "2026-04-20T10:00:00Z",
      "score": 8.42
    }
  ],
  "total": 1,
  "has_more": false
}
```

Parameters:

| Param | Notes |
|-------|-------|
| `q` | Search query |
| `share_id` | Scope to one share |
| `file_type` | MIME pattern: `application/pdf`, `image/*` |
| `recursive` | Default true |
| `limit` | 1–100, default 50 |
| `offset` | For pagination |
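
To collect more than one page, `limit`, `offset`, and `has_more` can be combined in a loop. The sketch below is one way to do it, assuming `offset` counts results to skip and `has_more` is true whenever further pages exist (the response above suggests this, but check the Search Reference). `iter_hits` and `keyword_fetcher` are hypothetical helper names, not part of any ScaiDrive SDK.

```python
from typing import Callable, Iterator


def iter_hits(fetch_page: Callable[[int, int], dict], page_size: int = 50) -> Iterator[dict]:
    """Yield every hit across pages.

    fetch_page(offset, limit) must return one parsed response body.
    Assumes `offset` counts results to skip and `has_more` flags further pages.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results = page.get("results", [])
        yield from results
        # Stop when the server reports no more pages; an empty page is a
        # safety net against looping forever on an unexpected response.
        if not page.get("has_more") or not results:
            return
        offset += len(results)


def keyword_fetcher(url: str, token: str, query: str) -> Callable[[int, int], dict]:
    """Build a fetch_page callable backed by GET /api/v1/search."""
    import httpx  # imported here so iter_hits itself has no dependencies

    def fetch_page(offset: int, limit: int) -> dict:
        resp = httpx.get(
            f"{url}/api/v1/search",
            headers={"Authorization": f"Bearer {token}"},
            params={"q": query, "limit": limit, "offset": offset},
        )
        resp.raise_for_status()
        return resp.json()

    return fetch_page
```

Usage would look like `for hit in iter_hits(keyword_fetcher(url, token, "annual report 2025")): print(hit["path"])`. Keeping the HTTP call behind a callable makes the paging logic easy to test without a server.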

A POST form is also available for longer queries. Note that the JSON body uses `query` rather than `q`:

```bash
curl -X POST $SCAIDRIVE_URL/api/v1/search \
  -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "annual report 2025", "share_id": "shr_01J3H", "limit": 20}'
```

## Semantic search

Vector search over indexed chunks. Requires the tenant to have a [vectorization provider](#configuring-vectorization) configured.

```bash
curl -G $SCAIDRIVE_URL/api/v1/search/semantic \
  -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
  --data-urlencode "q=what were our Q4 bookings last year?" \
  --data-urlencode "share_id=shr_01J3H" \
  --data-urlencode "limit=10"
```

```python
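import os
import httpx

# Same environment variables as the curl example above.
url, token = os.environ["SCAIDRIVE_URL"], os.environ["SCAIDRIVE_TOKEN"]
resp = httpx.get(
    f"{url}/api/v1/search/semantic",
    headers={"Authorization": f"Bearer {token}"},
    params={"q": "what were our Q4 bookings last year?", "share_id": "shr_01J3H", "limit": 10},
)
resp.raise_for_status()  # a 503 here means the vectorization stack is down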
for hit in resp.json()["semantic_results"]:
    print(hit["score"], hit["file_name"], hit["chunk_content"][:100])
```

```typescript
const params = new URLSearchParams({
  q: "what were our Q4 bookings last year?",
  share_id: "shr_01J3H",
  limit: "10",
});
const resp = await fetch(`${url}/api/v1/search/semantic?${params}`, {
  headers: { Authorization: `Bearer ${token}` },
});
for (const hit of (await resp.json()).semantic_results) {
  console.log(hit.score, hit.file_name, hit.chunk_content.slice(0, 100));
}
```

Response:

```json
{
  "semantic_results": [
    {
      "file_id": "fil_01J3K",
      "file_name": "annual-report-2025.pdf",
      "share_id": "shr_01J3H",
      "path": "/Finance/annual-report-2025.pdf",
      "chunk_content": "Q4 2024 bookings totaled $42.3M, representing a 28% increase year-over-year...",
      "chunk_index": 17,
      "page": 12,
      "section": "Financial Highlights",
      "score": 0.89,
      "distance": 0.11
    }
  ]
}
```

Results are **chunks**, not files — the same file can produce multiple hits if different sections match. `chunk_content` is the matching passage (typically a paragraph); `score` is semantic similarity (higher is better); `distance` is the raw vector distance (lower is better).
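
Because one file can contribute several chunks, a file-oriented UI usually needs to collapse hits first. The following is a minimal sketch that keeps the best-scoring chunk per `file_id`; `best_hit_per_file` is a hypothetical helper operating on the `semantic_results` array shown above.

```python
def best_hit_per_file(hits: list[dict]) -> list[dict]:
    """Keep the highest-scoring chunk for each file, best file first."""
    best: dict[str, dict] = {}
    for hit in hits:
        current = best.get(hit["file_id"])
        # Higher score means a closer semantic match.
        if current is None or hit["score"] > current["score"]:
            best[hit["file_id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```

Call it as `best_hit_per_file(resp.json()["semantic_results"])`. Keeping the whole chunk rather than only the file ID preserves `page` and `section`, so the UI can still deep-link to the matching passage.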

The POST form accepts richer filters, such as a list of `file_types` and a `path_prefix`:

```bash
curl -X POST $SCAIDRIVE_URL/api/v1/search/semantic \
  -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what were our Q4 bookings?",
    "share_id": "shr_01J3H",
    "file_types": ["application/pdf", "text/markdown"],
    "path_prefix": "/Finance",
    "limit": 10
  }'
```

## Hybrid search

Hybrid search blends BM25 and vector scores. The `alpha` parameter controls the blend: `0.0` is pure BM25, `1.0` is pure vector, and the default is `0.7`.

```bash
curl -X POST $SCAIDRIVE_URL/api/v1/search/hybrid \
  -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Q4 bookings",
    "share_id": "shr_01J3H",
    "alpha": 0.7,
    "limit": 20
  }'
```

Use hybrid when your users' queries mix specific terms ("Q4") with conceptual phrasing ("bookings performance"). BM25 alone can miss the concept; pure vector search can miss the exact term.
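
To build intuition for `alpha`, here is an illustrative sketch of one common blending scheme: min-max normalize the BM25 scores, then take a convex combination with the vector scores. This page does not specify ScaiDrive's actual formula or normalization, so treat this as an assumption made for illustration only; `min_max` and `blend` are hypothetical names and the scores are made up.

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so BM25 and vector scores are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def blend(bm25: float, vector: float, alpha: float = 0.7) -> float:
    """Convex combination: alpha weights the vector score, 1 - alpha the BM25 score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return alpha * vector + (1.0 - alpha) * bm25


bm25_norm = min_max([8.42, 3.10, 0.50])  # raw BM25 scores for three documents
vector_norm = [0.20, 0.89, 0.75]         # similarity scores, already in [0, 1]
hybrid = [blend(b, v) for b, v in zip(bm25_norm, vector_norm)]
```

With these made-up numbers, the first document wins on BM25 alone, but the second ranks highest at `alpha=0.7` because its strong vector score outweighs its weaker term match.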

## RAG context

For LLM workflows, `/api/v1/search/context` returns search results already formatted as a context string with citations, plus a token estimate.

```bash
curl -X POST $SCAIDRIVE_URL/api/v1/search/context \
  -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what were our Q4 bookings last year?",
    "share_id": "shr_01J3H",
    "max_tokens": 2000,
    "max_chunks": 10
  }'
```

Response:

```json
{
  "context": "[1] From annual-report-2025.pdf, page 12: Q4 2024 bookings totaled $42.3M...\n\n[2] From q4-narrative.docx: Strong enterprise momentum drove...",
  "chunks": [
    {"content": "Q4 2024 bookings...", "file_id": "fil_01J3K", "file_name": "annual-report-2025.pdf", "path": "/Finance/annual-report-2025.pdf", "page": 12, "section": "Financial Highlights", "score": 0.89}
  ],
  "estimated_tokens": 1234
}
```

Pass the `context` string into your LLM prompt. The `[N]` markers and the `chunks` array let your UI link "source 1" back to a specific file.
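
As a rough sketch of that flow, the helpers below build a prompt from `context` and then map the `[N]` markers in a model's answer back to entries in `chunks`. Both function names are hypothetical, the prompt wording is only an example, and the mapping assumes marker `[N]` corresponds to `chunks[N-1]`, which this page implies but does not state.

```python
import re


def build_prompt(question: str, context: str) -> str:
    """Wrap the context string in instructions that ask for [N] citations."""
    return (
        "Answer using only the numbered sources below. Cite sources as [N].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


def cited_sources(answer: str, chunks: list[dict]) -> dict[int, dict]:
    """Map each [N] marker found in the answer to its chunk.

    Assumes marker [N] refers to chunks[N-1]; out-of-range markers are ignored.
    """
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return {n: chunks[n - 1] for n in numbers if 1 <= n <= len(chunks)}
```

A UI could then render each cited chunk's `file_name`, `path`, and `page` as a link next to the answer. Ignoring out-of-range markers guards against a model inventing a source number.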

## Checking index status

To check whether a file has been indexed:

```bash
curl -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
     $SCAIDRIVE_URL/api/v1/search/index-status/fil_01J3K
```

```json
{
  "is_indexed": true,
  "chunk_count": 42,
  "last_indexed": "2026-04-20T10:30:00Z",
  "error": null
}
```
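
If your application needs to wait for a freshly uploaded file to become searchable, polling this endpoint is one option. The sketch below assumes `error` is non-null when indexing has failed, which the sample response suggests but does not confirm; `wait_until_indexed` is a hypothetical helper and the timeout and interval values are arbitrary.

```python
import time
from typing import Callable


def wait_until_indexed(
    get_status: Callable[[], dict],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_status() until is_indexed is true, then return the status body.

    get_status should return the parsed JSON from
    GET /api/v1/search/index-status/{file_id}.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("error"):  # assumption: non-null error means indexing failed
            raise RuntimeError(f"indexing failed: {status['error']}")
        if status.get("is_indexed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("file was not indexed before the timeout")
        sleep(interval)
```

`get_status` can be as small as a lambda wrapping an `httpx.get(...).json()` call against the index-status URL. Passing `sleep` in as a parameter keeps the loop testable without real delays.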

Tenant-wide statistics:

```bash
curl -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
     $SCAIDRIVE_URL/api/v1/search/stats
```

Queue status:

```bash
curl -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
     "$SCAIDRIVE_URL/api/v1/search/queue?limit=20"
```

## Configuring vectorization

Semantic and hybrid search require a **vectorization policy** to tell ScaiDrive what to index, how to chunk it, and which embedding model to use. Policies are tenant-level admin objects.

Minimal setup:

1. Configure an embedding provider (OpenAI, Cohere, Bedrock, Hugging Face, or a local model).
2. Create a policy scoping what to index.
3. ScaiDrive then indexes existing files in the background and indexes new uploads automatically.

See [Enterprise Reference](/docs/scaidrive/reference/enterprise) for the policy API.

## Permissions

Search only returns files the caller can read. A file you don't have permission on never appears, even if it semantically matches. Filtering works at chunk granularity: even when a share contains information relevant to your query, chunks from files you can't read are left out.

## Health check

Before your application depends on semantic search, verify the vectorization stack is up:

```bash
curl $SCAIDRIVE_URL/api/v1/search/health
```

```json
{
  "weaviate_connected": true,
  "embedding_service_available": true,
  "status": "healthy",
  "provider_name": "openai"
}
```

When unhealthy, semantic endpoints return `503 SERVICE_UNAVAILABLE` while keyword search continues to work.
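
Since keyword search stays available, one possible pattern is to fall back to it when a semantic request returns 503. The sketch below is a hedged illustration, not a prescribed client design: `search_with_fallback` is a hypothetical helper, and the two callables stand in for whatever HTTP code you use to call each endpoint.

```python
from typing import Callable, Tuple

Response = Tuple[int, dict]  # (HTTP status code, parsed JSON body)


def search_with_fallback(
    query: str,
    semantic: Callable[[str], Response],
    keyword: Callable[[str], Response],
) -> Tuple[str, list]:
    """Try semantic search first; on 503, fall back to keyword search.

    Returns the mode that answered ("semantic" or "keyword") and its hits.
    """
    status, body = semantic(query)
    if status == 200:
        return "semantic", body["semantic_results"]
    if status != 503:
        raise RuntimeError(f"semantic search failed with HTTP {status}")
    # The vectorization stack is down, but keyword search keeps working.
    status, body = keyword(query)
    if status != 200:
        raise RuntimeError(f"keyword search failed with HTTP {status}")
    return "keyword", body["results"]
```

The mode is returned alongside the hits because the two endpoints return different shapes: semantic hits are chunks with `file_name` and `chunk_content`, while keyword hits are files with `name` and `path`.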

## What's next

- [Search Reference](/docs/scaidrive/reference/search) — all endpoints.
- [Enterprise Reference](/docs/scaidrive/reference/enterprise) — vectorization policy management.
- [MCP Server](/docs/scaidrive/advanced/mcp-server) — expose search to Claude.