---
title: Versioning and Deduplication
path: core-concepts/versioning-and-deduplication
status: published
---

Files in ScaiDrive are **versioned** — every save creates a new version, prior versions are retained, and you can roll back. The blob content is **chunked** and **deduplicated** — identical chunks are stored exactly once, across all files and all users within a tenant.

## Versioning

Every file has a monotonically increasing `version` number. Version 1 is the initial upload; version 2 is the first edit; and so on.

Read the current version:

```bash
curl -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
     "$SCAIDRIVE_URL/api/v1/files/fil_01J3K"
```

```json
{
  "id": "fil_01J3K",
  "name": "spec.pdf",
  "size": 48293,
  "checksum_sha256": "a9f2...",
  "version": 4,
  "mime_type": "application/pdf",
  "created_at": "2026-04-20T10:00:00Z",
  "modified_at": "2026-04-23T10:15:00Z"
}
```

List all versions:

```bash
curl -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
     "$SCAIDRIVE_URL/api/v1/files/fil_01J3K/versions"
```

```json
{
  "versions": [
    {"id": "ver_01J4Q", "version_number": 4, "size": 48293, "checksum_sha256": "a9f2...", "created_by": "usr_01J3N", "created_at": "2026-04-23T10:15:00Z"},
    {"id": "ver_01J4P", "version_number": 3, "size": 47110, "checksum_sha256": "7b1e...", "created_by": "usr_01J3N", "created_at": "2026-04-22T11:00:00Z"},
    {"id": "ver_01J4O", "version_number": 2, "size": 41022, "checksum_sha256": "3c88...", "created_by": "usr_01J4A", "created_at": "2026-04-21T09:30:00Z"},
    {"id": "ver_01J4N", "version_number": 1, "size": 40008, "checksum_sha256": "e2a0...", "created_by": "usr_01J3N", "created_at": "2026-04-20T10:00:00Z"}
  ],
  "total": 4
}
```

Download a specific version:

```bash
curl -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
     -o "spec-v2.pdf" \
     "$SCAIDRIVE_URL/api/v1/files/fil_01J3K/content?version=2"
```

Restore a previous version (creates a new version with the old content):

```bash
curl -X POST "$SCAIDRIVE_URL/api/v1/files/fil_01J3K/versions/2/restore" \
  -H "Authorization: Bearer $SCAIDRIVE_TOKEN"
```

```json
{
  "id": "fil_01J3K",
  "restored_version": 2,
  "new_version_id": "ver_01J5S"
}
```

Restoring version 2 when you're at version 4 creates version 5 with the content of version 2. The intermediate versions (3 and 4) are still in history — nothing is destroyed.
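The append-only behaviour of restore can be illustrated with a small in-memory model. This is only a sketch of the semantics described above, not ScaiDrive's implementation: the `Version` class and `restore` function are hypothetical, and the short checksum strings are labels standing in for each version's content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    number: int
    checksum: str  # label standing in for this version's content


def restore(history: list[Version], target: int) -> list[Version]:
    """Return a new history with the target's content appended as the next version."""
    by_number = {v.number: v for v in history}
    if target not in by_number:
        raise KeyError(f"no version {target}")
    next_number = max(by_number) + 1
    # Nothing is removed: the old content is simply re-registered under a new number.
    return history + [Version(next_number, by_number[target].checksum)]


history = [Version(1, "e2a0"), Version(2, "3c88"), Version(3, "7b1e"), Version(4, "a9f2")]
after = restore(history, target=2)
print([v.number for v in after])  # [1, 2, 3, 4, 5]
print(after[-1].checksum)         # 3c88
```

Because the restored version points at content the store already holds, a restore like this should not need to duplicate any chunks.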

## Retention of old versions

By default, all versions are retained. Three knobs control pruning:

- **Tenant version-retention policy** — a global cap like "keep the last N versions" or "keep versions from the last N days."
- **Per-share retention policy** — overrides tenant-level for that share.
- **Enterprise retention policy** — can override both, and can be write-protected (see [Enterprise Compliance](/docs/scaidrive/advanced/enterprise-compliance)).

Pruned versions are removed on the next retention-sweep run (hourly by default). The chunks referenced only by pruned versions are garbage-collected separately.
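To make the two policy shapes concrete, here is a minimal sketch of how a sweep might select versions to prune. The function name, its parameters, and the rule that the current version is never pruned are assumptions made for illustration; the actual policy schema and sweep logic are not documented here.

```python
from datetime import datetime, timedelta, timezone


def versions_to_prune(versions, now, keep_last=None, keep_days=None):
    """Return the version numbers a sweep would prune.

    `versions` is a list of (version_number, created_at) tuples.
    Exactly one of `keep_last` or `keep_days` must be given.
    Assumption: the newest version is always kept, whatever its age.
    """
    if (keep_last is None) == (keep_days is None):
        raise ValueError("set exactly one of keep_last or keep_days")
    if not versions:
        return []
    ordered = sorted(versions, key=lambda v: v[0], reverse=True)  # newest first
    older = ordered[1:]  # everything except the current version
    if keep_last is not None:
        doomed = older[max(keep_last - 1, 0):]
    else:
        cutoff = now - timedelta(days=keep_days)
        doomed = [v for v in older if v[1] < cutoff]
    return sorted(v[0] for v in doomed)


def utc(day, hour, minute):
    return datetime(2026, 4, day, hour, minute, tzinfo=timezone.utc)


versions = [(1, utc(20, 10, 0)), (2, utc(21, 9, 30)), (3, utc(22, 11, 0)), (4, utc(23, 10, 15))]
now = utc(23, 12, 0)
print(versions_to_prune(versions, now, keep_last=2))  # [1, 2]
print(versions_to_prune(versions, now, keep_days=1))  # [1, 2, 3]
```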

## Deduplication

Under the hood, each file version is a list of **chunks**. A chunk is:

- A byte range of file content (typically 4 MB).
- Identified by its SHA-256 hash.
- Stored once in object storage at `chunks/{tenant_id}/{hash[0:2]}/{hash[2:4]}/{hash}`.
- Reference-counted — the chunk is deleted only when no file version references it.
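The storage path pattern above can be expressed as a short helper. This is a sketch that assumes the hash is the lowercase hex SHA-256 of the raw chunk bytes; `chunk_object_key` and the tenant ID `ten_demo` are hypothetical names used only for illustration.

```python
import hashlib


def chunk_object_key(tenant_id: str, chunk: bytes) -> str:
    """Build the object-storage key for a chunk from its SHA-256 digest."""
    digest = hashlib.sha256(chunk).hexdigest()  # assumed: lowercase hex
    # The first two byte pairs of the digest become directory levels.
    return f"chunks/{tenant_id}/{digest[0:2]}/{digest[2:4]}/{digest}"


key = chunk_object_key("ten_demo", b"example chunk bytes")
print(key)
```

Since the key depends only on the tenant and the content, two identical chunks within a tenant always map to the same object, which is what makes storing them once possible.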

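Chunk hashes are computed at upload time: by the server for regular uploads, or by the client when a sync client performs a block-level upload. In the block-level case, the client asks the server "do you already have this?" before transferring each chunk. If the answer is yes, only the hash is recorded against the new file version and no bytes move.

Result: if a sync client uploads the same file twice, the second upload transfers zero bytes of content. If you change 2 MB of a 500 MB file, only the chunks that contain those 2 MB transfer. Regular uploads still send the whole file over the network, but the server stores only the chunks it does not already have.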
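The effect is easy to see in a toy content-addressed store. The sketch below uses a hypothetical `ChunkStore` class, fixed-size chunks, and a 4-byte chunk size so the numbers stay readable; it only models the bookkeeping and is not how the server is implemented.

```python
import hashlib

CHUNK_SIZE = 4  # bytes, tiny for the demo; real chunks are typically 4 MB


class ChunkStore:
    def __init__(self):
        self.blobs = {}     # chunk hash -> chunk bytes, each stored once
        self.refcount = {}  # chunk hash -> number of manifest entries pointing at it

    def upload_version(self, data: bytes):
        """Register one file version; return (manifest, bytes actually transferred)."""
        manifest, transferred = [], 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blobs:  # "do you already have this?"
                self.blobs[digest] = chunk
                transferred += len(chunk)
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            manifest.append(digest)
        return manifest, transferred


store = ChunkStore()
_, first = store.upload_version(b"AAAABBBBCCCCDDDD")
_, second = store.upload_version(b"AAAABBBBCCCCDDDD")  # identical content
_, third = store.upload_version(b"AAAAXXXXCCCCDDDD")   # one chunk changed
print(first, second, third)  # 16 0 4
print(len(store.blobs))      # 5
```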

## How this shows up to clients

For the regular file API (`POST /api/v1/files` with a multipart body), the server handles chunking and deduplication transparently: you upload the file and the server does the rest.

For the block-level sync upload path (used by the desktop client), the client:

1. Chunks the file locally with a rolling hash window.
2. Computes SHA-256 per chunk.
3. Asks the server which chunks it already has.
4. Uploads only the missing chunks.
5. POSTs a manifest `{file_metadata, chunk_hashes[]}` to register the new version.

See [Resumable Uploads](/docs/scaidrive/advanced/resumable-uploads) for the low-level upload protocol.
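The five client steps can be sketched offline as follows. Two simplifications are assumed: fixed-size chunking replaces the rolling hash window, and a plain set (`server_has`) stands in for the server's answer about which chunks it already holds. The function names and the fields inside `file_metadata` are hypothetical; the real wire format is covered in the Resumable Uploads page.

```python
import hashlib


def split_fixed(data: bytes, size: int) -> list[bytes]:
    """Step 1, simplified: fixed-size chunks instead of rolling-hash boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def plan_sync_upload(name: str, data: bytes, server_has: set[str],
                     chunk_size: int = 4 * 1024 * 1024):
    chunks = split_fixed(data, chunk_size)
    hashes = [hashlib.sha256(c).hexdigest() for c in chunks]  # step 2
    # Steps 3 and 4: keep only the chunks the server reported as missing.
    missing = {h: c for h, c in zip(hashes, chunks) if h not in server_has}
    # Step 5: the manifest lists every chunk hash in order, uploaded or not.
    manifest = {
        "file_metadata": {"name": name, "size": len(data)},
        "chunk_hashes": hashes,
    }
    return missing, manifest


known = {hashlib.sha256(b"abcd").hexdigest()}
missing, manifest = plan_sync_upload("notes.txt", b"abcdefgh", known, chunk_size=4)
print(len(missing), len(manifest["chunk_hashes"]))  # 1 2
```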

## Storage efficiency in practice

A typical knowledge-worker share:

| Scenario | Naive storage | ScaiDrive |
|----------|---------------|-----------|
| 50 employees each get the same 20 MB quarterly report email attachment | 1 GB | 20 MB |
| A 100 MB PowerPoint edited 20 times over a quarter (5% changes per edit) | 2 GB | ~200 MB |
| A CSV that gets appended to daily for a year (starting at 10 MB, ending at 50 MB) | gigabytes | < 50 MB plus per-day deltas |

Real-world savings vary by content mix, but 3–10× is typical for office documents.
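The table figures follow from simple arithmetic, reproduced here as a rough check. The calculation assumes that each edit touches entirely new chunks (so 5% of the bytes means 5% new storage) and, for the CSV row, that the file grows linearly and one version is saved per day. Both are simplifying assumptions rather than measured behaviour.

```python
# Row 1: 50 mailboxes, one identical 20 MB attachment each.
naive_attachment_mb = 50 * 20  # every copy stored separately
dedup_attachment_mb = 20       # identical chunks stored once

# Row 2: a 100 MB deck saved 20 more times, 5% of its bytes changing per edit.
deck_mb, edits, changed_pct = 100, 20, 5
naive_deck_mb = deck_mb * (edits + 1)  # original plus 20 full copies
dedup_deck_mb = deck_mb + edits * deck_mb * changed_pct // 100

# Row 3, naive column only: 365 daily versions growing from 10 MB to 50 MB.
days, start_mb, end_mb = 365, 10, 50
naive_csv_mb = sum(start_mb + (end_mb - start_mb) * d / (days - 1) for d in range(days))

print(naive_attachment_mb, dedup_attachment_mb)  # 1000 20
print(naive_deck_mb, dedup_deck_mb)              # 2100 200
print(round(naive_csv_mb))                       # 10950
```

The naive deck total comes to 2,100 MB, which the table rounds to 2 GB.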

## Deletion and content

Soft-deleting a file (`DELETE /api/v1/files/{id}`) sets `is_deleted: true` on the file record. The content stays in storage and counts toward `trash_bytes`. The file can be restored until the trash is purged.

Permanent deletion (`DELETE /api/v1/files/{id}?permanent=true`) removes the file record and decrements the reference count of every chunk the file's versions used. Chunks whose count drops to zero are garbage-collected on the next GC sweep.
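The reference-count bookkeeping behind permanent deletion might look like the sketch below. `release_version` and `gc_sweep` are hypothetical names, each file is modelled as a single version, short strings stand in for chunk hashes, and Legal Hold and retention checks are left out.

```python
from collections import Counter


def release_version(refcount: Counter, manifest: list[str]) -> None:
    """Decrement the count of every chunk a deleted version referenced."""
    for digest in manifest:
        refcount[digest] -= 1


def gc_sweep(refcount: Counter, blobs: dict[str, bytes]) -> list[str]:
    """Remove chunks nobody references any more; return the hashes removed."""
    dead = sorted(h for h, n in refcount.items() if n <= 0)
    for h in dead:
        blobs.pop(h, None)
        del refcount[h]
    return dead


file_a = ["c1", "c2"]  # shares chunk c1 with file_b
file_b = ["c1", "c3"]
refcount = Counter(file_a + file_b)
blobs = {h: b"..." for h in refcount}

release_version(refcount, file_a)  # permanent delete of file A
removed = gc_sweep(refcount, blobs)
print(removed)        # ['c2']
print(sorted(blobs))  # ['c1', 'c3']
```

The shared chunk `c1` survives because file B still references it; only the chunk unique to the deleted file is collected.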

Legal Hold and Retention policies can prevent deletion — see [Enterprise Compliance](/docs/scaidrive/advanced/enterprise-compliance).

## Hash verification

Every file metadata response includes `checksum_sha256` (the hash of the entire file content, not individual chunks). Clients should verify the hash after download:

```python
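import hashlib
import os
import httpx

# Assumes SCAIDRIVE_URL and SCAIDRIVE_TOKEN are set, as in the curl examples.
url = os.environ["SCAIDRIVE_URL"]
auth_hdr = {"Authorization": f"Bearer {os.environ['SCAIDRIVE_TOKEN']}"}
file_id = "fil_01J3K"

resp = httpx.get(f"{url}/api/v1/files/{file_id}/content", headers=auth_hdr)
h = hashlib.sha256(resp.content).hexdigest()  # hash of the bytes received

meta = httpx.get(f"{url}/api/v1/files/{file_id}", headers=auth_hdr).json()
if h != meta["checksum_sha256"]:  # explicit check; assert is skipped under -O
    raise ValueError("content corruption")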
```
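For large files, holding the whole body in memory just to hash it can be avoided by feeding the hash incrementally. The helper below is a minimal sketch; `sha256_of_stream` is a hypothetical name, and it accepts any iterable of byte strings.

```python
import hashlib
from typing import Iterable


def sha256_of_stream(parts: Iterable[bytes]) -> str:
    """Hash content piece by piece without concatenating it first."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
    return h.hexdigest()


# Splitting the input differently does not change the digest.
whole = hashlib.sha256(b"hello").hexdigest()
pieces = sha256_of_stream([b"he", b"l", b"lo"])
print(pieces == whole)  # True
```

With httpx, the iterable could be `resp.iter_bytes()` inside a `with httpx.stream("GET", ...) as resp:` block, and the result compared against `checksum_sha256` as in the example above.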

## What's next

- [Files Guide](/docs/scaidrive/api-guides/files) — uploads, versioning, copy, move.
- [Resumable Uploads](/docs/scaidrive/advanced/resumable-uploads) — the block-level upload protocol.
- [Enterprise Compliance](/docs/scaidrive/advanced/enterprise-compliance) — retention and legal hold.