Versioning and Deduplication
Files in ScaiDrive are versioned — every save creates a new version, prior versions are retained, and you can roll back. The blob content is chunked and deduplicated — identical chunks are stored exactly once, across all files and all users within a tenant.
Versioning#
Every file has a monotonically increasing version number. Version 1 is the initial upload; version 2 is the first edit; and so on.
Read current version:
1 2 | |
1 2 3 4 5 6 7 8 9 10 | |
List all versions:
1 2 | |
1 2 3 4 5 6 7 8 9 | |
Download a specific version:
1 2 3 | |
Restore a previous version (creates a new version with the old content):
1 2 | |
1 2 3 4 5 | |
Restoring version 2 when you're at version 4 creates version 5 with the content of version 2. The intermediate versions (3 and 4) are still in history — nothing is destroyed.
Retention of old versions#
By default, all versions are retained. Three knobs control pruning:
- Tenant version-retention policy — a global cap like "keep the last N versions" or "keep versions from the last N days."
- Per-share retention policy — overrides tenant-level for that share.
- Enterprise retention policy — can override both, and can be write-protected (see Enterprise Compliance).
Pruned versions are removed on the next retention-sweep run (hourly by default). The chunks referenced only by pruned versions are garbage-collected separately.
Deduplication#
Under the hood, each file version is a list of chunks. A chunk is:
- A byte range of file content (typically 4 MB).
- Identified by its SHA-256 hash.
- Stored once in object storage at
chunks/{tenant_id}/{hash[0:2]}/{hash[2:4]}/{hash}. - Reference-counted — the chunk is deleted only when no file version references it.
When you upload a file, the server (or the client, if it's a sync client doing block-level uploads) computes chunk hashes. Before transferring each chunk, the client asks "do you already have this?" If yes, only the hash needs to be stored against the new file version — no bytes move.
Result: if you upload the same file twice, the second upload transfers zero bytes of content. If you change 2 MB of a 500 MB file, only those 2 MB transfer.
How this shows up to clients#
For the regular file API (POST /api/v1/files with a multipart body), the server handles chunking and deduplication transparently — you just upload the file, the server does the rest.
For the block-level sync upload path (used by the desktop client), the client:
- Chunks the file locally with a rolling hash window.
- Computes SHA-256 per chunk.
- Asks the server which chunks it already has.
- Uploads only the missing chunks.
- POSTs a manifest
{file_metadata, chunk_hashes[]}to register the new version.
See Resumable Uploads for the low-level upload protocol.
Storage efficiency in practice#
A typical knowledge-worker share:
| Scenario | Naive storage | ScaiDrive |
|---|---|---|
| 50 employees each get the same 20 MB quarterly report email attachment | 1 GB | 20 MB |
| A 100 MB PowerPoint edited 20 times over a quarter (5% changes per edit) | 2 GB | ~200 MB |
| A CSV that gets appended to daily for a year (starting at 10 MB, ending at 50 MB) | gigabytes | < 50 MB plus per-day deltas |
Real-world savings vary by content mix, but 3–10× is typical for office documents.
Deletion and content#
Soft-deleting a file (DELETE /api/v1/files/{id}) marks is_deleted: true. The content stays in storage, consuming trash_bytes. Restore is possible until trash is purged.
Permanent deletion (DELETE /api/v1/files/{id}?permanent=true) removes the file record. The chunk reference counts decrement. Chunks that drop to zero are garbage-collected on the next GC sweep.
Legal Hold and Retention policies can prevent deletion — see Enterprise Compliance.
Hash verification#
Every file metadata response includes checksum_sha256 (the hash of the entire file content, not individual chunks). Clients should verify the hash after download:
1 2 3 4 5 6 7 | |
What's next#
- Files Guide — uploads, versioning, copy, move.
- Resumable Uploads — the block-level upload protocol.
- Enterprise Compliance — retention and legal hold.