Build a data analysis sandbox
You're going from zero to a working data-analysis sandbox a colleague can drive interactively: a Python 3.12 bunker with pandas installed, your CSV uploaded into it, an analysis script that produces a result, and a snapshot you can restore on demand.
Roughly 10 minutes if you already have the CSV.
1. Decide the bunker's shape
Before any API calls, settle these:
- Lifecycle. One-shot analysis → ephemeral. A notebook-style session that survives across multiple exec calls → session. Tied to a ScaiCore agent that always has it available → persistent.
- Resources. pandas needs more memory than the default 512 MB; bump it to 2048 MB if your dataset is non-trivial.
- Network. You need pip install pandas to work → registry. If your script also needs to reach an internal API, use allowlisted plus a domain list.
- Image. The platform-managed python-3.12 image is usually right. If you need PyTorch / CUDA, use python-3.12-ml. If you need R or Julia, register your own image (see Register a custom image).
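The decisions above can be sketched as a small Python function. This is a hypothetical helper: the BunkerShape type and the choose_shape parameters are illustrative names, not part of the ScaiBunker API. Only the lifecycle, network profile, and image names come from this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BunkerShape:
    """Illustrative container for the step-1 decisions (not an API schema)."""
    lifecycle: str
    network: str
    image: str
    memory_mb: int
    allowed_domains: tuple[str, ...] = ()


def choose_shape(
    *,
    multi_exec: bool,
    agent_owned: bool = False,
    large_dataset: bool = True,
    internal_domains: tuple[str, ...] = (),
    needs_ml_stack: bool = False,
) -> BunkerShape:
    # Lifecycle: owned by a ScaiCore agent -> persistent; several exec
    # calls sharing state -> session; otherwise a one-shot ephemeral bunker.
    if agent_owned:
        lifecycle = "persistent"
    elif multi_exec:
        lifecycle = "session"
    else:
        lifecycle = "ephemeral"

    # Network: registry lets pip reach PyPI; reaching an internal API
    # calls for allowlisted plus an explicit domain list.
    network = "allowlisted" if internal_domains else "registry"

    # Image: python-3.12-ml is the one for PyTorch / CUDA work.
    image = "python-3.12-ml" if needs_ml_stack else "python-3.12"

    # Memory: pandas on a non-trivial dataset outgrows the 512 MB default.
    memory_mb = 2048 if large_dataset else 512

    return BunkerShape(lifecycle, network, image, memory_mb, internal_domains)
```

For this walkthrough, choose_shape(multi_exec=True) picks a session bunker on the registry profile with 2048 MB of memory.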
2. Provision the bunker
Create a session bunker with the bumped resource numbers and a longer idle timeout so it sticks around between command bursts.
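A minimal sketch of the create call using only the Python standard library, assuming a POST /bunkers endpoint with bearer-token auth. The base URL, every JSON field name, and the 3600-second idle timeout are placeholders; check the API reference for the real schema.

```python
import json
import urllib.request

# Hypothetical base URL; substitute your ScaiBunker endpoint.
BASE_URL = "https://scaibunker.example.com"


def build_create_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a create-bunker request.

    Every field name below is an assumption, not the documented schema.
    """
    body = {
        "lifecycle": "session",
        "image": "python-3.12",
        "network": "registry",
        "resources": {"memory_mb": 2048},  # up from the 512 MB default
        "idle_timeout_s": 3600,  # placeholder: long enough to span command bursts
    }
    return urllib.request.Request(
        f"{BASE_URL}/bunkers",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send it with urllib.request.urlopen(req) and read the new bunker's id from the JSON response.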
Poll GET /bunkers/{id} until status is running (usually instant on a warm worker).
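The poll can be sketched as a loop with a deadline. get_status is a hypothetical caller-supplied callable that wraps GET /bunkers/{id} and returns the status string; treating failed and terminated as terminal states is an assumption. Injecting sleep and clock keeps the loop testable without a live bunker.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status until it reports "running" or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "running":
            return status
        # Assumed terminal states: stop polling instead of waiting them out.
        if status in {"failed", "terminated"}:
            raise RuntimeError(f"bunker entered {status!r} while provisioning")
        if clock() >= deadline:
            raise TimeoutError(f"bunker still {status!r} after {timeout_s}s")
        sleep(interval_s)
```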
3. Install pandas
The registry profile lets pip reach PyPI, so a single exec call can install pandas and pyarrow into the bunker. The installed packages persist for the lifetime of the session, and on a warm bunker the install completes in a few seconds.
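A sketch of the exec request body for the install, assuming the exec endpoint accepts an argv list. The argv and timeout_s field names are assumptions; the pip invocation itself is standard. Running pip as python3 -m pip ties the install to the interpreter that will later run the analysis.

```python
def pip_install_exec_body(*packages: str, timeout_s: int = 300) -> dict:
    """Build a hypothetical exec body that pip-installs packages."""
    if not packages:
        raise ValueError("name at least one package to install")
    return {
        # python3 -m pip installs into the same interpreter that will run
        # the analysis, not whichever pip happens to be first on PATH.
        "argv": ["python3", "-m", "pip", "install", "--quiet", *packages],
        "timeout_s": timeout_s,  # assumed field; generous for a cold cache
    }


body = pip_install_exec_body("pandas", "pyarrow")
```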
4. Upload the CSV
PUT the CSV inline. The path under /files/ is rooted in the bunker's filesystem; /workspace is the convention for caller-supplied data.
For files larger than ~10 MB, prefer the staged upload flow (POST /files/upload then POST /files/commit) — see the API reference for details.
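Choosing between the two upload flows can be sketched as a size check. plan_upload is a hypothetical helper: the cutoff mirrors the guide's approximate 10 MB figure (taken here as 10 MiB), and the returned flow name is only a label for which calls to make, PUT /files/{path} for inline or POST /files/upload followed by POST /files/commit for staged.

```python
import os
import posixpath

# Approximate cutoff from this guide; the exact server-side limit may differ.
INLINE_LIMIT_BYTES = 10 * 1024 * 1024


def plan_upload(local_path: str, remote_dir: str = "/workspace") -> tuple[str, str]:
    """Return (flow, remote_path) for a local file (hypothetical helper)."""
    size = os.path.getsize(local_path)
    flow = "inline" if size <= INLINE_LIMIT_BYTES else "staged"
    # Bunker paths are POSIX paths regardless of the caller's OS.
    remote_path = posixpath.join(remote_dir, os.path.basename(local_path))
    return flow, remote_path
```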
5. Run the analysis
Pipe a multi-line Python script into python3 - via stdin rather than escaping it as a -c argument. The script prints its result table inline and writes the same result to /workspace/by-quarter.csv, which stays in the bunker until you fetch it.
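A minimal version of the analysis script and its exec body. It assumes the uploaded file is /workspace/sales.csv with date and amount columns (placeholders, not from this guide) and an exec body with argv and stdin fields (also assumed). compile() catches syntax errors locally without importing pandas.

```python
# Placeholder input path and column names; adapt them to your CSV.
ANALYSIS_SCRIPT = """\
import pandas as pd

df = pd.read_csv("/workspace/sales.csv", parse_dates=["date"])
by_quarter = (
    df.groupby(df["date"].dt.to_period("Q"))["amount"]
    .sum()
    .rename("total")
    .reset_index()
)
by_quarter.to_csv("/workspace/by-quarter.csv", index=False)
print(by_quarter.to_string(index=False))
"""


def analysis_exec_body(script: str = ANALYSIS_SCRIPT) -> dict:
    """Build a hypothetical exec body that runs script via python3 -."""
    # Fail fast on syntax errors before sending anything to the bunker.
    compile(script, "<analysis>", "exec")
    # "python3 -" tells the interpreter to read the program from stdin.
    return {"argv": ["python3", "-"], "stdin": script}
```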
6. Read the result back
Fetch the CSV the same way you uploaded the input — a GET against the same /files/{path} endpoint, streamed to a local file.
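A streaming download sketch using urllib. This guide only names the /files/{path} endpoint, so the full URL you pass in (base URL and any bunker-scoped prefix) and the bearer auth are assumptions. save_stream copies in fixed-size chunks so a large result never sits fully in memory.

```python
import urllib.request
from typing import BinaryIO


def save_stream(src: BinaryIO, local_path: str, chunk_size: int = 1 << 20) -> int:
    """Copy a readable byte stream to disk in chunks; return bytes written."""
    written = 0
    with open(local_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            written += len(chunk)
    return written


def download(url: str, token: str, local_path: str) -> int:
    """GET url (hypothetical /files/{path} URL) and stream it to local_path."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return save_stream(resp, local_path)
```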
7. Snapshot for later
Session bunkers can be checkpointed. Take a snapshot now, and it can be restored later on any worker (after an eviction, after maintenance, or after a restart).
The snapshot is a tar.gz of the bunker's rootfs, stored in S3 under scaibunker/snapshots/{bunker_id}/. You can download it via GET /snapshots/{id}/archive if you want the bytes for offline analysis.
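Once you have downloaded the archive, Python's tarfile module can inspect it offline. This sketch lists the regular files under workspace/, assuming the rootfs is archived with paths relative to / (with or without a leading ./); list_workspace_files is a hypothetical helper name.

```python
import tarfile


def list_workspace_files(archive_path: str, prefix: str = "workspace/") -> list[str]:
    """List regular files under /workspace in a rootfs snapshot tar.gz."""
    names = []
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            # Rootfs tarballs may store paths as "workspace/..." or "./workspace/...".
            name = member.name.removeprefix("./")
            if member.isfile() and name.startswith(prefix):
                names.append(name)
    return sorted(names)
```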
8. Terminate (or pause)
For a session you'll come back to within the day, pause it (POST /bunkers/{id}/pause) instead of terminating: the bunker stays scheduled and resumes in milliseconds. For a definite end, terminate it. With ?snapshot=true, ScaiBunker takes a final snapshot before destroying the bunker, which is useful when you want both your quota back and the option to restore.
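The pause-or-terminate choice can be sketched as a function that returns the HTTP method and path. The pause endpoint and the ?snapshot=true flag come from this guide; using DELETE /bunkers/{id} as the terminate call is an assumption, and end_of_session_request is a hypothetical name.

```python
from urllib.parse import quote, urlencode


def end_of_session_request(
    bunker_id: str, *, coming_back_today: bool, keep_snapshot: bool = True
) -> tuple[str, str]:
    """Return (method, path) for pausing or terminating a bunker."""
    bid = quote(bunker_id, safe="")  # keep the id a single path segment
    if coming_back_today:
        # Pause: stays scheduled, resumes in milliseconds.
        return "POST", f"/bunkers/{bid}/pause"
    path = f"/bunkers/{bid}"  # assumed terminate endpoint
    if keep_snapshot:
        # Final snapshot before the bunker is destroyed.
        path += "?" + urlencode({"snapshot": "true"})
    return "DELETE", path
```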
What you've built
- A reusable analysis sandbox a teammate can drive by passing the bunker id around.
- An audit trail (every exec call and every file operation is recorded in the exec log).
- A snapshot you can restore on any worker.
- A quota-checked workload that won't run away with your tenant's resources.
Next: try the Register a custom image tutorial if the python-3.12 image doesn't have what you need pre-installed.