---
summary: "End-to-end recipe \u2014 provision a Python bunker, upload a CSV, run pandas\
  \ analysis, capture the result, snapshot it for later."
title: Build a data analysis sandbox
path: tutorials/data-analysis-sandbox
status: published
---

You're going from zero to a working data-analysis sandbox a colleague can drive interactively: a Python 3.12 bunker with pandas installed, your CSV uploaded into it, an analysis script that produces a result, and a snapshot you can restore on demand.

Roughly 10 minutes if you already have the CSV.

## 1. Decide the bunker's shape

Before any API calls, settle these:

- **Lifecycle.** One-shot analysis → `ephemeral`. A notebook-style session that survives across multiple `exec` calls → `session`. Tied to a ScaiCore agent that always has it available → `persistent`.
- **Resources.** Pandas needs more memory than the default 512 MB; bump to 2048 if your dataset is non-trivial.
- **Network.** You need `pip install pandas` to work → `registry`. If your script also needs to reach an internal API, use `allowlisted` plus a domain list.
- **Image.** The platform-managed `python-3.12` image is usually right. If you need PyTorch / CUDA, use `python-3.12-ml`. If you need R or Julia, register your own image (see [Register a custom image](./register-custom-image)).
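
The decisions above can be captured as a small helper that turns them into a request body. This is a minimal sketch, not part of any ScaiBunker SDK: `choose_lifecycle` and `bunker_spec` are hypothetical names, and only fields that appear in the provisioning request in the next step are used. The `allowlisted` profile is left out because the field name for its domain list is not covered here.

```python
def choose_lifecycle(multi_step: bool, agent_bound: bool) -> str:
    """Map the lifecycle questions to a lifecycle_mode value."""
    if agent_bound:
        return "persistent"  # tied to an agent that always has it available
    return "session" if multi_step else "ephemeral"


def bunker_spec(
    name: str,
    *,
    multi_step: bool = False,
    agent_bound: bool = False,
    large_dataset: bool = False,
    needs_ml: bool = False,
) -> dict:
    """Build a request body from the four decisions (hypothetical helper)."""
    return {
        "name": name,
        # Platform-managed images named in this tutorial.
        "image": "python-3.12-ml" if needs_ml else "python-3.12",
        "lifecycle_mode": choose_lifecycle(multi_step, agent_bound),
        # `registry` is enough for `pip install`.
        "network_profile": "registry",
        # 512 MB is the default; pandas on a non-trivial dataset wants more.
        "memory_mb": 2048 if large_dataset else 512,
    }


spec = bunker_spec("quarterly-revenue-analysis", multi_step=True, large_dataset=True)
print(spec["lifecycle_mode"], spec["memory_mb"])
```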

## 2. Provision the bunker

Create a session bunker with the bumped resource numbers and a longer idle timeout so it sticks around between command bursts.

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaibunker/bunkers" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"quarterly-revenue-analysis","image":"python-3.12","lifecycle_mode":"session","network_profile":"registry","cpu_millicores":2000,"memory_mb":2048,"disk_mb":4096,"max_lifetime_s":28800,"idle_timeout_s":1800}'
```

```python
import httpx, os

H = {"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"}
HOST = os.environ["SCAIGRID_HOST"]

bunker = httpx.post(
    f"{HOST}/v1/modules/scaibunker/bunkers",
    headers=H,
    json={
        "name": "quarterly-revenue-analysis",
        "image": "python-3.12",
        "lifecycle_mode": "session",
        "network_profile": "registry",
        "cpu_millicores": 2000,
        "memory_mb": 2048,
        "disk_mb": 4096,
        "max_lifetime_s": 28800,
        "idle_timeout_s": 1800,
    },
).json()["data"]
BUNKER = bunker["id"]
print("bunker:", BUNKER, "status:", bunker["status"])
```

```javascript
const HOST = process.env.SCAIGRID_HOST;
const H = { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` };

const res = await fetch(`${HOST}/v1/modules/scaibunker/bunkers`, {
  method: "POST",
  headers: { ...H, "Content-Type": "application/json" },
  body: JSON.stringify({
    name: "quarterly-revenue-analysis",
    image: "python-3.12",
    lifecycle_mode: "session",
    network_profile: "registry",
    cpu_millicores: 2000,
    memory_mb: 2048,
    disk_mb: 4096,
    max_lifetime_s: 28800,
    idle_timeout_s: 1800,
  }),
});
const { data: bunker } = await res.json();
const BUNKER = bunker.id;
console.log("bunker:", BUNKER, "status:", bunker.status);
```

Poll `GET /bunkers/{id}` until status is `running` (usually instant on a warm worker).
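
One way to write that polling loop is sketched below. `wait_until_running` is a hypothetical helper; it takes a callable that returns the current status so the loop stays independent of the HTTP client. The commented usage assumes the full path mirrors the create endpoint and that the response uses the same `data` envelope with a `status` field.

```python
import time
from typing import Callable


def wait_until_running(
    fetch_status: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> None:
    """Call fetch_status() until it returns "running" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "running":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"bunker still {status!r} after {timeout_s}s")
        time.sleep(interval_s)


# Usage with the httpx setup from the provisioning step (assumed response shape):
# wait_until_running(
#     lambda: httpx.get(
#         f"{HOST}/v1/modules/scaibunker/bunkers/{BUNKER}", headers=H
#     ).json()["data"]["status"]
# )
```

Using a deadline rather than a fixed retry count keeps the total wait bounded even if individual requests are slow.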

## 3. Install pandas

The `registry` profile lets `pip` reach PyPI. A single exec call installs pandas and pyarrow into the bunker; the wheels stay there for the lifetime of the session.

```python
result = httpx.post(
    f"{HOST}/v1/modules/scaibunker/bunkers/{BUNKER}/exec",
    headers=H,
    json={
        "command": "pip install --quiet pandas pyarrow",
        "timeout_s": 180,
    },
    timeout=240,
).json()["data"]
assert result["exit_code"] == 0, result["stderr"]
```

The install completes in a few seconds on a warm bunker.

## 4. Upload the CSV

PUT the CSV inline. The path under `/files/` is rooted in the bunker's filesystem; `/workspace` is the convention for caller-supplied data.

```bash
curl -X PUT "$SCAIGRID_HOST/v1/modules/scaibunker/bunkers/$BUNKER/files/workspace/revenue.csv" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: text/csv" \
  --data-binary @local-revenue.csv
```

For files larger than ~10 MB, prefer the staged upload flow (`POST /files/upload` then `POST /files/commit`) — see the [API reference](../reference/api) for details.
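
A Python counterpart to the inline upload might look like the sketch below. It uses only the standard library and builds the request without sending it, so it can be inspected first. `upload_strategy` and `build_inline_upload` are hypothetical helpers, and the 10 MB cutoff is just the rough guideline above turned into a constant. The staged flow is not sketched because its request bodies are documented in the API reference rather than here.

```python
import urllib.request

INLINE_LIMIT_BYTES = 10 * 1024 * 1024  # the "~10 MB" guideline, as a constant


def upload_strategy(size_bytes: int) -> str:
    """Pick "inline" (single PUT) or "staged" (upload then commit)."""
    return "inline" if size_bytes <= INLINE_LIMIT_BYTES else "staged"


def build_inline_upload(
    host: str, api_key: str, bunker_id: str, remote_path: str, payload: bytes
) -> urllib.request.Request:
    """Build the PUT request for an inline CSV upload without sending it."""
    # The path under /files/ is rooted in the bunker's filesystem, so drop
    # any leading slash from the remote path.
    url = (
        f"{host}/v1/modules/scaibunker/bunkers/{bunker_id}"
        f"/files/{remote_path.lstrip('/')}"
    )
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "text/csv",
        },
    )


# To send it: urllib.request.urlopen(req) raises HTTPError on 4xx/5xx responses.
```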

## 5. Run the analysis

Pipe a multi-line Python script in via `stdin` rather than escaping it as a `-c` argument. The script writes its result back to `/workspace/by-quarter.csv` inside the bunker.

```python
script = """
import pandas as pd
df = pd.read_csv('/workspace/revenue.csv', parse_dates=['date'])
# 'QE' (quarter end) needs pandas >= 2.2; older versions use 'Q' instead.
by_q = df.set_index('date').groupby(pd.Grouper(freq='QE'))['revenue'].sum()
by_q.to_csv('/workspace/by-quarter.csv')
print(by_q.to_string())
"""
result = httpx.post(
    f"{HOST}/v1/modules/scaibunker/bunkers/{BUNKER}/exec",
    headers=H,
    json={"command": "python3 -", "stdin": script, "timeout_s": 120},
    timeout=180,
).json()["data"]
assert result["exit_code"] == 0, result["stderr"]  # fail loudly if the script crashed
print(result["stdout"])
```

The output table prints inline, and the CSV stays in the bunker's `/workspace`.
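
Because every exec call returns an `exit_code`, `stdout`, and `stderr`, it can help to wrap the check in one place. The sketch below assumes only those three fields, as used in the install and analysis steps; `ExecFailed` and `check_exec` are hypothetical names.

```python
class ExecFailed(RuntimeError):
    """Raised when a command inside the bunker exits non-zero."""


def check_exec(result: dict) -> str:
    """Return stdout for a successful exec result, or raise with stderr."""
    if result["exit_code"] != 0:
        raise ExecFailed(f"exit {result['exit_code']}: {result['stderr'].strip()}")
    return result["stdout"]


# Usage: print(check_exec(result)) in place of the bare print above.
```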

## 6. Read the result back

Fetch the CSV the same way you uploaded the input — a GET against the same `/files/{path}` endpoint, streamed to a local file.

```bash
curl "$SCAIGRID_HOST/v1/modules/scaibunker/bunkers/$BUNKER/files/workspace/by-quarter.csv" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  --output by-quarter.csv
```

## 7. Snapshot for later

Session bunkers can be checkpointed: take a snapshot now and ScaiBunker can restore it on any worker later (after an eviction, maintenance, or a restart).

```javascript
const snapResp = await fetch(
  `${HOST}/v1/modules/scaibunker/bunkers/${BUNKER}/snapshot`,
  { method: "POST", headers: H },
);
const { data: snap } = await snapResp.json();
console.log("snapshot:", snap.snapshot_id);
```

The snapshot is a tar.gz of the bunker's rootfs, stored in S3 under `scaibunker/snapshots/{bunker_id}/`. You can download it via `GET /snapshots/{id}/archive` if you want the bytes for offline analysis.
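
If you do download the archive, Python's `tarfile` module can inspect it offline. This is a sketch under one assumption: that entries are stored relative to the rootfs (for example `workspace/by-quarter.csv`, possibly with a leading `./`). `list_archive_paths` is a hypothetical helper.

```python
import io
import tarfile


def list_archive_paths(archive_bytes: bytes, prefix: str = "workspace/") -> list[str]:
    """List regular files in a tar.gz snapshot whose path starts with prefix."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        # Normalize "./workspace/x" and "/workspace/x" to "workspace/x".
        names = (
            m.name.removeprefix("./").lstrip("/")
            for m in tar.getmembers()
            if m.isfile()
        )
        return sorted(n for n in names if n.startswith(prefix))


# Usage, assuming the archive was saved locally as snapshot.tar.gz:
# with open("snapshot.tar.gz", "rb") as f:
#     print(list_archive_paths(f.read()))
```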

## 8. Terminate (or pause)

For a session you'll come back to within the day, pause (`POST /bunkers/{id}/pause`) instead of terminating — the bunker stays scheduled and resumes in milliseconds. For a definite end:

```bash
curl -X DELETE "$SCAIGRID_HOST/v1/modules/scaibunker/bunkers/$BUNKER?snapshot=true" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"
```

With `?snapshot=true`, ScaiBunker takes a final snapshot before destroying the bunker — useful when you want both quota back and the option to restore.

## What you've built

- A reusable analysis sandbox a teammate can drive by passing the bunker id around.
- An audit trail: every exec call and every file operation is recorded in the exec log.
- A snapshot you can restore on any worker.
- A quota-checked workload that won't run away with your tenant's resources.

Next: try the [register a custom image](./register-custom-image) tutorial if `python-3.12` doesn't have what you need pre-installed.
