Build a data analysis sandbox

You're going from zero to a working data-analysis sandbox a colleague can drive interactively: a Python 3.12 bunker with pandas installed, your CSV uploaded into it, an analysis script that produces a result, and a snapshot you can restore on demand.

Roughly 10 minutes if you already have the CSV.

1. Decide the bunker's shape

Before any API calls, settle these:

  • Lifecycle. One-shot analysis → ephemeral. A notebook-style session that survives across multiple exec calls → session. Tied to a ScaiCore agent that always has it available → persistent.
  • Resources. pandas usually needs more memory than the default 512 MB; raise memory_mb to 2048 if your dataset is non-trivial.
  • Network. You need pip install pandas to work → registry. If your script also needs to reach an internal API, use allowlisted plus a domain list.
  • Image. The platform-managed python-3.12 image is usually right. If you need PyTorch / CUDA, use python-3.12-ml. If you need R or Julia, register your own image (see Register a custom image).
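
The four decisions above map onto fields of the create request used in the next step. A minimal sketch of that mapping, assuming the field names shown in step 2; `allowed_domains` is a hypothetical name for the domain list, which this tutorial does not show, and the lifecycle check only covers the three modes named above.

```python
from typing import Optional


def bunker_spec(
    name: str,
    lifecycle_mode: str = "session",
    network_profile: str = "registry",
    image: str = "python-3.12",
    memory_mb: int = 2048,
    allowed_domains: Optional[list[str]] = None,
) -> dict:
    """Turn the lifecycle / resources / network / image choices into a payload."""
    if lifecycle_mode not in {"ephemeral", "session", "persistent"}:
        raise ValueError(f"unknown lifecycle mode: {lifecycle_mode}")
    spec = {
        "name": name,
        "image": image,
        "lifecycle_mode": lifecycle_mode,
        "network_profile": network_profile,
        "memory_mb": memory_mb,
    }
    if network_profile == "allowlisted":
        if not allowed_domains:
            raise ValueError("allowlisted profile needs at least one domain")
        # Hypothetical field name: check the API reference for the real one.
        spec["allowed_domains"] = list(allowed_domains)
    return spec
```

Calling `bunker_spec("quarterly-revenue-analysis")` yields the session, registry, 2048 MB combination this tutorial uses.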

2. Provision the bunker

Create a session bunker with the bumped resource numbers and a longer idle timeout so it sticks around between command bursts.

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaibunker/bunkers" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"quarterly-revenue-analysis","image":"python-3.12","lifecycle_mode":"session","network_profile":"registry","cpu_millicores":2000,"memory_mb":2048,"disk_mb":4096,"max_lifetime_s":28800,"idle_timeout_s":1800}'
python
import os

import httpx

# Credentials and host come from the environment, as in the curl example.
H = {"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"}
HOST = os.environ["SCAIGRID_HOST"]

resp = httpx.post(
    f"{HOST}/v1/modules/scaibunker/bunkers",
    headers=H,
    json={
        "name": "quarterly-revenue-analysis",
        "image": "python-3.12",
        "lifecycle_mode": "session",
        "network_profile": "registry",
        "cpu_millicores": 2000,
        "memory_mb": 2048,        # up from the 512 MB default
        "disk_mb": 4096,
        "max_lifetime_s": 28800,  # 8 hours
        "idle_timeout_s": 1800,   # 30 minutes
    },
)
resp.raise_for_status()  # fail on a 4xx/5xx instead of a KeyError below
bunker = resp.json()["data"]
BUNKER = bunker["id"]  # reused by every later step
print("bunker:", BUNKER, "status:", bunker["status"])
javascript
const HOST = process.env.SCAIGRID_HOST;
const H = { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` };

const res = await fetch(`${HOST}/v1/modules/scaibunker/bunkers`, {
  method: "POST",
  headers: { ...H, "Content-Type": "application/json" },
  body: JSON.stringify({
    name: "quarterly-revenue-analysis",
    image: "python-3.12",
    lifecycle_mode: "session",
    network_profile: "registry",
    cpu_millicores: 2000,
    memory_mb: 2048,
    disk_mb: 4096,
    max_lifetime_s: 28800,
    idle_timeout_s: 1800,
  }),
});
const { data: bunker } = await res.json();
const BUNKER = bunker.id;
console.log("bunker:", BUNKER, "status:", bunker.status);

Poll GET /bunkers/{id} until status is running (usually instant on a warm worker).
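
The polling step can be sketched as a small loop with a deadline. This is an illustration, not part of the ScaiBunker API: `wait_until_running` and its parameters are hypothetical names, and the status lookup is passed in as a callable so the loop itself stays independent of the HTTP client.

```python
import time
from typing import Callable


def wait_until_running(
    fetch_status: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until fetch_status() returns "running"; return the number of polls."""
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        status = fetch_status()
        if status == "running":
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"bunker still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

With the Python client from this step, `fetch_status` could be `lambda: httpx.get(f"{HOST}/v1/modules/scaibunker/bunkers/{BUNKER}", headers=H).json()["data"]["status"]`, assuming the GET route shares the prefix and response envelope of the create call.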

3. Install pandas

The registry profile lets pip reach PyPI. A single exec call installs pandas and pyarrow into the bunker; the wheels stay there for the lifetime of the session.

python
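# Reuses httpx, HOST, H and BUNKER from step 2.
result = httpx.post(
    f"{HOST}/v1/modules/scaibunker/bunkers/{BUNKER}/exec",
    headers=H,
    json={
        "command": "pip install --quiet pandas pyarrow",
        "timeout_s": 180,  # exec timeout passed to the bunker
    },
    timeout=240,  # HTTP client timeout, longer so the request outlives the command
).json()["data"]
# A non-zero exit code means pip failed; surface its stderr.
assert result["exit_code"] == 0, result["stderr"]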

The install typically completes in a few seconds on a warm bunker.
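
An `assert` is fine in a tutorial, but it is skipped when Python runs with `-O`. A minimal sketch of a reusable check, assuming only the `exit_code`, `stdout` and `stderr` fields used in this tutorial; `ExecFailed` and `check_exec` are hypothetical helper names.

```python
class ExecFailed(RuntimeError):
    """Raised when a bunker exec call returns a non-zero exit code."""


def check_exec(result: dict) -> str:
    """Return stdout from an exec result, or raise with stderr on failure."""
    if result["exit_code"] != 0:
        stderr = result.get("stderr", "").strip()
        raise ExecFailed(f"exit {result['exit_code']}: {stderr}")
    return result.get("stdout", "")
```

Passing the `data` object from any exec response through `check_exec` gives one place to handle failures, for example when confirming the install with a follow-up command such as `python3 -c "import pandas; print(pandas.__version__)"`.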

4. Upload the CSV

PUT the CSV inline. The path under /files/ is rooted in the bunker's filesystem; /workspace is the convention for caller-supplied data. The curl examples from here on assume $BUNKER holds the bunker id returned in step 2.

bash
curl -X PUT "$SCAIGRID_HOST/v1/modules/scaibunker/bunkers/$BUNKER/files/workspace/revenue.csv" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: text/csv" \
  --data-binary @local-revenue.csv

For files larger than ~10 MB, prefer the staged upload flow (POST /files/upload then POST /files/commit) — see the API reference for details.
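
The choice between the two upload flows can be made from the file size alone. A minimal sketch, assuming a hard cutoff at 10 MiB (the text only says "~10 MB") and that the staged routes sit under the same bunker prefix; `upload_plan` is a hypothetical helper, and the request bodies for the staged flow are left to the API reference.

```python
# Assumed cutoff: the tutorial gives "~10 MB" as a guideline, not an exact limit.
INLINE_LIMIT_BYTES = 10 * 1024 * 1024


def upload_plan(size_bytes: int, remote_path: str) -> dict:
    """Pick the inline PUT or the staged upload flow for a file of this size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= INLINE_LIMIT_BYTES:
        # Paths are relative to /v1/modules/scaibunker/bunkers/{id}.
        return {"flow": "inline", "steps": [("PUT", f"/files/{remote_path.lstrip('/')}")]}
    return {"flow": "staged", "steps": [("POST", "/files/upload"), ("POST", "/files/commit")]}
```

For example, `upload_plan(os.path.getsize("local-revenue.csv"), "/workspace/revenue.csv")` tells you which of the two flows to use before you send any bytes.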

5. Run the analysis

Pipe a multi-line Python script in via stdin rather than escaping it as a -c argument. The script writes its result back to /workspace/by-quarter.csv inside the bunker.

python
# Reuses httpx, HOST, H and BUNKER from step 2.
script = """
import pandas as pd
df = pd.read_csv('/workspace/revenue.csv', parse_dates=['date'])
# Sum revenue per calendar quarter; the 'QE' (quarter-end) alias needs pandas 2.2 or newer.
by_q = df.set_index('date').groupby(pd.Grouper(freq='QE'))['revenue'].sum()
by_q.to_csv('/workspace/by-quarter.csv')
print(by_q.to_string())
"""
result = httpx.post(
    f"{HOST}/v1/modules/scaibunker/bunkers/{BUNKER}/exec",
    headers=H,
    # "python3 -" reads the program from stdin, so the script needs no shell escaping.
    json={"command": "python3 -", "stdin": script, "timeout_s": 120},
    timeout=180,
).json()["data"]
print(result["stdout"])

The output table prints inline; the CSV is left in the bunker's /workspace.
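
To sanity-check the numbers locally without pandas, the same quarter-end grouping can be sketched with the standard library. This assumes the CSV has `date` and `revenue` columns with ISO `YYYY-MM-DD` dates; `quarter_end` and `revenue_by_quarter` are illustrative names, not part of the tutorial's script.

```python
import csv
import io
from collections import defaultdict
from datetime import date, timedelta


def quarter_end(d: date) -> date:
    """Last calendar day of the quarter containing d."""
    last_month = ((d.month - 1) // 3 + 1) * 3  # 3, 6, 9 or 12
    if last_month == 12:
        return date(d.year, 12, 31)
    # First day of the following month, minus one day.
    return date(d.year, last_month + 1, 1) - timedelta(days=1)


def revenue_by_quarter(csv_text: str) -> dict[date, float]:
    """Sum the revenue column per quarter, keyed by quarter-end date."""
    totals: dict[date, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = quarter_end(date.fromisoformat(row["date"]))
        totals[key] += float(row["revenue"])
    return dict(sorted(totals.items()))
```

One difference to expect: this sketch only returns quarters that have rows, whereas grouping with a frequency in pandas may also emit zero-valued rows for empty quarters inside the date range.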

6. Read the result back

Fetch the CSV the same way you uploaded the input — a GET against the same /files/{path} endpoint, streamed to a local file.

bash
curl "$SCAIGRID_HOST/v1/modules/scaibunker/bunkers/$BUNKER/files/workspace/by-quarter.csv" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  --output by-quarter.csv
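
In Python, the streamed download can be sketched by writing chunks to a temporary file and renaming it once complete, so an interrupted transfer never leaves a truncated by-quarter.csv behind. `save_stream` is a hypothetical helper; it takes any iterable of byte chunks, so it does not depend on a particular HTTP client.

```python
import os
from typing import Iterable


def save_stream(chunks: Iterable[bytes], dest_path: str) -> int:
    """Write chunks to dest_path via a .part file; return the bytes written."""
    part_path = dest_path + ".part"
    written = 0
    with open(part_path, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
            written += len(chunk)
    # Only expose the final name once every chunk has been written.
    os.replace(part_path, dest_path)
    return written
```

With httpx this could be driven as `with httpx.stream("GET", url, headers=H) as r: save_stream(r.iter_bytes(), "by-quarter.csv")`, where `url` is the same /files/workspace/by-quarter.csv address as in the curl call.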

7. Snapshot for later

Session bunkers are checkpoint-able: take a snapshot now and ScaiBunker can restore it on any worker later (after an eviction, after maintenance, after a restart).

javascript
const snapResp = await fetch(
  `${HOST}/v1/modules/scaibunker/bunkers/${BUNKER}/snapshot`,
  { method: "POST", headers: H },
);
const { data: snap } = await snapResp.json();
console.log("snapshot:", snap.snapshot_id);

The snapshot is a tar.gz of the bunker's rootfs, stored in S3 under scaibunker/snapshots/{bunker_id}/. You can download it via GET /snapshots/{id}/archive if you want the bytes for offline analysis.
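
For offline analysis, a downloaded archive can be inspected with Python's `tarfile` module without extracting anything. A minimal sketch, assuming the archive stores the rootfs with `workspace/` at its top level (possibly prefixed with `./`); the layout and the helper name `list_snapshot_files` are assumptions.

```python
import tarfile


def list_snapshot_files(archive_path: str, prefix: str = "workspace/") -> list[str]:
    """Names of regular files in a snapshot tar.gz whose path starts with prefix."""
    names = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks and device nodes
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            name = name.lstrip("/")
            if name.startswith(prefix):
                names.append(name)
    return sorted(names)
```

Listing before extracting is a cheap way to confirm the snapshot contains revenue.csv and by-quarter.csv, and it avoids unpacking an entire rootfs onto your machine.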

8. Terminate (or pause)

For a session you'll come back to within the day, pause (POST /bunkers/{id}/pause) instead of terminating — the bunker stays scheduled and resumes in milliseconds. For a definite end:

bash
curl -X DELETE "$SCAIGRID_HOST/v1/modules/scaibunker/bunkers/$BUNKER?snapshot=true" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"

With ?snapshot=true, ScaiBunker takes a final snapshot before destroying the bunker — useful when you want both quota back and the option to restore.
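
The pause-or-terminate decision can be sketched as a function that returns the request to send. The routes are the ones shown in this step; `end_of_session_request` and its parameters are hypothetical names for illustration.

```python
def end_of_session_request(
    bunker_id: str,
    resume_today: bool,
    final_snapshot: bool = True,
) -> tuple[str, str]:
    """Return (HTTP method, path) for pausing or terminating a bunker."""
    base = f"/v1/modules/scaibunker/bunkers/{bunker_id}"
    if resume_today:
        # Pausing keeps the bunker scheduled so it can resume quickly.
        return ("POST", f"{base}/pause")
    # Terminating frees quota; ?snapshot=true keeps the option to restore.
    query = "?snapshot=true" if final_snapshot else ""
    return ("DELETE", base + query)
```

Prefix the returned path with your SCAIGRID_HOST value and send it with the same Authorization header as every other call in this tutorial.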

What you've built

  • A reusable analysis sandbox a teammate can drive by passing the bunker id around.
  • An audit trail (every exec call is in the exec log; every file op is in there too).
  • A snapshot you can restore on any worker.
  • A quota-checked workload that won't run away with your tenant's resources.

Next: try the Register a custom image tutorial if python-3.12 doesn't have what you need pre-installed.

Updated 2026-05-18 15:01:27, rev 12