Front-door CLI for orchestrating a Docker sandbox and driving autonomous claude/codex coding tasks inside it. It defaults to the interactive Claude/Codex TUI on your subscription, with an opt-in headless mode.

By default sboxai drives the interactive Claude TUI (`claude --dangerously-skip-permissions`) over tmux inside the sandbox, drawing on your normal Claude subscription allowance, so it costs nothing beyond the subscription. An opt-in headless mode (`claude -p`) is also available for a clean structured result, but it draws on the separate monthly Agent-SDK credit pool ($20 Pro / $100 Max5x / $200 Max20x as of June 2026) rather than your interactive subscription. That billing difference is why it's opt-in. See Execution modes for details.
A lightweight daemon (Phase 2) manages the task queue over a unix socket. Status is derived from the daemon plus live state (Docker, tmux, the filesystem).
- Docker Desktop with the `docker sandbox` plugin.
- Bun (for development and building from source).
- A manual OAuth login for `claude`/`codex`, performed inside the sandbox via `sboxai login`. It cannot be scripted.
Build a single self-contained binary for your host:

```bash
make build   # → dist/sboxai
```

Or download a release binary from GitHub Releases. Binaries are published per platform:
| Binary | Platform |
|---|---|
| `sboxai-darwin-arm64` | macOS Apple Silicon |
| `sboxai-darwin-x64` | macOS Intel |
| `sboxai-linux-x64` | Linux x86_64 |
| `sboxai-linux-arm64` | Linux ARM64 |
Verify a download against the published checksums:

```bash
sha256sum -c SHA256SUMS                   # verify every listed binary (all must be present)
sha256sum -c --ignore-missing SHA256SUMS  # verify just the one you downloaded
```

Quick start:

```bash
# 1. Build/refresh the sandbox (clones the project to work on)
sboxai setup --root ~/sandbox --repo-url https://github.com/you/project.git
# 2. Authenticate claude/codex inside the sandbox (manual OAuth)
sboxai login --root ~/sandbox
# 3. Confirm the plugin, a running sandbox, and valid Claude auth
sboxai doctor --root ~/sandbox
# 4. Drive a task to completion
sboxai run "fix the failing parser test" --root ~/sandbox
```

Set `SBOXAI_ROOT` once and subsequent commands can omit `--root`:
```bash
export SBOXAI_ROOT=~/sandbox
sboxai doctor
sboxai run "fix the failing parser test"
```

`--root` can also be supplied via the `SBOXAI_ROOT` environment variable, and `--name` via `SBOXAI_SANDBOX` (default `dev`).
| Command | Description | Flags |
|---|---|---|
| `setup` | Build/refresh the Docker sandbox via the vendored setup script | `--root <dir>`, `--name <name>`, `--repo-url <url>`, `--repo-branch <branch>` (default `main`) |
| `up` | Ensure the sandbox exists and is running (wakes a stopped one) | `--root <dir>`, `--name <name>` |
| `shell` | Open an interactive shell inside the sandbox | `--root <dir>`, `--name <name>` |
| `stop` | Stop the sandbox | `--root <dir>`, `--name <name>` |
| `rm` | Remove the sandbox (the host `.auth/` directory survives) | `--root <dir>`, `--name <name>`, `--yes` |
| Command | Description | Flags |
|---|---|---|
| `login` | Drop into the sandbox shell to run `claude login` / `codex login` | `--root <dir>`, `--name <name>` |
| `doctor` | Read-only health checklist (exits nonzero if a critical check fails) | `--root <dir>`, `--name <name>` |
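The exact checks behind `doctor` aren't listed beyond the plugin, a running sandbox, and valid Claude auth. Below is a minimal sketch of how a read-only checklist can map to a process exit code, assuming each check carries a `critical` flag. The `Check` type, the `summarize` function, the `!` warning mark, and exit code `1` are all hypothetical, not sboxai's own code:

```typescript
// Hypothetical check result; sboxai's real types may differ.
interface Check {
  name: string;
  ok: boolean;
  critical: boolean; // a failing critical check fails the whole run
}

// Render one line per check and compute the exit code.
// Non-critical failures are shown as warnings ("!") but keep the exit code at 0.
function summarize(checks: Check[]): { lines: string[]; exitCode: number } {
  const lines = checks.map((c) => {
    const mark = c.ok ? "✓" : c.critical ? "✗" : "!";
    return `${mark} ${c.name}`;
  });
  const criticalFailed = checks.some((c) => c.critical && !c.ok);
  return { lines, exitCode: criticalFailed ? 1 : 0 };
}

// Example: the three things the quick start says doctor confirms.
const report = summarize([
  { name: "docker sandbox plugin installed", ok: true, critical: true },
  { name: "sandbox running", ok: true, critical: true },
  { name: "claude auth valid", ok: false, critical: true },
]);
console.log(report.lines.join("\n"));
// In a CLI entry point you would then set: process.exitCode = report.exitCode;
```

Keeping non-critical failures out of the exit code lets scripts gate on `doctor` without tripping over advisory warnings.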
| Command | Description | Flags |
|---|---|---|
| `run <task>` | Run one autonomous task in the sandbox (sequential, synchronous) | `--root`, `--name`, `--workdir`, `--worktree`, `--repo`, `--model`, `--timeout`, `--mode tui\|headless` |
| `submit <task>` | Enqueue a task for daemon execution (asynchronous) | `--root`, `--name`, `--workdir`, `--worktree`, `--repo`, `--model`, `--timeout`, `--mode tui\|headless` |
| `mode [value]` | Show the effective execution mode, or set it (`tui` or `headless`), persisted to `.sboxai/config.json` | `--root`, `--name` |
| `cancel <id>` | Cancel a queued or running task | `--root`, `--name` |
| `status` | Live status: sandbox, daemon, tasks, worktrees | `--root`, `--name` |
| `logs <slug>` | Print (or follow) the last-run transcript log for a slug | `--root`, `--name`, `-f`/`--follow`, `--workdir` |
There are two execution modes; the default is `tui`.

- `tui` (default): drives the interactive Claude Code TUI inside the sandbox (via the vendored `tui-exec.sh` over tmux). Runs under your Claude subscription, so it costs nothing beyond the subscription. Completion is inferred from the TUI going idle.
- `headless`: runs `claude -p --output-format text --dangerously-skip-permissions` with the prompt piped on stdin. Gives a clean exit code and result with no TUI screen-scraping, but draws on the separate monthly Agent-SDK credit pool ($20 Pro / $100 Max5x / $200 Max20x), not your interactive subscription; hence it is opt-in.
Switch modes:

```bash
sboxai mode                      # show the effective mode and where it came from
sboxai mode headless             # set and persist to .sboxai/config.json
sboxai mode tui                  # switch back
sboxai run --mode headless "…"   # per-run override (also accepted by submit)
```

Precedence: `--mode` flag → `SBOXAI_MODE` env → `.sboxai/config.json` → default `tui`.
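The precedence chain reads as a first-match lookup. Here is a sketch in TypeScript, assuming the config file stores the mode under a `mode` key (its schema isn't documented here) and that an invalid value is an error rather than silently skipped. `resolveMode` and its return shape are hypothetical; reporting the source mirrors what `sboxai mode` shows ("where it came from"):

```typescript
type Mode = "tui" | "headless";
type Source = "flag" | "env" | "config" | "default";

function isMode(v: string): v is Mode {
  return v === "tui" || v === "headless";
}

// Validate one candidate; undefined or empty means "not set at this level".
function check(value: string | undefined, where: string): Mode | undefined {
  if (value === undefined || value === "") return undefined;
  if (isMode(value)) return value;
  throw new Error(`invalid mode ${JSON.stringify(value)} from ${where} (expected tui or headless)`);
}

// First match wins: --mode flag, then SBOXAI_MODE, then the persisted config, then "tui".
function resolveMode(
  flag: string | undefined,
  env: Record<string, string | undefined>,
  config: { mode?: string },
): { mode: Mode; source: Source } {
  const fromFlag = check(flag, "--mode");
  if (fromFlag) return { mode: fromFlag, source: "flag" };
  const fromEnv = check(env.SBOXAI_MODE, "SBOXAI_MODE");
  if (fromEnv) return { mode: fromEnv, source: "env" };
  const fromConfig = check(config.mode, ".sboxai/config.json");
  if (fromConfig) return { mode: fromConfig, source: "config" };
  return { mode: "tui", source: "default" };
}

// The env var beats the persisted config here.
console.log(resolveMode(undefined, { SBOXAI_MODE: "headless" }, { mode: "tui" }));
```

Because a higher level returns before lower levels are read, a stale or malformed value further down the chain only matters when nothing above it is set.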
Every run's prompt is prefixed with a contract instructing the agent to run fully unattended: never ask questions, never block on a decision, make its own judgment calls and proceed. There is no human-reply path by design; the agent resolves ambiguity itself instead of surfacing it to a human.
| Command | Description | Flags |
|---|---|---|
| `daemon start` | Start the background task daemon | `--root`, `--name` |
| `daemon stop` | Stop the running daemon | `--root`, `--name` |
| `daemon status` | Check whether the daemon is running | `--root`, `--name` |
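How `daemon status` decides whether the daemon is up isn't documented here. One common approach, assumed for this sketch and not necessarily what sboxai does, is a PID file checked with a signal-0 probe. The pid-file location and the `pidAlive`/`daemonRunning` helpers are hypothetical:

```typescript
import { readFileSync } from "node:fs";

// True if a process with this pid exists. Signal 0 runs the kernel's
// existence/permission check without actually delivering a signal.
function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Hypothetical layout: the daemon writes its pid to a file when it starts.
function daemonRunning(pidFile: string): boolean {
  let text: string;
  try {
    text = readFileSync(pidFile, "utf8");
  } catch {
    return false; // no pid file: never started, or stopped cleanly
  }
  const pid = Number.parseInt(text.trim(), 10);
  return Number.isInteger(pid) && pid > 0 && pidAlive(pid);
}

console.log(daemonRunning("/tmp/sboxai-example/daemon.pid"));
```

A PID probe can be fooled by PID reuse after a crash, so a check that also connects to the daemon's unix socket would be more robust.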
- Interactive TUI via tmux. A task launches `claude --dangerously-skip-permissions [--model X]` inside a detached tmux session (`harness-<pid>-<epoch>`), waits for the TUI to be ready, and injects the prompt via bracketed paste so multi-line text isn't submitted line by line.
- Best-effort completion heuristic. The executor polls the pane on an interval, hashes its content, and counts consecutive "stable" polls in which the content is unchanged and no "esc to interrupt" activity indicator is present (3 by default). Runs finish promptly when the agent goes idle: the executor detects the idle `❯` prompt (and the absence of a "working" indicator) instead of waiting out the full timeout. The interactive TUI provides no structured result or cost signal, so ok/fail is inferred from the executor's exit code.
- Decision and transcript logs. The agent records consequential decisions (with its reasoning) plus a final `## Summary` in a decision log at `<workdir>/.sboxai-decisions.md`, alongside the per-run transcript log at `<workdir>/.sboxai-last-run.log`. Both paths are printed in the run's verdict line:
  `✓ task ok (rc=0) — transcript: <workdir>/.sboxai-last-run.log — decisions: <workdir>/.sboxai-decisions.md`
- Worktree per task. With `--worktree <slug> --repo <repoDir>`, the run gets an isolated git worktree at `worktree/<slug>` on branch `sboxai/<slug>`.
- Task store. Tasks are persisted as JSON files under `.sboxai/tasks/`. The daemon drains the queue sequentially; status is derived from the daemon plus live Docker/tmux/filesystem state.
- Exit-code contract. `0` ok, `2` error marker detected (e.g. a rate/usage limit), `3` bad input (missing or empty prompt), `124` timeout.
- Credential boundary. Org-integration credentials (Linear/GitHub/Slack) must never enter the sandbox.
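The launch step in the first bullet can be sketched as three small pieces: the session name, the detached `tmux new-session` argv, and the bracketed-paste wrapping. This assumes the standard xterm paste markers (`ESC [200~` / `ESC [201~`) and that `<epoch>` means Unix seconds. How the vendored `tui-exec.sh` actually delivers the prompt isn't shown here (tmux's `paste-buffer -p` is one way to emit these markers), and the helper names are hypothetical:

```typescript
// Standard bracketed-paste markers. An application that enables bracketed paste
// treats everything between them as one paste, so embedded newlines are not
// taken as Enter key presses that would submit the prompt early.
const PASTE_START = "\x1b[200~";
const PASTE_END = "\x1b[201~";

function bracketedPaste(prompt: string): string {
  return PASTE_START + prompt + PASTE_END;
}

// Session name in the documented harness-<pid>-<epoch> form (epoch in seconds: an assumption).
function sessionName(pid: number, epochSeconds: number): string {
  return `harness-${pid}-${epochSeconds}`;
}

// argv that starts the TUI in a detached (-d) tmux session named via -s.
function launchArgv(session: string, model?: string): string[] {
  const claude = ["claude", "--dangerously-skip-permissions"];
  if (model) claude.push("--model", model);
  return ["tmux", "new-session", "-d", "-s", session, claude.join(" ")];
}

const session = sessionName(process.pid, Math.floor(Date.now() / 1000));
console.log(launchArgv(session));
console.log(JSON.stringify(bracketedPaste("line one\nline two")));
```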
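The stability counter in the completion-heuristic bullet is essentially a small state machine over pane snapshots. Here is a sketch, assuming snapshots come from something like `tmux capture-pane -p`. The `IdleDetector` class is hypothetical, and it covers only the hash-stability and "esc to interrupt" checks, not the separate `❯` prompt detection:

```typescript
import { createHash } from "node:crypto";

// Activity indicator the Claude TUI shows while it is working.
const BUSY_MARKER = "esc to interrupt";

// Counts consecutive "stable" polls: pane content unchanged since the previous
// poll and no busy indicator visible. Reaching `threshold` means "idle".
class IdleDetector {
  private readonly threshold: number;
  private lastHash: string | null = null;
  private stable = 0;

  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  // Feed one captured pane snapshot; returns true once the pane counts as idle.
  observe(pane: string): boolean {
    const hash = createHash("sha256").update(pane).digest("hex");
    const unchanged = hash === this.lastHash;
    const busy = pane.includes(BUSY_MARKER);
    this.stable = unchanged && !busy ? this.stable + 1 : 0; // any change or activity resets
    this.lastHash = hash;
    return this.stable >= this.threshold;
  }
}

const detector = new IdleDetector(3);
const frames = [
  "thinking... esc to interrupt",
  "thinking... esc to interrupt", // unchanged, but busy: does not count
  "done\n❯ ",
  "done\n❯ ",
  "done\n❯ ",
  "done\n❯ ",
];
const results = frames.map((f) => detector.observe(f));
console.log(results);
```

Because a poll only counts as stable relative to the previous one, a threshold of 3 needs four identical, non-busy snapshots in a row. A pane that keeps changing never reaches it, which is why the timeout remains the backstop.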
```text
host: sboxai CLI
  │  docker sandbox exec  (env: PROMPT_FILE, TASK_TIMEOUT, CLAUDE_MODEL, TUI_WORKDIR)
  ▼
in-container: tui-exec.sh  (vendored, refreshed when stale)
  │  tmux new-session harness-<pid>-<epoch>
  ▼
claude --dangerously-skip-permissions   ← interactive TUI, your subscription
```
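The diagram shows exactly four variables crossing from host to container, which fits the credential-boundary rule: build the exec environment from an allowlist instead of forwarding the host environment wholesale. Here is a sketch of that idea. `execEnv` is hypothetical, and how sboxai actually passes these variables to `docker sandbox exec` isn't documented here:

```typescript
// The only variables the diagram shows being passed into the container.
const EXEC_ENV_ALLOWLIST = ["PROMPT_FILE", "TASK_TIMEOUT", "CLAUDE_MODEL", "TUI_WORKDIR"] as const;

// Copy allowlisted keys only. Anything else in the source map (for example a
// GITHUB_TOKEN or LINEAR_API_KEY on the host) is dropped, so it cannot leak in.
function execEnv(source: Record<string, string | undefined>): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of EXEC_ENV_ALLOWLIST) {
    const value = source[key];
    if (value !== undefined) env[key] = value;
  }
  return env;
}

// Hypothetical values; a real caller would also set TASK_TIMEOUT / CLAUDE_MODEL as needed.
const env = execEnv({
  PROMPT_FILE: "/tmp/sboxai-prompt.txt",
  TUI_WORKDIR: "/work/project",
  GITHUB_TOKEN: "host-secret",
  LINEAR_API_KEY: "host-secret",
});
console.log(Object.keys(env));
```

An allowlist fails closed: a new host secret added later is excluded by default, whereas a denylist would have to be updated to keep it out.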
- Phase 1 (done): lifecycle (`setup`/`up`/`shell`/`stop`/`rm`), auth (`login`/`doctor`), and sequential execution (`run`/`status`/`logs`) over live-derived state.
- Phase 2 (done): long-lived daemon, unix-socket IPC, a task queue, the `submit`/`cancel`/`daemon` commands, and `run` routed through the daemon.
- Phase 3 (planned): bounded-concurrency parallel pool plus rate-limit back-pressure.
```bash
make help    # list all targets
make check   # typecheck + lint + test (full gate)
make build   # compile a single host binary → dist/sboxai
```

Repo layout:
```text
src/index.ts               commander entry, wires all commands
src/commands/lifecycle.ts  setup/up/shell/stop/rm
src/commands/auth.ts       login/doctor
src/commands/exec.ts       run/submit/cancel/status/logs
src/commands/daemon.ts     daemon start/stop/status
src/commands/shared.ts     shared command helpers
src/lib/config.ts          derivePaths/resolveConfig (pure)
src/lib/docker.ts          docker sandbox ls parsing + invocation
src/lib/auth.ts            OAuth credential evaluation (pure)
src/lib/worktree.ts        slugify + git worktree management
src/lib/exec.ts            host-side driver that calls tui-exec.sh
src/lib/task.ts            task model + JSON-per-task file store
src/lib/ipc.ts             IPC client helpers (HTTP-over-unix-socket)
src/lib/daemon.ts          daemon server (HTTP + queue drain loop)
src/lib/__tests__/         bun tests (incl. task.test.ts)
scripts/tui-exec.sh        vendored in-container tmux/TUI executor
scripts/sandbox-setup.sh   vendored sandbox build/setup script
```
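`src/lib/worktree.ts` pairs slugify with git worktree management. Here is a sketch of that pairing under stated assumptions. The slug rules (lowercase, runs of non-alphanumerics collapsed to `-`, a 48-character cap) and the `worktreeRoot` parameter are hypothetical, since the README only fixes the `worktree/<slug>` location (not what it is relative to) and the `sboxai/<slug>` branch name. `git worktree add -b <branch> <path>` is standard git:

```typescript
import { join } from "node:path";

// Hypothetical slug rules; the real slugify may differ.
function slugify(input: string, maxLen = 48): string {
  return input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into one dash
    .replace(/^-+|-+$/g, "") // trim leading/trailing dashes
    .slice(0, maxLen)
    .replace(/-+$/g, ""); // the cut may have left a trailing dash
}

interface WorktreePlan {
  path: string;
  branch: string;
  argv: string[];
}

// Plan an isolated worktree at <worktreeRoot>/<slug> on branch sboxai/<slug>.
function planWorktree(repoDir: string, worktreeRoot: string, name: string): WorktreePlan {
  const slug = slugify(name);
  if (!slug) throw new Error(`cannot derive a slug from ${JSON.stringify(name)}`);
  const path = join(worktreeRoot, slug);
  const branch = `sboxai/${slug}`;
  // `git worktree add -b <branch> <path>` creates the branch and checks it out at <path>.
  return { path, branch, argv: ["git", "-C", repoDir, "worktree", "add", "-b", branch, path] };
}

console.log(planWorktree("/repo", "/sandbox/worktree", "Fix the failing parser test").argv.join(" "));
```

Returning an argv instead of running git keeps the planning step pure and easy to test, in the same spirit as the `(pure)` helpers listed in the layout.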
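For `src/lib/task.ts`'s JSON-per-task store, here is a minimal sketch. It assumes a task record with an id, prompt, status, and creation time (the real schema isn't documented) and a queue drained oldest-first. Writing to a temp file and renaming it keeps a concurrent reader, such as `status`, from seeing a half-written file. `TaskStore` and its fields are hypothetical, and the demo uses a temp directory rather than `.sboxai/tasks/`:

```typescript
import { mkdirSync, mkdtempSync, readdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type TaskStatus = "queued" | "running" | "done" | "failed" | "cancelled";

interface Task {
  id: string;
  prompt: string;
  status: TaskStatus;
  createdAt: number; // ms since epoch
}

class TaskStore {
  private readonly dir: string;

  constructor(dir: string) {
    this.dir = dir;
    mkdirSync(dir, { recursive: true });
  }

  // One JSON file per task: write a temp file, then rename it into place
  // (rename is atomic on the same filesystem).
  save(task: Task): void {
    const file = join(this.dir, `${task.id}.json`);
    const tmp = `${file}.tmp`;
    writeFileSync(tmp, JSON.stringify(task, null, 2));
    renameSync(tmp, file);
  }

  load(id: string): Task {
    return JSON.parse(readFileSync(join(this.dir, `${id}.json`), "utf8")) as Task;
  }

  // Only *.json files; leftover *.json.tmp files are ignored.
  list(): Task[] {
    return readdirSync(this.dir)
      .filter((f) => f.endsWith(".json"))
      .map((f) => JSON.parse(readFileSync(join(this.dir, f), "utf8")) as Task);
  }

  // Sequential drain: the oldest queued task runs next.
  nextQueued(): Task | undefined {
    return this.list()
      .filter((t) => t.status === "queued")
      .sort((a, b) => a.createdAt - b.createdAt)[0];
  }
}

const store = new TaskStore(mkdtempSync(join(tmpdir(), "sboxai-tasks-")));
store.save({ id: "t2", prompt: "second", status: "queued", createdAt: 2 });
store.save({ id: "t1", prompt: "first", status: "queued", createdAt: 1 });
store.save({ id: "t0", prompt: "already finished", status: "done", createdAt: 0 });
console.log(store.nextQueued()?.id);
```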
- The completion heuristic (pane stability plus the absence of "esc to interrupt") is brittle: it scrapes the TUI rather than reading a structured signal. This is a property of the default `tui` mode, which `headless` sidesteps with a structured result at the cost of separate billing.
- Future parallelism is capped at roughly 2–3 concurrent tasks by the subscription rate tier, not by the tooling.
- Container clock skew can offset transcript timestamps.
MIT; see LICENSE.