ethosagent/sboxai

sboxai

Front-door CLI for orchestrating a Docker sandbox and driving autonomous claude/codex coding tasks inside it — defaults to the interactive Claude/Codex TUI on your subscription, with an opt-in headless mode.

What & why

By default sboxai drives the interactive Claude TUI (claude --dangerously-skip-permissions) over tmux inside the sandbox, drawing your normal Claude subscription allowance — cost-free beyond the subscription. An opt-in headless mode (claude -p) is also available for a clean structured result, but it draws the separate monthly Agent-SDK credit pool ($20 Pro / $100 Max5x / $200 Max20x as of June 2026), not your interactive subscription — that billing difference is why it's opt-in. See Execution modes for details.

A lightweight daemon (Phase 2) manages the task queue over a unix socket. Status is derived from the daemon plus live state (Docker, tmux, the filesystem).

Prerequisites

  • Docker Desktop with the docker sandbox plugin.
  • Bun (for dev and building from source).
  • A manual OAuth login for claude/codex performed inside the sandbox via sboxai login — it cannot be scripted.

Install

Build a single self-contained binary for your host:

make build      # → dist/sboxai

Or download a release binary from GitHub Releases. Binaries are published per platform:

| Binary | Platform |
| --- | --- |
| sboxai-darwin-arm64 | macOS Apple Silicon |
| sboxai-darwin-x64 | macOS Intel |
| sboxai-linux-x64 | Linux x86_64 |
| sboxai-linux-arm64 | Linux ARM64 |

Verify a download against the published checksums:

sha256sum -c SHA256SUMS                      # verify all binaries present
sha256sum -c --ignore-missing SHA256SUMS     # verify just the one you downloaded
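
For scripted installs, the same check can be done from TypeScript. The sketch below is an illustration and not part of sboxai: the helper names (parseSha256Sums, sha256Hex, verifyDownload) are hypothetical, and it assumes SHA256SUMS uses the standard sha256sum line format (64 hex digits, a space, then a space or `*`, then the file name).

```typescript
import { createHash } from "node:crypto";

// Parse sha256sum-style lines into a map of file name to lowercase hex digest.
// Lines that do not match the expected format are ignored.
function parseSha256Sums(text: string): Map<string, string> {
  const sums = new Map<string, string>();
  for (const line of text.split(/\r?\n/)) {
    const match = /^([0-9a-fA-F]{64}) [ *](.+)$/.exec(line);
    if (match === null) continue;
    const digest = match[1];
    const name = match[2];
    if (digest !== undefined && name !== undefined) {
      sums.set(name, digest.toLowerCase());
    }
  }
  return sums;
}

// SHA-256 of the given bytes (or UTF-8 string) as lowercase hex.
function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// True only when the sums file lists `name` and its digest matches `data`.
function verifyDownload(name: string, data: Uint8Array | string, sumsText: string): boolean {
  const expected = parseSha256Sums(sumsText).get(name);
  return expected !== undefined && expected === sha256Hex(data);
}
```

In practice the binary and the SHA256SUMS file would be read with readFileSync and passed in; a false result means either the entry is missing or the digest differs.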

Quickstart

# 1. Build/refresh the sandbox (clones the project to work on)
sboxai setup --root ~/sandbox --repo-url https://github.com/you/project.git

# 2. Authenticate claude/codex inside the sandbox (manual OAuth)
sboxai login --root ~/sandbox

# 3. Confirm the plugin, a running sandbox, and valid Claude auth
sboxai doctor --root ~/sandbox

# 4. Drive a task to completion
sboxai run "fix the failing parser test" --root ~/sandbox

Set SBOXAI_ROOT once and subsequent commands can omit --root:

export SBOXAI_ROOT=~/sandbox
sboxai doctor
sboxai run "fix the failing parser test"

Command reference

--root may be supplied via the SBOXAI_ROOT env var; --name via SBOXAI_SANDBOX (default dev).
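
As a rough illustration of that fallback, the sketch below resolves the two values from flags and an environment map. It is not the code in src/lib/config.ts: the function name resolveTarget is hypothetical, and two behaviours are assumptions rather than documented facts, namely that an explicit flag wins over the env var and that a missing root is reported as an error.

```typescript
type Env = Record<string, string | undefined>;

interface Target {
  root: string;
  name: string;
}

// Resolve the sandbox root and name from CLI flags, then env vars, then defaults.
// Assumption: flags take priority over the environment.
function resolveTarget(flags: { root?: string; name?: string }, env: Env): Target {
  const root = flags.root ?? env["SBOXAI_ROOT"];
  if (root === undefined || root === "") {
    // Assumption: there is no default root, so this is treated as a usage error.
    throw new Error("no sandbox root: pass --root or set SBOXAI_ROOT");
  }
  const name = flags.name ?? env["SBOXAI_SANDBOX"] ?? "dev";
  return { root, name };
}
```

A caller would typically pass process.env as the second argument; taking it as a parameter keeps the function pure and easy to test.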

Lifecycle

| Command | Description | Flags |
| --- | --- | --- |
| setup | Build/refresh the Docker sandbox via the vendored setup script | --root <dir>, --name <name>, --repo-url <url>, --repo-branch <branch> (default main) |
| up | Ensure the sandbox exists and is running (wakes a stopped one) | --root <dir>, --name <name> |
| shell | Open an interactive shell inside the sandbox | --root <dir>, --name <name> |
| stop | Stop the sandbox | --root <dir>, --name <name> |
| rm | Remove the sandbox (host .auth/ survives) | --root <dir>, --name <name>, --yes |

Auth

| Command | Description | Flags |
| --- | --- | --- |
| login | Drop into the sandbox shell to run claude login / codex login | --root <dir>, --name <name> |
| doctor | Read-only health checklist (exits nonzero if a critical check fails) | --root <dir>, --name <name> |

Execution

| Command | Description | Flags |
| --- | --- | --- |
| run <task> | Run one autonomous task in the sandbox (sequential, synchronous) | --root, --name, --workdir, --worktree, --repo, --model, --timeout, --mode tui\|headless |
| submit <task> | Enqueue a task for daemon execution (async) | --root, --name, --workdir, --worktree, --repo, --model, --timeout, --mode tui\|headless |
| mode [value] | Show the effective execution mode, or set it (tui or headless); persisted to .sboxai/config.json | --root, --name |
| cancel <id> | Cancel a queued or running task | --root, --name |
| status | Live status: sandbox, daemon, tasks, worktrees | --root, --name |
| logs <slug> | Print (or follow) the last-run transcript log for a slug | --root, --name, -f, --follow, --workdir |

Execution modes (tui | headless)

Two execution modes; the default is tui.

  • tui (default) — drives the interactive Claude Code TUI inside the sandbox (via the vendored tui-exec.sh over tmux). Runs under your Claude subscription, so it's cost-free beyond the subscription. Completion is inferred from the TUI going idle.
  • headless — runs claude -p --output-format text --dangerously-skip-permissions with the prompt piped on STDIN. Gives a clean exit code / result with no TUI screen-scraping, but draws the separate monthly Agent-SDK credit pool ($20 Pro / $100 Max5x / $200 Max20x), not your interactive subscription — hence opt-in.

Switch modes:

sboxai mode                       # show the effective mode and where it came from
sboxai mode headless              # set + persist to .sboxai/config.json
sboxai mode tui                   # switch back
sboxai run --mode headless "<task>"    # per-run override (also on `submit`)

Precedence: --mode flag → SBOXAI_MODE env → .sboxai/config.json → default tui.
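
The precedence chain can be sketched as a small pure function. This is an illustration under stated assumptions, not sboxai's actual resolver: the name resolveMode and the returned shape are hypothetical, the config value is assumed to have been read from .sboxai/config.json already, and rejecting an unrecognised value with an error is an assumption about behaviour the README does not specify.

```typescript
type Mode = "tui" | "headless";
type ModeSource = "flag" | "env" | "config" | "default";

function isMode(value: unknown): value is Mode {
  return value === "tui" || value === "headless";
}

// Walk the documented precedence: --mode flag, then SBOXAI_MODE, then the
// persisted config value, then the default "tui". The first defined value wins.
function resolveMode(
  flag: string | undefined,
  envValue: string | undefined,
  configValue: unknown,
): { mode: Mode; source: ModeSource } {
  const candidates: Array<[ModeSource, unknown]> = [
    ["flag", flag],
    ["env", envValue],
    ["config", configValue],
  ];
  for (const [source, value] of candidates) {
    if (value === undefined) continue;
    if (!isMode(value)) {
      // Assumption: an invalid value is an error rather than silently skipped.
      throw new Error(`invalid mode from ${source}: ${String(value)}`);
    }
    return { mode: value, source };
  }
  return { mode: "tui", source: "default" };
}
```

Returning the source alongside the mode matches what `sboxai mode` is described as reporting: the effective mode and where it came from.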

Autonomous execution

Every run's prompt is prefixed with a contract instructing the agent to run fully unattended — never ask questions, never block on a decision, make its own judgment calls and proceed. There is no human-reply path by design; ambiguity is resolved by the agent, not surfaced to a human.

Daemon

| Command | Description | Flags |
| --- | --- | --- |
| daemon start | Start the background task daemon | --root, --name |
| daemon stop | Stop the running daemon | --root, --name |
| daemon status | Check if daemon is running | --root, --name |
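
The daemon is described in this README as draining its queue sequentially. A minimal sketch of that idea follows. Everything in it is an assumption made for illustration: the Task shape, the status names, oldest-first ordering, and the function names are hypothetical, and the real implementation in src/lib/daemon.ts and src/lib/task.ts may differ.

```typescript
// Hypothetical task shape for this sketch only.
type TaskStatus = "queued" | "running" | "done" | "failed";

interface Task {
  id: string;
  prompt: string;
  status: TaskStatus;
  createdAt: number; // assumed: epoch milliseconds at submit time
}

// Pick the oldest queued task, or undefined when nothing is waiting.
function nextQueued(tasks: readonly Task[]): Task | undefined {
  let next: Task | undefined;
  for (const task of tasks) {
    if (task.status !== "queued") continue;
    if (next === undefined || task.createdAt < next.createdAt) next = task;
  }
  return next;
}

// Drain the queue one task at a time. `execute` stands in for a real run
// and returns true on success. Returns the ids in the order they were run.
function drainSequentially(tasks: Task[], execute: (task: Task) => boolean): string[] {
  const order: string[] = [];
  for (let task = nextQueued(tasks); task !== undefined; task = nextQueued(tasks)) {
    task.status = "running";
    task.status = execute(task) ? "done" : "failed";
    order.push(task.id);
  }
  return order;
}
```

Because only one task is ever marked running at a time, this mirrors the sequential behaviour described for Phase 2; a bounded pool as planned for Phase 3 would need a different loop.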

How it works

  • Interactive TUI via tmux. A task launches claude --dangerously-skip-permissions [--model X] inside a detached tmux session (harness-<pid>-<epoch>), waits for the TUI to be ready, and injects the prompt via bracketed paste so multi-line text isn't submitted line-by-line.

  • Best-effort completion heuristic. The executor polls the pane on an interval and hashes its content. A poll counts as "stable" when the content is unchanged and no "esc to interrupt" activity indicator is present; after enough consecutive stable polls (3 by default) the run is treated as finished, so it ends promptly when the agent goes idle instead of waiting out the full timeout. The interactive TUI emits no structured result or cost signal, so ok/fail is inferred from the executor exit code.

  • Decision + transcript logs. The agent records consequential decisions (with reasoning) plus a final ## Summary in a decision log at <workdir>/.sboxai-decisions.md, alongside the per-run transcript log at <workdir>/.sboxai-last-run.log. Both paths are printed in the run's verdict line:

    ✓ task ok (rc=0) — transcript: <workdir>/.sboxai-last-run.log — decisions: <workdir>/.sboxai-decisions.md
    
  • Worktree per task. With --worktree <slug> --repo <repoDir>, the run gets an isolated git worktree at worktree/<slug> on branch sboxai/<slug>.

  • Task store. Tasks are persisted as JSON files under .sboxai/tasks/. The daemon drains the queue sequentially; status is derived from the daemon plus live Docker/tmux/filesystem state.

  • Exit-code contract. 0 ok, 2 error marker (e.g. rate/usage limit), 124 timeout, 3 bad input (missing/empty prompt).

  • Credential boundary. Org-integration credentials (Linear/GitHub/Slack) must never enter the sandbox.
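
A wrapper script that calls sboxai can branch on the exit-code contract listed above. The sketch below maps the four documented codes to labels; the type and function names are hypothetical, and treating any other code as "unknown" is an assumption, since the README does not say what else the executor may return.

```typescript
type RunOutcome = "ok" | "error-marker" | "bad-input" | "timeout" | "unknown";

// Map an executor exit code to the outcome named in the exit-code contract.
function classifyExit(rc: number): RunOutcome {
  switch (rc) {
    case 0:
      return "ok";
    case 2:
      return "error-marker"; // e.g. a rate/usage limit was hit
    case 3:
      return "bad-input"; // missing or empty prompt
    case 124:
      return "timeout";
    default:
      return "unknown"; // assumption: codes outside the contract are not interpreted
  }
}
```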

Architecture

host: sboxai CLI
        │  docker sandbox exec  (env: PROMPT_FILE, TASK_TIMEOUT, CLAUDE_MODEL, TUI_WORKDIR)
        ▼
in-container: tui-exec.sh  (vendored, refreshed when stale)
        │  tmux new-session  harness-<pid>-<epoch>
        ▼
        claude --dangerously-skip-permissions   ← interactive TUI, your subscription
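
The diagram names four environment variables that cross the docker sandbox exec boundary. As a sketch of how a host-side driver might assemble them, the function below builds that map from a request object. The ExecRequest shape and buildExecEnv name are hypothetical, the units of TASK_TIMEOUT are not stated in this README, and omitting CLAUDE_MODEL when no model is requested is an assumption.

```typescript
// Hypothetical request shape for this sketch only.
interface ExecRequest {
  promptFile: string; // path to the prompt file as seen inside the container
  timeout: number; // passed through as-is; units are whatever tui-exec.sh expects
  workdir: string;
  model?: string;
}

// Build the environment passed to the in-container executor.
function buildExecEnv(req: ExecRequest): Record<string, string> {
  const env: Record<string, string> = {
    PROMPT_FILE: req.promptFile,
    TASK_TIMEOUT: String(req.timeout),
    TUI_WORKDIR: req.workdir,
  };
  if (req.model !== undefined && req.model !== "") {
    // Assumption: the variable is left unset so the executor can fall back to its own default.
    env["CLAUDE_MODEL"] = req.model;
  }
  return env;
}
```

Passing the prompt by file path rather than inline keeps multi-line prompts out of shell quoting, which fits the PROMPT_FILE variable shown in the diagram.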

Roadmap

  • Phase 1 (done): lifecycle (setup/up/shell/stop/rm), auth (login/doctor), and sequential execution (run/status/logs) over live-derived state.
  • Phase 2 (done): long-lived daemon, unix-socket IPC, a task queue, the submit/cancel/daemon commands, and run routed through the daemon.
  • Phase 3 (planned): bounded-concurrency parallel pool and rate-limit back-pressure.

Development

make help     # list all targets
make check    # typecheck + lint + test (full gate)
make build    # compile a single host binary → dist/sboxai

Repo layout:

src/index.ts               commander entry — wires all commands
src/commands/lifecycle.ts  setup/up/shell/stop/rm
src/commands/auth.ts       login/doctor
src/commands/exec.ts       run/submit/cancel/status/logs
src/commands/daemon.ts     daemon start/stop/status
src/commands/shared.ts     shared command helpers
src/lib/config.ts          derivePaths/resolveConfig (pure)
src/lib/docker.ts          docker sandbox ls parsing + invocation
src/lib/auth.ts            OAuth credential evaluation (pure)
src/lib/worktree.ts        slugify + git worktree management
src/lib/exec.ts            host-side driver that calls tui-exec.sh
src/lib/task.ts            task model + JSON-per-task file store
src/lib/ipc.ts             IPC client helpers (HTTP-over-unix-socket)
src/lib/daemon.ts          daemon server (HTTP + queue drain loop)
src/lib/__tests__/         bun tests (incl. task.test.ts)
scripts/tui-exec.sh        vendored in-container tmux/TUI executor
scripts/sandbox-setup.sh   vendored sandbox build/setup script

Caveats

  • The completion heuristic (pane stability plus absence of "esc to interrupt") is brittle: it scrapes the TUI rather than reading a structured signal. This is a property of the default tui mode; headless sidesteps it with a structured result, at the cost of separate billing.
  • Future parallelism is capped (~2–3) by the subscription rate tier, not the tooling.
  • Container clock skew can offset transcript timestamps.
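
To make the first caveat concrete, here is a minimal sketch of a stable-poll detector of the kind described under How it works. It is written in TypeScript for illustration only; the real logic lives in the vendored tui-exec.sh shell script and may differ. The class name IdleDetector is hypothetical, and the choice of SHA-256 as the pane hash is an assumption.

```typescript
import { createHash } from "node:crypto";

const BUSY_MARKER = "esc to interrupt";

// Counts consecutive polls where the pane is unchanged and shows no busy marker.
class IdleDetector {
  private lastHash: string | undefined = undefined;
  private stablePolls = 0;
  private readonly requiredStable: number;

  constructor(requiredStable: number = 3) {
    this.requiredStable = requiredStable;
  }

  // Feed one pane snapshot; returns true once enough stable polls have accumulated.
  observe(pane: string): boolean {
    const hash = createHash("sha256").update(pane).digest("hex");
    const unchanged = hash === this.lastHash;
    const busy = pane.includes(BUSY_MARKER);
    // Any change in content, or a visible busy marker, resets the streak.
    this.stablePolls = unchanged && !busy ? this.stablePolls + 1 : 0;
    this.lastHash = hash;
    return this.stablePolls >= this.requiredStable;
  }
}
```

The brittleness is visible here: a TUI that redraws a clock or spinner never looks stable, and one that rewords its busy indicator looks idle while still working.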

License

MIT — see LICENSE.
