Your local-first, multi-repo, 24/7 autonomous coding coworker. The Mac stays on; a locally authenticated Claude Code drives Docker-isolated workers across every repo in your registry, opens draft PRs on GitHub, runs external validators, scores risk, and merges low-risk changes automatically. Anything bigger is gated by approval from your phone.
v1.0.0 GA shipped 2026-05-25 (with an owner-waived 24h soak gate). v2.2.0-rc2 is the current TypeScript production-grade tag for this single-operator system. Per ADR-0013 Path A, no `v2.2.0` GA tag is created under the current release policy.
Latest: `v2.4.0-patch1`. The core value loop has now run end-to-end on real LLM work for the first time (see the 🆕 v2.4.0-patch1 section just below). The architecture has converged to a TypeScript-only, event-sourced, three-plane design per ADR-0010; the dual-kernel section further down is retained as v1.0.0 history.
The fastest way to try the current product surface is the Operator Cockpit.
It starts the TypeScript daemon on 7247 and the web app on 7248.
# 1. Install workspace dependencies
pnpm install
# 2. Start the local daemon + Operator Cockpit web UI
pnpm cockpit:dev
# 3. Open the chat-first cockpit
open http://127.0.0.1:7248
Then run the core operator flow:
- Click `Start Brainstorm` to start a planner conversation.
- Answer any clarification cards or add context in the composer.
- Click `Generate PRD` to create PRD / ADR / Roadmap artifacts.
- Review the plan in the side panel.
- Click `Approve` only when the plan is acceptable.
- Click `Start` to launch worker execution and watch the execution timeline.
- Use the bottom panel for logs, events, validators, token usage, and PR-gate status.
The cockpit defaults to local CLI / subscription-style usage (claude-cli or
codex-cli) and must not silently fall back to a paid API. If the planner or
worker cannot run, the UI should show a current HOLD with a recovery action
rather than fake success.
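The "HOLD instead of silent fallback" rule can be pictured as a resolver that has no paid-API branch at all. This is a minimal sketch under assumptions, not the cockpit's actual code: `resolveProvider`, the `Resolution` type, and the `HOLD-NO-LOCAL-PROVIDER` reason code are hypothetical names introduced for illustration.

```typescript
// Hypothetical types; the real cockpit's provider model is not shown in this README.
type LocalProvider = "claude-cli" | "codex-cli";

type Resolution =
  | { kind: "run"; provider: LocalProvider }
  | { kind: "hold"; reason: string; recoveryAction: string };

// Picks the first available local CLI provider. There is deliberately no
// branch that returns a paid-API provider: when nothing local is usable,
// the result is an explicit HOLD that a UI can surface with a recovery action.
function resolveProvider(
  preferred: LocalProvider[],
  isAvailable: (provider: LocalProvider) => boolean,
): Resolution {
  for (const provider of preferred) {
    if (isAvailable(provider)) {
      return { kind: "run", provider };
    }
  }
  return {
    kind: "hold",
    reason: "HOLD-NO-LOCAL-PROVIDER", // illustrative reason code, not from the project
    recoveryAction: `Install or log in to one of: ${preferred.join(", ")}`,
  };
}
```

Modelling the outcome as a tagged union forces every caller to handle the `hold` case explicitly, which is one way to keep "fake success" out of the UI.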
The core value loop has now run end-to-end on real LLM work for the first time: a subscription Claude coder running inside Docker writes real code → evidence is collected → two independent validator families (OpenAI + Gemini) judge it on evidence only → a real draft PR is opened on GitHub → token/cost usage is persisted. Architecture decision: ADR-0019.
Current stage: `ProductionHardened_v2.4_Ready` (see EXECUTION_WORKBOOK.md §0). This is a single-operator system; `system.allow_remote_writes` defaults to `false` and gates every outward write: `git push`, PR creation, and merge alike.
- claude-in-Docker runner: `packages/runner/src/claude-docker-runner.ts` runs the subscription Claude CLI inside a container against a per-task git worktree, honoring the image's `/entrypoint.sh` contract (writes `/workspace/prompt.txt`; sets `CLAUDE_ROLE` / `CLAUDE_MODEL` / `CLAUDE_PERMISSION_MODE` / `CLAUDE_ALLOWED_TOOLS`; reads back `/workspace/result.json`). A static + runtime preflight (`preflightClaudeDockerEnvironment` / `preflightRuntime`) fails fast with a `HOLD-CLAUDE-DOCKER-IMAGE` or `HOLD-CLAUDE-AUTH-IN-DOCKER` reason rather than ever falling back silently to the paid API.
- `runner:e2e1` derived image: `packages/runner/docker/Dockerfile.e2e1` and `entrypoint-e2e1.sh`. The patched entrypoint writes `/workspace/cli-envelope.json` (the raw CLI usage envelope) before normalization, so authoritative token counts survive (the stock `result.json` reported `0/0`).
- Subscription auth via OAuth token: inject `AEDEV_CLAUDE_OAUTH_TOKEN` (from `claude setup-token`) → `CLAUDE_CODE_OAUTH_TOKEN` inside the container. The macOS keychain credential is host-bound and 401s inside a Linux container, so the token path is the proven, keychain-free option. All `ANTHROPIC_*` paid-API env vars are stripped from the container.
- `model_usage` accounting + live cost roller: `insertModelUsage` persists input/output tokens + cost per run and emits a `model.usage.recorded` event. Local subscription usage is tracked by run count + cost, never reported as `$0`. The daemon now feeds a long-lived `CostRoller` (seeded from `model_usage` on boot so spend survives a restart) and exposes `cost_total_usd` / `cost_per_pr_usd_7d` / `cost_event_count` on `/metrics`. Only known costs are summed; subscription-unknown stays 0, never fabricated.
- Dual-family validators: OpenAI- and Gemini-family judges score the evidence package only (never the coder's conversation or chain-of-thought). The merge policy requires two independent families to pass.
- Structured ClarificationGate (ADR-0020): `packages/daemon/src/clarification-gate.ts` scores mission ambiguity deterministically (no LLM, no token spend) over four signals; above the threshold (`trigger_threshold: 50` in `config/policies.yaml`) it asks ≤4 questions before any coder runs and writes a verifiable `clarified-spec.md`. Decision: ADR-0020.
- Autonomous draft-PR closure: the daemon's mission loop now opens a real draft PR on an `AUTO_MERGE` decision via `DraftPrGate` over `GhGitRemoteWriter` / `GhDraftPrCreator` (runner plane), instead of stopping at a mock merge. The gate fail-closes on `allow_remote_writes`, `repo.enabled`, and forbidden paths, so the no-push default is preserved: with the flag `false` (default) the loop opens nothing. This folds the proven `scripts/e2e1-real-loop.ts` path into the loop.
- Real-diff forbidden-path gate: forbidden-path detection reads the runner's `changed-paths.json` (the actual `git diff` file list) rather than regexing evidence prose, and feeds the merge policy's hard BLOCK (`mission-runner.ts`).
- `/github/sync` is gated: the GitHub PR-sync route now fails closed with `REMOTE_WRITES_DISABLED` unless `system.allow_remote_writes` is true (it was previously guarded only by the presence of a GitHub token).
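The dual-family rule and the forbidden-path hard BLOCK described in the list above combine roughly as follows. This is a hedged sketch rather than the contents of `mission-runner.ts`: the `Verdict` shape, the `decideMerge` function, and the `PASS` / `HOLD` / `BLOCK` labels are assumptions made for illustration, and risk scoring is left out.

```typescript
// Hypothetical shapes; the project's real validator and policy types are not shown here.
type Family = "openai" | "gemini";

interface Verdict {
  family: Family;
  pass: boolean;
}

type PolicyOutcome = "PASS" | "HOLD" | "BLOCK";

function decideMerge(verdicts: Verdict[], touchedForbiddenPath: boolean): PolicyOutcome {
  // A forbidden path is a hard block regardless of what the judges say.
  if (touchedForbiddenPath) return "BLOCK";
  // Any failing judge stops the change from proceeding.
  if (verdicts.some((v) => !v.pass)) return "HOLD";
  // Two verdicts from the same family do not count as two independent families.
  const families = new Set(verdicts.map((v) => v.family));
  return families.size >= 2 ? "PASS" : "HOLD";
}
```

Counting distinct families instead of verdicts is the detail that makes "two independent families" more than "two passing runs".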
Operator Cockpit is the human control plane for the local-first coding coworker. It is intended to feel more like Claude Code Desktop than a passive dashboard: chat first, explicit clarification, visible execution progress, and safety gates that stay obvious.
The current UX v2 surface includes:
- Chat-first workspace: a left session/history sidebar, main conversation, right mission/artifact panel, and bottom execution/log panel.
- Structured clarification cards: brainstorm follow-up questions should be answerable through choices and free-form replies, not buried inside markdown.
- Provider and token transparency: major planner/worker/validator actions expose whether they used `claude-cli`, `codex-cli`, mock/test mode, or API-backed validators, plus token/cost data when available.
- Current-only HOLDs: active blockers are shown prominently, while superseded historical HOLDs remain in logs/events instead of stale top banners.
- Live execution timeline: after `Start`, the UI tracks queued, assigned, worker, tests, evidence, validators, PR gate, and blocked/done states in operator-readable language.
- Safety-preserving PR gate: draft PR creation remains blocked unless `system.allow_remote_writes` and repo policy explicitly permit outward writes.
- Repo-bound worker (trust model): when you pick a repo and press `Start`, the worker executes inside an isolated `git worktree` of that repo (checked out at the committed `HEAD`, so your working tree and branches are untouched), never an empty scratch directory. If the selected repo is missing, disabled, or not a git repository, the mission HOLDs (`HOLD-TARGET-REPO-UNAVAILABLE`) rather than writing throwaway files and reporting "done". Evidence records the real `changed-paths.json`, repo path, and worktree path; touching a forbidden path (`.env*`, `secrets/**`, `.github/**`, `AGENTS.md`, `CLAUDE.md`) blocks the merge gate.
For the detailed UX v2 implementation brief, see
docs/handoff/operator-cockpit-ux-v2-prd-2026-05-31.md.
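The forbidden-path check from the repo-bound worker item can be sketched as a small glob matcher over the changed file list. Several things here are assumptions, not documented behavior: that `changed-paths.json` holds an array of repo-relative POSIX paths, that only `*` and `**` wildcards are needed, and that slash-free patterns such as `.env*` should match a file name at any depth. The function names are hypothetical.

```typescript
// Patterns quoted from the README; the matching semantics below are assumed.
const FORBIDDEN_PATTERNS = [".env*", "secrets/**", ".github/**", "AGENTS.md", "CLAUDE.md"];

// Translates a glob that uses only "*" and "**" into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  // "**" may cross directory separators; a single "*" may not.
  const pattern = escaped.replace(/\*\*|\*/g, (m) => (m === "**" ? ".*" : "[^/]*"));
  return new RegExp(`^${pattern}$`);
}

function isForbidden(path: string, patterns: string[] = FORBIDDEN_PATTERNS): boolean {
  const baseName = path.split("/").pop() ?? path;
  return patterns.some((glob) => {
    const re = globToRegExp(glob);
    // Slash-free patterns are also tried against the file name, so a nested
    // "packages/x/.env" is caught too (a conservative assumption).
    return re.test(path) || (!glob.includes("/") && re.test(baseName));
  });
}

// Returns the subset of changed paths that would trip the gate.
function forbiddenHits(changedPaths: string[]): string[] {
  return changedPaths.filter((p) => isForbidden(p));
}
```

Feeding this from the real diff file list, as the README describes, avoids false positives from prose that merely mentions a path such as `.env`.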
# 0. One-time: capture a keychain-free subscription token
claude setup-token # store the sk-ant-oat... value where your secrets live
# 1. Build the runner:e2e1 image (authoritative token counts)
docker build -f packages/runner/docker/Dockerfile.e2e1 \
-t claude-code-247/runner:e2e1 packages/runner/docker
# 2. Real end-to-end loop: docker Claude coder → dual-family → draft PR → model_usage
# (draft-only; never merges. Needs the OAuth token + OPENAI/GEMINI keys in env.)
node_modules/.bin/tsx scripts/e2e1-real-loop.ts
# 3. ClarificationGate shadow walk (deterministic; spends no LLM tokens)
node_modules/.bin/tsx scripts/e2e2-clarification-shadow-walk.ts
Safety model: these scripts pass `allowRemoteWrites: true` in-process to a draft-only PR gate; the global `system.allow_remote_writes` stays `false`. Because they pre-approve the mission, they deliberately bypass the daemon's approval path, so no ntfy phone approval is requested. To exercise the real approval flow (medium/high-risk merge, API fallback, etc.), run a mission through the daemon's `IntakeService`, which pushes an ntfy notification to your phone for approve/reject.
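One way to read the safety model above: the gate takes its permission as an explicit argument, so a script can opt in for a single in-process call without touching the persisted setting. The sketch below is illustrative only. `checkDraftPrGate`, `GateOptions`, `PrRequest`, and the `DRAFT_ONLY` reason are hypothetical; `REMOTE_WRITES_DISABLED` is the one reason code taken from this README.

```typescript
// Hypothetical gate shape; not the project's DraftPrGate API.
interface GateOptions {
  allowRemoteWrites: boolean; // supplied by the caller on every call
}

interface PrRequest {
  repo: string;
  branch: string;
  draft: boolean;
}

type GateResult = { ok: true } | { ok: false; reason: string };

function checkDraftPrGate(opts: GateOptions, req: PrRequest): GateResult {
  // Fail closed: without explicit permission nothing leaves the machine.
  if (!opts.allowRemoteWrites) {
    return { ok: false, reason: "REMOTE_WRITES_DISABLED" };
  }
  // Draft-only: this gate never opens a ready-to-merge PR.
  if (!req.draft) {
    return { ok: false, reason: "DRAFT_ONLY" }; // illustrative reason code
  }
  return { ok: true };
}

// The persisted setting keeps its default of false ...
const globalConfig = { system: { allow_remote_writes: false } };
const request: PrRequest = { repo: "my-repo", branch: "aedev/task-1", draft: true };

// ... so a call driven by the global config is refused,
const viaGlobalDefault = checkDraftPrGate(
  { allowRemoteWrites: globalConfig.system.allow_remote_writes },
  request,
);
// while a script that opts in for this one call is allowed to open a draft.
const viaScript = checkDraftPrGate({ allowRemoteWrites: true }, request);
```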
The dual-kernel layout below is the current state as of v1.0.0 GA. v2.0 collapses it to a single TypeScript control plane and removes the Python tree entirely. See V2_ARCHITECTURE.md for the target architecture and the stage-by-stage plan.
claude-code-247 is one product OS with two cooperating kernels:
| Layer | Implementation | Role |
|---|---|---|
| Control plane | TypeScript `aedev` (pnpm monorepo) | Primary CLI, daemon, dashboard, state machine, mission intake, roadmap, task graph, approvals, memory, risk, preview/deploy orchestration, evidence bundle. |
| Execution kernel | Python `claude247` (v1.0.0 GA) | Mature Docker worker runtime, headless `claude --print` invocation, Gemini + OpenAI judges, GitHub PR creation. Invoked by `aedev` during the parity window. |
| Bridge | `@aedev/claude247-bridge` | Enqueues tasks into the Python state DB, polls status, imports evidence back into aedev's SQLite. |
This dual-kernel design is recorded in
ADR-0009, which supersedes
ADR-0008.
aedev is the primary entry point for new product-OS work; the Python kernel
continues to drive worker execution and validator orchestration until the
TypeScript runtime reaches parity (see
docs/aedev-prototype-status.md for the
parity gate list). Both ADRs will be superseded by ADR-0010 in Stage A of
the v2.0 plan.
- Multi-repo from day one. One registry, many repos. Per-repo budget, risk policy, allowed/forbidden paths.
- Local-first execution. Mac + Docker. Your authenticated Claude Code session is the default; the paid API is opt-in.
- Mobile control. `claude247 status --plain` and `claude247 status-board --plain` are built for SMS-sized output. ntfy.sh pushes for approvals and stuck tasks.
- External validator isolation. Gemini 2.5 Pro and an OpenAI-compatible judge see only the evidence package, never the Coder's conversation.
- Low-risk auto-merge driven by a 0–100 risk score; medium risk asks your phone, high risk blocks.
- Long-term memory that compiles failures, lessons, and decisions back into per-repo `.agent/*.md` files.
- Failure replay for any task.
- Live read-only watchdog dashboard (new in v1.0.0 / M22b); see below.
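The risk routing in the list above (low auto-merges, medium asks the phone, high blocks) amounts to bucketing a 0–100 score. The sketch below is only an illustration: the cut-offs of 30 and 70, the assumption that a higher score means higher risk, and the names `routeByRisk` and `RiskThresholds` are all hypothetical; the actual policy lives in `docs/AUTO_MERGE_POLICY.md`.

```typescript
type Route = "AUTO_MERGE" | "ASK_PHONE" | "BLOCK";

interface RiskThresholds {
  lowMax: number; // scores at or below this auto-merge
  mediumMax: number; // scores at or below this ask for phone approval
}

// Hypothetical cut-offs chosen for the example, not the project's values.
const ASSUMED_THRESHOLDS: RiskThresholds = { lowMax: 30, mediumMax: 70 };

function routeByRisk(score: number, t: RiskThresholds = ASSUMED_THRESHOLDS): Route {
  // Fail closed on anything outside the documented 0-100 range.
  if (!Number.isFinite(score) || score < 0 || score > 100) return "BLOCK";
  if (score <= t.lowMax) return "AUTO_MERGE";
  if (score <= t.mediumMax) return "ASK_PHONE";
  return "BLOCK";
}
```

Treating an out-of-range or non-numeric score as `BLOCK` keeps a scoring bug from turning into an unintended merge.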
aedev is the primary control plane. The Python claude247 kernel is
installed alongside it during the parity window and handles worker execution
underneath.
# 1. Install the Python execution kernel (mature, GA v1.0.0)
make install # creates venv + installs deps + launchd plists
claude247 doctor # verify kernel environment
# 2. Install the TypeScript control plane
pnpm install
pnpm -r build
# 3. Initialize aedev home (~/.aedev/)
aedev init
# 4. Start the aedev daemon (port 7247) — control plane + dashboard
aedev daemon start
open http://localhost:7247
# 5. Submit a mission via the control plane (two-step approval)
aedev intake "refactor the auth middleware in repo my-repo"
aedev mission list # find the mission id
aedev mission approve <id> # explicit approval — no self-approve
# 6. Inspect status / tasks via the control plane
aedev status --plain
aedev task list
# 7. Read-only watchdog (Python kernel) — phone-friendly
claude247 status-board --plain
claude247 watchdog --plain
claude247 status-board --json
claude247 status-board --write-md M22_WATCHDOG_DASHBOARD.md
During the parity window, some kernel-level operations are still invoked
directly via claude247 (worker launch, validator orchestration, GitHub
PR creation). The @aedev/claude247-bridge package routes aedev missions
through the Python kernel automatically — see
ADR-0009 and
docs/aedev-prototype-status.md.
A read-only operations dashboard for "is the 24/7 daemon actually OK
right now?" Designed to be safe to run from a phone while the
dispatcher is mid-tick — the SQL is SELECT-only and the contract is
asserted by a regression test
(tests/unit/test_status_board.py::test_read_only_does_not_mutate_db).
Web (Apple-style): http://127.0.0.1:8423/status-board
- Activity-ring soak progress (recolors green / blue / red by state) using only inline SVG + CSS, with no charting library
- Auto-refresh every 15s (configurable 5 / 15 / 30 / 60s / off); fetches `/status-board.json`, updates DOM in place, briefly tints cards that changed, with no full reload and no flicker
- EN ↔ 中文 language toggle with `localStorage` persistence
- Dark mode follows `prefers-color-scheme`
- Live indicator dot in the top bar: pulsing green when live, amber when paused, red when a fetch fails
- Pause / resume / refresh-now controls with a morphing play/pause SVG button
- Zero external dependencies: no CDN, no font files, no JS library; the whole page is ~25KB inline
CLI:
claude247 status-board --plain
# Claude247 Watchdog Dashboard
# Generated: 2026-05-25T...
#
# Release State / Soak Progress / Runtime Health
# Queue / Task State / Recent Signals / GA Gates / Usage
JSON: http://127.0.0.1:8423/status-board.json
{
"generated_at": "...",
"release_state": { "main_sha": "...", "ga_status": "..." },
"soak": { "t0": "...", "progress_percent": 38, "result": "PARTIAL" },
"runtime_health":{ "launchd_loaded": 4, "dispatcher": "healthy", ... },
"queue": { "active_tasks": 0, "orphan_commands": 0, ... },
"signals": { "new_critical_errors": 0, "alert_storm": false, ... },
"ga_gates": { "passed": 18, "total": 19, "recommendation": "..." },
"usage": { "runs_total": 0, "active_workers": 0, ... }
}
The watchdog reads M20_SOAK_RESULT.md to auto-discover the
dispatcher T0; pass `--t0 2026-05-24T21:46Z` to override.
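The `soak.progress_percent` field can be understood as elapsed time since T0 over the 24h soak window. The following is a sketch under assumptions, not the Python kernel's implementation: the function name is hypothetical, and rounding down and clamping to 0–100 are choices made for the example.

```typescript
const SOAK_WINDOW_MS = 24 * 60 * 60 * 1000; // the 24h soak gate

// Percentage of the soak window elapsed at `nowIso`, measured from `t0Iso`.
// Both arguments are ISO-8601 UTC timestamps.
function soakProgressPercent(t0Iso: string, nowIso: string): number {
  const t0 = Date.parse(t0Iso);
  const now = Date.parse(nowIso);
  if (Number.isNaN(t0) || Number.isNaN(now)) {
    throw new Error("invalid ISO-8601 timestamp");
  }
  const fraction = (now - t0) / SOAK_WINDOW_MS;
  // Clamp so that times before T0 read 0 and times past T0+24h read 100.
  return Math.floor(Math.min(1, Math.max(0, fraction)) * 100);
}
```

With the T0 from the example flag, `soakProgressPercent("2026-05-24T21:46:00Z", "2026-05-25T06:58:00Z")` returns `38`, which lines up with the sample payload's `progress_percent` at about 9h 12m into the window.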
- `v1.0.0` GA, released 2026-05-25 (Python `claude247` kernel).
  - The first GA release. See RELEASE_NOTES_GA.md for the full notes, GA_GATE.md for the 19-gate GA contract, and M22_GA_DECISION_REPORT.md for the GA decision record.
  - Soak gate was explicitly waived by the owner after ~9h 12m of healthy soak evidence (4/4 launchd loaded, ~1182 dispatcher idle ticks, backup completed, 0 alerts, 0 orphan commands, $0 Anthropic worker spend). Final T+24h observation is a post-GA follow-up: the watchdog dashboard will auto-flip `soak.result` to `PASS` or `FAIL` once wall-clock crosses `2026-05-25T21:46Z`.
  - Pre-release history (`alpha.0` → `beta.2`) preserved on GitHub.
- `v2.2.0-rc2` is production grade for the TypeScript line: single TypeScript daemon, Python tree removed, HOLD as first-class state, closed-loop approval (ntfy/Tailscale), push-time security gate, resumable moves, cross-platform supervisor, chaos drills, Agent Mesh, RoadmapAgent, and Sentinel. The formal policy is docs/operations/release-policy.md.
  - No `v2.1.0` or `v2.2.0` GA tag is expected under the current policy. The expected v2 release references are `v2.1.0-rc1`, `v2.1.0-rc2`, `v2.2.0-rc1`, and `v2.2.0-rc2`.
v2 TypeScript line:
- V2_ARCHITECTURE.md: full v2.0 architecture and stage-by-stage implementation plan (start here)
- docs/operations/release-policy.md: current release-grade and tag policy
v1.0.0 (current GA):
- RELEASE_NOTES_GA.md: v1.0.0 release notes
- GA_GATE.md: 19-gate GA contract + owner-waiver policy
- M22_GA_DECISION_REPORT.md: GA decision record
- M20_SOAK_RESULT.md: soak observation + waiver record
- DEFINITION_OF_DONE.md: DoD checklist
- CHANGELOG.md: release history
- `docs/ARCHITECTURE.md`: module map and data flow (v1.0.0)
- `docs/INSTALL.md`: full install + uninstall + doctor
- `docs/REMOTE_DISPATCH.md`: phone / Remote / Dispatch operating guide
- `docs/SECURITY.md`: secret hygiene, forbidden paths, approval flow
- `docs/MEMORY.md`: vector + .agent file architecture
- `docs/AUTO_MERGE_POLICY.md`: risk scoring and merge gates
- `docs/VALIDATORS.md`: Gemini + OpenAI judge contracts
- `docs/REPO_ONBOARDING.md`: adding repos
- `docs/OPERATIONS.md`: day-to-day operating playbook
# Install dependencies (Node.js ≥ 20, pnpm ≥ 10 required)
pnpm install
# Run all tests
pnpm test
# Type-check across the workspace
pnpm typecheck
# Lint
pnpm lint
# Opt-in real subprocess smoke tests (require `claude` and/or Docker on PATH)
AEDEV_SMOKE_CLAUDE=1 pnpm test --filter @aedev/runner
AEDEV_SMOKE_DOCKER=1 pnpm test --filter @aedev/runner
# Start the daemon (port 7247) — serves the dashboard + REST API
cd packages/daemon && pnpm start
open http://localhost:7247
Architecture decisions for aedev: docs/adr/ (ADR-0001 through ADR-0009).
TS runtime parity gates: docs/aedev-prototype-status.md.
Internal.