CTlanston/claude-code-247

claude-code-247

Your local-first, multi-repo, 24/7 autonomous coding coworker. The Mac stays on; a locally authenticated Claude Code drives Docker-isolated workers across every repo in your registry, opens draft PRs on GitHub, runs external validators, scores risk, and merges low-risk changes automatically. Anything bigger is gated by approval from your phone.

v1.0.0 GA shipped 2026-05-25 (with an owner-waived 24h soak gate). v2.2.0-rc2 is the current TypeScript production-grade tag for this single-operator system. Per ADR-0013 Path A, no v2.2.0 GA tag is created under the current release policy.

Latest: v2.4.0-patch1 — the core value loop has now run end-to-end on real LLM work for the first time (see the 🆕 v2.4.0-patch1 section just below). The architecture has converged to a TypeScript-only, event-sourced, three-plane design per ADR-0010; the dual-kernel section further down is retained as v1.0.0 history.

Quick Start

The fastest way to try the current product surface is the Operator Cockpit. It starts the TypeScript daemon on port 7247 and the web app on port 7248.

# 1. Install workspace dependencies
pnpm install

# 2. Start the local daemon + Operator Cockpit web UI
pnpm cockpit:dev

# 3. Open the chat-first cockpit
open http://127.0.0.1:7248

Then run the core operator flow:

  1. Click Start Brainstorm to start a planner conversation.
  2. Answer any clarification cards or add context in the composer.
  3. Click Generate PRD to create PRD / ADR / Roadmap artifacts.
  4. Review the plan in the side panel.
  5. Click Approve only when the plan is acceptable.
  6. Click Start to launch worker execution and watch the execution timeline.
  7. Use the bottom panel for logs, events, validators, token usage, and PR-gate status.

The cockpit defaults to local CLI / subscription-style usage (claude-cli or codex-cli) and must not silently fall back to a paid API. If the planner or worker cannot run, the UI should show a current HOLD with a recovery action rather than fake success.
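The no-silent-fallback rule can be pictured as a resolver that returns either a usable local provider or a HOLD with a recovery action. This is a minimal sketch, not the daemon's code: `resolveProvider`, `ProviderResolution`, the `isAvailable` callback, and the `HOLD-NO-LOCAL-PROVIDER` code are all hypothetical; only the provider names `claude-cli` and `codex-cli` come from this README.

```typescript
// Hypothetical types; only the two provider names come from the README.
type LocalProvider = "claude-cli" | "codex-cli";

type ProviderResolution =
  | { kind: "ready"; provider: LocalProvider }
  | { kind: "hold"; reason: string; recoveryAction: string };

// `isAvailable` stands in for whatever health check the daemon really performs.
function resolveProvider(
  preferred: LocalProvider[],
  isAvailable: (provider: LocalProvider) => boolean,
): ProviderResolution {
  for (const provider of preferred) {
    if (isAvailable(provider)) return { kind: "ready", provider };
  }
  // No local CLI is usable: surface a HOLD instead of switching to a paid API.
  return {
    kind: "hold",
    reason: "HOLD-NO-LOCAL-PROVIDER", // illustrative code, not a real HOLD id
    recoveryAction: "Log in to a local CLI, then retry.",
  };
}
```

There is deliberately no branch that returns an API-backed provider, so a paid fallback cannot happen by accident in this shape.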

🆕 v2.4.0-patch1 — real end-to-end loop proven (E2E-Harvest)

The core value loop has now run end-to-end on real LLM work for the first time: a subscription Claude coder running inside Docker writes real code → evidence is collected → two independent validator families (OpenAI + Gemini) judge it on evidence only → a real draft PR is opened on GitHub → token/cost usage is persisted. Architecture decision: ADR-0019.
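As a rough illustration of "two independent families, evidence only", the sketch below separates what a validator may see from what the coder produced, then requires passing verdicts from two distinct families. The field names inside `evidence`, and the helpers `validatorInput` and `dualFamilyPass`, are assumptions made for this example and are not taken from the codebase.

```typescript
// Hypothetical shapes for illustration only.
interface CoderRun {
  evidence: { diffSummary: string; testLog: string }; // assumed evidence fields
  conversation: string[]; // must never reach a validator
}

type Family = "openai" | "gemini";

interface Verdict {
  family: Family;
  pass: boolean;
}

// Validators receive a copy of the evidence package and nothing else.
function validatorInput(run: CoderRun): CoderRun["evidence"] {
  return { ...run.evidence };
}

// Two passes from the same family count once, so they do not satisfy the rule.
function dualFamilyPass(verdicts: Verdict[]): boolean {
  const passingFamilies = new Set(
    verdicts.filter((v) => v.pass).map((v) => v.family),
  );
  return passingFamilies.size >= 2;
}
```

Counting distinct families rather than verdicts is the point of the design: a second opinion from the same model family adds no independence.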

Current stage: ProductionHardened_v2.4_Ready (see EXECUTION_WORKBOOK.md §0). This is a single-operator system; system.allow_remote_writes defaults to false and gates every outward write — git push, PR creation, and merge alike.
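A fail-closed gate of this kind might look like the sketch below. The `OutwardWrite` union, `SystemConfig` shape, and `checkRemoteWrite` helper are hypothetical; the `REMOTE_WRITES_DISABLED` code is the one this README reports for the `/github/sync` route, and reusing it for every action is an assumption.

```typescript
// Hypothetical names; only allow_remote_writes and REMOTE_WRITES_DISABLED
// appear in the README.
type OutwardWrite = "git-push" | "pr-create" | "merge";

interface SystemConfig {
  allow_remote_writes?: boolean;
}

type GateResult =
  | { allowed: true }
  | { allowed: false; code: "REMOTE_WRITES_DISABLED"; action: OutwardWrite };

function checkRemoteWrite(config: SystemConfig, action: OutwardWrite): GateResult {
  // Fail closed: a missing or non-true flag is treated as false.
  if (config.allow_remote_writes === true) return { allowed: true };
  return { allowed: false, code: "REMOTE_WRITES_DISABLED", action };
}
```

Comparing against `true` explicitly means an absent key or a malformed value still blocks the write.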

What's new in the technical surface

  • claude-in-Docker runner: packages/runner/src/claude-docker-runner.ts runs the subscription Claude CLI inside a container against a per-task git worktree, honoring the image's /entrypoint.sh contract (writes /workspace/prompt.txt; sets CLAUDE_ROLE / CLAUDE_MODEL / CLAUDE_PERMISSION_MODE / CLAUDE_ALLOWED_TOOLS; reads back /workspace/result.json). A static + runtime preflight (preflightClaudeDockerEnvironment / preflightRuntime) fails fast with a HOLD-CLAUDE-DOCKER-IMAGE or HOLD-CLAUDE-AUTH-IN-DOCKER reason rather than ever falling back silently to the paid API.

  • runner:e2e1 derived image: packages/runner/docker/Dockerfile.e2e1 + entrypoint-e2e1.sh. The patched entrypoint writes /workspace/cli-envelope.json (the raw CLI usage envelope) before normalization, so authoritative token counts survive (the stock result.json reported 0/0).

  • Subscription auth via OAuth token — inject AEDEV_CLAUDE_OAUTH_TOKEN (from claude setup-token) → CLAUDE_CODE_OAUTH_TOKEN inside the container. The macOS keychain credential is host-bound and 401s inside a Linux container, so the token path is the proven, keychain-free option. All ANTHROPIC_* paid-API env vars are stripped from the container.

  • model_usage accounting + live cost roller: insertModelUsage persists input/output tokens + cost per run and emits a model.usage.recorded event. Local subscription usage is tracked by run count + cost, never reported as $0. The daemon now feeds a long-lived CostRoller (seeded from model_usage on boot so spend survives a restart) and exposes cost_total_usd / cost_per_pr_usd_7d / cost_event_count on /metrics. Only known costs are summed: subscription-unknown stays 0, never fabricated.

  • Dual-family validators — OpenAI- and Gemini-family judges score the evidence package only (never the coder's conversation or chain-of-thought). The merge policy requires two independent families to pass.

  • Structured ClarificationGate (ADR-0020): packages/daemon/src/clarification-gate.ts scores mission ambiguity deterministically (no LLM, no token spend) over four signals; above the threshold (trigger_threshold: 50 in config/policies.yaml) it asks ≤4 questions before any coder runs and writes a verifiable clarified-spec.md. Decision: ADR-0020.

  • Autonomous draft-PR closure — the daemon's mission loop now opens a real draft PR on an AUTO_MERGE decision via DraftPrGate over GhGitRemoteWriter / GhDraftPrCreator (runner plane), instead of stopping at a mock merge. The gate fail-closes on allow_remote_writes, repo.enabled, and forbidden paths, so the no-push default is preserved — with the flag false (default) the loop opens nothing. This folds the proven scripts/e2e1-real-loop.ts path into the loop.

  • Real-diff forbidden-path gate — forbidden-path detection reads the runner's changed-paths.json (the actual git diff file list) rather than regexing evidence prose, and feeds the merge policy's hard BLOCK (mission-runner.ts).

  • /github/sync is gated — the GitHub PR-sync route now fails closed with REMOTE_WRITES_DISABLED unless system.allow_remote_writes is true (it was previously guarded only by the presence of a GitHub token).
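The subscription-auth bullet above describes three rules: map AEDEV_CLAUDE_OAUTH_TOKEN to CLAUDE_CODE_OAUTH_TOKEN, strip every ANTHROPIC_* variable, and HOLD rather than fall back when auth is missing. A minimal sketch of those rules follows; `buildContainerEnv`, its parameters, and the result type are hypothetical and do not reflect the real signature in claude-docker-runner.ts.

```typescript
type HostEnv = Record<string, string | undefined>;

type ContainerEnvResult =
  | { ok: true; env: Record<string, string> }
  | { ok: false; hold: "HOLD-CLAUDE-AUTH-IN-DOCKER" };

// `requested` is an assumed input: the variables the runner wants to set
// in the container (for example CLAUDE_ROLE or CLAUDE_MODEL).
function buildContainerEnv(
  hostEnv: HostEnv,
  requested: Record<string, string>,
): ContainerEnvResult {
  const token = hostEnv["AEDEV_CLAUDE_OAUTH_TOKEN"];
  // No token: stop with a HOLD instead of trying any other credential.
  if (!token) return { ok: false, hold: "HOLD-CLAUDE-AUTH-IN-DOCKER" };

  const env: Record<string, string> = {};
  for (const [key, value] of Object.entries(requested)) {
    if (key.startsWith("ANTHROPIC_")) continue; // never forward paid-API variables
    env[key] = value;
  }
  env["CLAUDE_CODE_OAUTH_TOKEN"] = token;
  return { ok: true, env };
}
```

Starting from an explicit `requested` map, rather than copying the whole host environment, keeps unrelated host secrets out of the container by construction.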

Operator Cockpit

Operator Cockpit is the human control plane for the local-first coding coworker. It is intended to feel more like Claude Code Desktop than a passive dashboard: chat first, explicit clarification, visible execution progress, and safety gates that stay obvious.

The current UX v2 surface includes:

  • Chat-first workspace — a left session/history sidebar, main conversation, right mission/artifact panel, and bottom execution/log panel.
  • Structured clarification cards — brainstorm follow-up questions should be answerable through choices and free-form replies, not buried inside markdown.
  • Provider and token transparency — major planner/worker/validator actions expose whether they used claude-cli, codex-cli, mock/test mode, or API-backed validators, plus token/cost data when available.
  • Current-only HOLDs — active blockers are shown prominently, while superseded historical HOLDs remain in logs/events instead of stale top banners.
  • Live execution timeline — after Start, the UI tracks queued, assigned, worker, tests, evidence, validators, PR gate, and blocked/done states in operator-readable language.
  • Safety-preserving PR gate — draft PR creation remains blocked unless system.allow_remote_writes and repo policy explicitly permit outward writes.
  • Repo-bound worker (trust model) — when you pick a repo and press Start, the worker executes inside an isolated git worktree of that repo (checked out at the committed HEAD, so your working tree and branches are untouched), never an empty scratch directory. If the selected repo is missing, disabled, or not a git repository, the mission HOLDs (HOLD-TARGET-REPO-UNAVAILABLE) rather than writing throwaway files and reporting "done". Evidence records the real changed-paths.json, repo path, and worktree path; touching a forbidden path (.env*, secrets/**, .github/**, AGENTS.md, CLAUDE.md) blocks the merge gate.
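The forbidden-path check on the real changed-file list can be sketched as below. The patterns are the ones listed in the trust-model bullet; the matching semantics are assumptions made for this example (`.env*` matches by file name at any depth, `secrets/**` and `.github/**` match top-level directories, and the two Markdown files match at the repo root only), and `forbiddenPathsTouched` is a hypothetical helper.

```typescript
// Strip a leading "./" so "./secrets/x" and "secrets/x" compare equal.
function normalizePath(path: string): string {
  return path.replace(/^\.\//, "");
}

function fileName(path: string): string {
  return path.slice(path.lastIndexOf("/") + 1);
}

// One predicate per pattern from the README; the semantics are assumed.
const FORBIDDEN_RULES: Array<(path: string) => boolean> = [
  (p) => fileName(p).startsWith(".env"), // .env*
  (p) => p.startsWith("secrets/"),       // secrets/**
  (p) => p.startsWith(".github/"),       // .github/**
  (p) => p === "AGENTS.md",
  (p) => p === "CLAUDE.md",
];

// Returns the changed paths that should block the merge gate.
function forbiddenPathsTouched(changedPaths: string[]): string[] {
  return changedPaths
    .map(normalizePath)
    .filter((p) => FORBIDDEN_RULES.some((rule) => rule(p)));
}
```

A non-empty result would feed the hard BLOCK; an empty array means no rule matched.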

For the detailed UX v2 implementation brief, see docs/handoff/operator-cockpit-ux-v2-prd-2026-05-31.md.

Running the E2E loop

# 0. One-time: capture a keychain-free subscription token
claude setup-token            # store the sk-ant-oat... value where your secrets live

# 1. Build the runner:e2e1 image (authoritative token counts)
docker build -f packages/runner/docker/Dockerfile.e2e1 \
  -t claude-code-247/runner:e2e1 packages/runner/docker

# 2. Real end-to-end loop: docker Claude coder → dual-family → draft PR → model_usage
#    (draft-only; never merges. Needs the OAuth token + OPENAI/GEMINI keys in env.)
node_modules/.bin/tsx scripts/e2e1-real-loop.ts

# 3. ClarificationGate shadow walk (deterministic; spends no LLM tokens)
node_modules/.bin/tsx scripts/e2e2-clarification-shadow-walk.ts

Safety model: these scripts pass allowRemoteWrites: true in-process to a draft-only PR gate; the global system.allow_remote_writes stays false. Because they pre-approve the mission, they deliberately bypass the daemon's approval path — so no ntfy phone approval is requested. To exercise the real approval flow (medium/high-risk merge, API fallback, etc.), run a mission through the daemon's IntakeService, which pushes an ntfy notification to your phone for approve/reject.

⚡ Architecture today (v1.0.0) — dual kernel, single product

The dual-kernel layout below describes the state as of v1.0.0 GA and is retained as history. v2.0 collapses it to a single TypeScript control plane and removes the Python tree entirely. See V2_ARCHITECTURE.md for the v2 architecture and the stage-by-stage plan.

claude-code-247 is one product OS with two cooperating kernels:

| Layer | Implementation | Role |
| --- | --- | --- |
| Control plane | TypeScript aedev (pnpm monorepo) | Primary CLI, daemon, dashboard, state machine, mission intake, roadmap, task graph, approvals, memory, risk, preview/deploy orchestration, evidence bundle. |
| Execution kernel | Python claude247 (v1.0.0 GA) | Mature Docker worker runtime, headless claude --print invocation, Gemini + OpenAI judges, GitHub PR creation. Invoked by aedev during the parity window. |
| Bridge | @aedev/claude247-bridge | Enqueues tasks into the Python state DB, polls status, imports evidence back into aedev's SQLite. |

This dual-kernel design is recorded in ADR-0009, which supersedes ADR-0008. aedev is the primary entry point for new product-OS work; the Python kernel continues to drive worker execution and validator orchestration until the TypeScript runtime reaches parity (see docs/aedev-prototype-status.md for the parity gate list). Both ADRs will be superseded by ADR-0010 in Stage A of the v2.0 plan.

What you get

  • Multi-repo from day one. One registry, many repos. Per-repo budget, risk policy, allowed/forbidden paths.
  • Local-first execution. Mac + Docker. Your authenticated Claude Code session is the default; the paid API is opt-in.
  • Mobile control. claude247 status --plain and claude247 status-board --plain are built for SMS-sized output. ntfy.sh pushes for approvals and stuck tasks.
  • External validator isolation. Gemini 2.5 Pro and an OpenAI-compatible judge see only the evidence package — never the Coder's conversation.
  • Risk-scored auto-merge. Each change gets a score from 0 to 100; low-risk changes merge automatically, medium-risk changes ask your phone, and high-risk changes are blocked.
  • Long-term memory that compiles failures, lessons, and decisions back into per-repo .agent/*.md files.
  • Failure replay for any task.
  • Live read-only watchdog dashboard (new in v1.0.0 / M22b) — see below.
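The three-way outcome of risk scoring (auto-merge, ask the phone, block) can be sketched as a pure function over the 0 to 100 score. The threshold values below are invented for illustration and are not the project's policy, and `NEEDS_APPROVAL` is a hypothetical label; `AUTO_MERGE` and `BLOCK` are names this README uses elsewhere.

```typescript
type Decision = "AUTO_MERGE" | "NEEDS_APPROVAL" | "BLOCK";

interface RiskThresholds {
  lowMax: number;    // highest score still treated as low risk
  mediumMax: number; // highest score still treated as medium risk
}

// Assumed example values; the real bands live in the project's policy config.
const EXAMPLE_THRESHOLDS: RiskThresholds = { lowMax: 29, mediumMax: 69 };

function decide(score: number, t: RiskThresholds = EXAMPLE_THRESHOLDS): Decision {
  // Fail closed on anything outside the documented 0 to 100 range.
  if (!Number.isFinite(score) || score < 0 || score > 100) return "BLOCK";
  if (score <= t.lowMax) return "AUTO_MERGE";
  if (score <= t.mediumMax) return "NEEDS_APPROVAL";
  return "BLOCK";
}
```

Treating an out-of-range or non-numeric score as BLOCK keeps a scoring bug from turning into an automatic merge.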

Legacy / CLI Quick Start

aedev is the primary control plane. The Python claude247 kernel is installed alongside it during the parity window and handles worker execution underneath.

# 1. Install the Python execution kernel (mature, GA v1.0.0)
make install                    # creates venv + installs deps + launchd plists
claude247 doctor                # verify kernel environment

# 2. Install the TypeScript control plane
pnpm install
pnpm -r build

# 3. Initialize aedev home (~/.aedev/)
aedev init

# 4. Start the aedev daemon (port 7247) — control plane + dashboard
aedev daemon start
open http://localhost:7247

# 5. Submit a mission via the control plane (two-step approval)
aedev intake "refactor the auth middleware in repo my-repo"
aedev mission list              # find the mission id
aedev mission approve <id>      # explicit approval — no self-approve

# 6. Inspect status / tasks via the control plane
aedev status --plain
aedev task list

# 7. Read-only watchdog (Python kernel) — phone-friendly
claude247 status-board --plain
claude247 watchdog --plain
claude247 status-board --json
claude247 status-board --write-md M22_WATCHDOG_DASHBOARD.md

During the parity window, some kernel-level operations are still invoked directly via claude247 (worker launch, validator orchestration, GitHub PR creation). The @aedev/claude247-bridge package routes aedev missions through the Python kernel automatically — see ADR-0009 and docs/aedev-prototype-status.md.

Live watchdog dashboard

A read-only operations dashboard for "is the 24/7 daemon actually OK right now?" Designed to be safe to run from a phone while the dispatcher is mid-tick — the SQL is SELECT-only and the contract is asserted by a regression test (tests/unit/test_status_board.py::test_read_only_does_not_mutate_db).

Web (Apple-style): http://127.0.0.1:8423/status-board

  • Activity-ring soak progress (recolors green / blue / red by state) using only inline SVG + CSS — no charting library
  • Auto-refresh every 15s (configurable 5 / 15 / 30 / 60s / off); fetches /status-board.json, updates DOM in place, briefly tints cards that changed — no full reload, no flicker
  • EN ↔ 中文 language toggle with localStorage persistence
  • Dark mode follows prefers-color-scheme
  • Live indicator dot in the top bar — pulsing green when live, amber when paused, red when a fetch fails
  • Pause / resume / refresh-now controls with a morphing play/pause SVG button
  • Zero external dependencies — no CDN, no font files, no JS library; the whole page is ~25KB inline

CLI:

claude247 status-board --plain
# Claude247 Watchdog Dashboard
# Generated: 2026-05-25T...
#
# Release State / Soak Progress / Runtime Health
# Queue / Task State / Recent Signals / GA Gates / Usage

JSON: http://127.0.0.1:8423/status-board.json

{
  "generated_at": "...",
  "release_state": { "main_sha": "...", "ga_status": "..." },
  "soak":          { "t0": "...", "progress_percent": 38, "result": "PARTIAL" },
  "runtime_health":{ "launchd_loaded": 4, "dispatcher": "healthy", ... },
  "queue":         { "active_tasks": 0, "orphan_commands": 0, ... },
  "signals":       { "new_critical_errors": 0, "alert_storm": false, ... },
  "ga_gates":      { "passed": 18, "total": 19, "recommendation": "..." },
  "usage":         { "runs_total": 0, "active_workers": 0, ... }
}
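For consumers of this JSON, a typed view of the fields shown above might look like the following sketch. It types only the keys visible in the sample; the elided fields (the `...` entries) are intentionally left out, and `StatusBoard` and `summarize` are hypothetical names.

```typescript
// Covers only the keys visible in the sample payload above.
interface StatusBoard {
  generated_at: string;
  release_state: { main_sha: string; ga_status: string };
  soak: { t0: string; progress_percent: number; result: string };
  runtime_health: { launchd_loaded: number; dispatcher: string };
  queue: { active_tasks: number; orphan_commands: number };
  signals: { new_critical_errors: number; alert_storm: boolean };
  ga_gates: { passed: number; total: number; recommendation: string };
  usage: { runs_total: number; active_workers: number };
}

// Collapse the board into one phone-sized line.
function summarize(board: StatusBoard): string {
  return [
    `soak ${board.soak.progress_percent}% (${board.soak.result})`,
    `dispatcher ${board.runtime_health.dispatcher}`,
    `gates ${board.ga_gates.passed}/${board.ga_gates.total}`,
    `active ${board.queue.active_tasks}`,
  ].join(" | ");
}
```

With the sample values shown above, `summarize` would produce `soak 38% (PARTIAL) | dispatcher healthy | gates 18/19 | active 0`.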

The watchdog reads M20_SOAK_RESULT.md to auto-discover the dispatcher T0; pass --t0 2026-05-24T21:46Z to override.
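To make the relationship between T0 and soak progress concrete, here is a sketch that computes progress through a 24-hour window. The formula, the clamping, and the rounding down are assumptions; the actual watchdog is Python and may compute this differently. `soakProgressPercent` is a hypothetical name.

```typescript
const SOAK_WINDOW_MS = 24 * 60 * 60 * 1000; // the 24h soak window

// Percentage of the soak window elapsed at `now`, clamped to 0..100.
function soakProgressPercent(t0Iso: string, now: Date): number {
  const t0 = Date.parse(t0Iso);
  if (Number.isNaN(t0)) throw new Error(`unparseable T0: ${t0Iso}`);
  const elapsedMs = now.getTime() - t0;
  const ratio = Math.min(Math.max(elapsedMs / SOAK_WINDOW_MS, 0), 1);
  return Math.floor(ratio * 100);
}

// 9h 12m after T0 is 552 of 1440 minutes, which rounds down to 38.
const progress = soakProgressPercent(
  "2026-05-24T21:46Z",
  new Date("2026-05-25T06:58Z"),
);
console.log(progress); // 38
```

Under these assumptions the result lines up with the `progress_percent: 38` value in the sample JSON and with the roughly 9h 12m of soak evidence cited under Status.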

Status

  • v1.0.0 GA — released 2026-05-25 (Python claude247 kernel).
  • The first GA release. See RELEASE_NOTES_GA.md for the full notes, GA_GATE.md for the 19-gate GA contract, and M22_GA_DECISION_REPORT.md for the GA decision record.
  • Soak gate was explicitly waived by the owner after ~9h 12m of healthy soak evidence (4/4 launchd loaded, ~1182 dispatcher idle ticks, backup completed, 0 alerts, 0 orphan commands, $0 Anthropic worker spend). Final T+24h observation is a post-GA follow-up — the watchdog dashboard will auto-flip soak.result to PASS or FAIL once wall-clock crosses 2026-05-25T21:46Z.
  • Pre-release history (alpha.0 through beta.2) preserved on GitHub.
  • v2.2.0-rc2 is production grade for the TypeScript line — single TypeScript daemon, Python tree removed, HOLD as first-class state, closed-loop approval (ntfy/Tailscale), push-time security gate, resumable moves, cross-platform supervisor, chaos drills, Agent Mesh, RoadmapAgent, and Sentinel. The formal policy is docs/operations/release-policy.md.
  • No v2.1.0 or v2.2.0 GA tag is expected under the current policy. The expected v2 release references are v2.1.0-rc1, v2.1.0-rc2, v2.2.0-rc1, and v2.2.0-rc2.

Documentation

v2 TypeScript line:

v1.0.0 (current GA):

  • RELEASE_NOTES_GA.md — v1.0.0 release notes
  • GA_GATE.md — 19-gate GA contract + owner-waiver policy
  • M22_GA_DECISION_REPORT.md — GA decision record
  • M20_SOAK_RESULT.md — soak observation + waiver record
  • DEFINITION_OF_DONE.md — DoD checklist
  • CHANGELOG.md — release history
  • docs/ARCHITECTURE.md — module map and data flow (v1.0.0)
  • docs/INSTALL.md — full install + uninstall + doctor
  • docs/REMOTE_DISPATCH.md — phone / Remote / Dispatch operating guide
  • docs/SECURITY.md — secret hygiene, forbidden paths, approval flow
  • docs/MEMORY.md — vector + .agent file architecture
  • docs/AUTO_MERGE_POLICY.md — risk scoring and merge gates
  • docs/VALIDATORS.md — Gemini + OpenAI judge contracts
  • docs/REPO_ONBOARDING.md — adding repos
  • docs/OPERATIONS.md — day-to-day operating playbook

Working on the TypeScript control plane (aedev)

# Install dependencies (Node.js ≥ 20, pnpm ≥ 10 required)
pnpm install

# Run all tests
pnpm test

# Type-check across the workspace
pnpm typecheck

# Lint
pnpm lint

# Opt-in real subprocess smoke tests (require `claude` and/or Docker on PATH)
AEDEV_SMOKE_CLAUDE=1 pnpm test --filter @aedev/runner
AEDEV_SMOKE_DOCKER=1 pnpm test --filter @aedev/runner

# Start the daemon (port 7247) — serves the dashboard + REST API
cd packages/daemon && pnpm start
open http://localhost:7247

Architecture decisions for aedev: docs/adr/ (ADR-0001 through ADR-0009).

TS runtime parity gates: docs/aedev-prototype-status.md.


License

Internal.
