Harness

An OTP-native task-execution engine that an AI orchestrator drives end to end.

Harness pulls tasks from an rmap roadmap, dispatches each to a headless coding agent (Claude Code, Cursor, Codex, Grok, Antigravity, Pi) running in an isolated git worktree, runs the target project's own check stack against the result, and reports a verified outcome. The primary user is an AI orchestrator, not a human. The verification stack — not the agent's self-report — is the source of truth for success/failure. Every adapter is held to the same AgentAdapter behaviour and a reusable conformance suite.
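The "verification stack is the source of truth" rule can be illustrated with a small sketch. This is not Harness's actual grading code: the module name, the function, and the :red tag are hypothetical, and it assumes a Unix-like host where sh is available. The point it shows is that the agent's own claim is accepted as input and then ignored; only the exit codes of the check commands decide the verdict.

```elixir
defmodule VerdictSketch do
  @moduledoc "Hypothetical illustration only; not a Harness module."

  # Runs each shell command in `dir` and returns :green only if every one
  # exits with status 0. The agent's self-report is deliberately unused.
  def grade(check_commands, dir, _agent_self_report) do
    failed =
      Enum.reject(check_commands, fn cmd ->
        {_output, exit_code} =
          System.cmd("sh", ["-c", cmd], cd: dir, stderr_to_stdout: true)

        exit_code == 0
      end)

    case failed do
      [] -> :green
      _ -> {:red, failed}
    end
  end
end

# The agent claims success, but one check fails, so the verdict is not green.
IO.inspect(VerdictSketch.grade(["true", "false"], ".", :claimed_success))
```

In a real stack the commands would be the target project's own checks (for example its test and lint tasks) rather than the true/false placeholders used here.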

Status

Post-v0_5: harness is a long-running multi-project OTP node. Harness.ProjectRegistry holds N first-class projects (Elixir, Rust, anything with a shell-driven check stack); Oban (queue-per-project, Postgres-persisted) provides dispatch with restart resilience; six agent adapters (Claude Code, Codex, Cursor, Grok, Antigravity, Pi) drive runs; the Harness.Run gen_statem owns the per-run lifecycle and an autonomous repair loop; Oban.Plugins.Cron lets the roadmap drive itself unattended.

The cold-path consumer surface is the Phoenix LiveView dashboard + embedded Oban Web + a native MCP server (/harness/mcp, flat JSON-RPC tools) + a Tidewave MCP plug (/tidewave/mcp, project_eval), all served by one standalone Bandit endpoint on http://localhost:4018. The native MCP tools (dispatch__task, dispatch__status, dispatch__verdict_detail, roadmap__*, …) are the primary surface for any JSON/MCP orchestrator; Tidewave project_eval + IEx are the escape hatch for arbitrary eval and the struct-passing ops the flat tools deliberately omit.
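For an orchestrator that talks to /harness/mcp directly, the request bodies are ordinary JSON-RPC 2.0 envelopes. The sketch below builds the two standard MCP requests, tools/list and tools/call, as plain maps. The module name is hypothetical, and no argument names are assumed for any Harness tool: the schema reported by tools/list is the authority for what each tool accepts.

```elixir
defmodule McpEnvelopeSketch do
  @moduledoc """
  Hypothetical helper: builds JSON-RPC 2.0 request bodies of the kind an MCP
  client would POST to http://localhost:4018/harness/mcp. Not part of Harness.
  """

  # Standard MCP discovery request; the response lists each tool's name and
  # input schema.
  def list_tools(id) do
    %{"jsonrpc" => "2.0", "id" => id, "method" => "tools/list"}
  end

  # Standard MCP invocation request. `arguments` must match the schema that
  # tools/list reports for `tool`; none are assumed here.
  def call_tool(id, tool, arguments) when is_binary(tool) and is_map(arguments) do
    %{
      "jsonrpc" => "2.0",
      "id" => id,
      "method" => "tools/call",
      "params" => %{"name" => tool, "arguments" => arguments}
    }
  end
end

# The empty arguments map is a placeholder, not the tool's real schema.
IO.inspect(McpEnvelopeSketch.call_tool(2, "dispatch__status", %{}))
```

Encode the map with whichever JSON library the client already uses before sending it. A full MCP session normally starts with an initialize request, which is omitted here.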

See ROADMAP.md for the current task state (rendered from roadmap/tasks.toml by rmap), docs/dogfooding-workflow.md for the operator runbook, and skills/harness-driver/SKILL.md for the AI-orchestrator contract.

Running the node

iex -S mix

Boots the OTP application, Postgres-backed Oban, and the standalone dashboard endpoint. Live surfaces:

| URL | What it is |
| --- | --- |
| `http://localhost:4018/harness` | LiveView dashboard: project switcher, per-bucket run counts, per-run drill-down with live transcript pane |
| `http://localhost:4018/harness/oban` | Oban Web: queues, job rows, retries, scheduled work |
| `http://localhost:4018/harness/mcp` | Native MCP server: flat JSON-RPC tools (`dispatch__`, `roadmap__`, …); the primary surface for a JSON/MCP orchestrator |
| `http://localhost:4018/tidewave/mcp` | Tidewave MCP endpoint (dev only): `project_eval` escape hatch for arbitrary eval + struct-surface ops |

The standalone Bandit endpoint starts only when both conditions hold: config :harness, :dashboard, enabled: true is set, and Bandit is in the dependency stack. Consumers that mount the dashboard inside their own Phoenix endpoint leave enabled: false and add the route live "/harness/*path", Harness.Dashboard.Live to their own router.
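As a config fragment, the two modes might look like the sketch below. The config key and the live route are taken from the paragraph above; the surrounding scope, the :browser pipeline, and the idea of a separate host router are assumptions based on a conventional Phoenix application.

```elixir
# Standalone mode: harness's own config/dev.exs
import Config

config :harness, :dashboard, enabled: true

# Mounted mode (hypothetical host app): set `enabled: false` in config,
# then add the route to the host's own router, for example inside a
# browser scope:
#
#   scope "/" do
#     pipe_through :browser
#     live "/harness/*path", Harness.Dashboard.Live
#   end
```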

Use harness from another repo

The common case: you have a project (myapp) and want harness — running as a long-lived iex -S mix BEAM in ~/_DATA/code/harness/ — to dispatch tasks from myapp's roadmap to headless coding agents, run myapp's own check stack as the grader, and report verified verdicts back to the AI agent driving from inside myapp.

Three setup steps:

1. Register myapp with harness. Add an entry alongside the self-registered "harness" project in config/dev.exs, then restart iex -S mix:

# ~/_DATA/code/harness/config/dev.exs
config :harness, :projects, [
  [
    name: "harness",
    source: {:local, Path.expand("..", __DIR__)},
    preset: :elixir,
    roadmap_path: Path.expand("..", __DIR__)
  ],
  [
    name: "myapp",
    source: {:local, "/Users/you/_DATA/code/myapp"},
    preset: :elixir,                     # or :rust, or a fully-spec'd %Harness.CheckStack{}
    roadmap_path: "/Users/you/_DATA/code/myapp",
    concurrency_cap: 2
  ]
]

:elixir is the lighter day-to-day stack. To make a green verdict imply "my own mix precommit would also pass" — closing the gap where harness grades green but a coverage gate (or format/warnings-as-errors) would block the merge — register against the mergeable-bar preset instead: preset: {:elixir_precommit, cover_threshold: 80, exclude: [:integration]}. It adds format --check-formatted, compile --warnings-as-errors, a coverage threshold on test, and doctor --raise to the stack.
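Put together, a registration against the mergeable-bar preset might look like the following config fragment. It reuses the paths and options from the example above and changes only the preset line; treat it as a sketch and adjust the threshold and exclusions to your project.

```elixir
# ~/_DATA/code/harness/config/dev.exs
import Config # already present at the top of a real config file

config :harness, :projects, [
  [
    name: "harness",
    source: {:local, Path.expand("..", __DIR__)},
    preset: :elixir,
    roadmap_path: Path.expand("..", __DIR__)
  ],
  [
    name: "myapp",
    source: {:local, "/Users/you/_DATA/code/myapp"},
    # Mergeable-bar preset: a green verdict should imply `mix precommit` passes.
    preset: {:elixir_precommit, cover_threshold: 80, exclude: [:integration]},
    roadmap_path: "/Users/you/_DATA/code/myapp",
    concurrency_cap: 2
  ]
]
```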

2. Add harness's MCP endpoints to myapp/.mcp.json — alongside myapp's own Tidewave if it has one. The harness entry (native flat tools) is your primary surface; the optional harness_eval entry is the project_eval escape hatch into harness's BEAM:

{
  "mcpServers": {
    "tidewave": {
      "type": "http",
      "url": "http://localhost:4001/tidewave/mcp"
    },
    "harness": {
      "type": "http",
      "url": "http://localhost:4018/harness/mcp"
    },
    "harness_eval": {
      "type": "http",
      "url": "http://localhost:4018/tidewave/mcp"
    }
  }
}

Claude Code exposes each server's tools as mcp__<server-name>__<tool>, so the three entries yield three distinguishable surfaces: mcp__tidewave__project_eval inspects myapp's own state on port 4001; mcp__harness__dispatch__task and the rest of the flat driver tools dispatch, observe, and triage against harness's BEAM on :4018 (the primary surface); and mcp__harness_eval__project_eval is the escape hatch for arbitrary eval and struct-surface ops. Drop harness_eval if you only need the flat tools. The ports do not collide: tidewave runs in a different BEAM on 4001, and the two harness entries share :4018 but use different paths.
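The naming rule is plain string concatenation, which explains why the harness tools end up with two double-underscore separators. A minimal sketch, with a hypothetical module name:

```elixir
defmodule McpToolNameSketch do
  # Builds the name under which Claude Code exposes `tool` from the
  # .mcp.json entry called `server`.
  def qualified(server, tool) when is_binary(server) and is_binary(tool) do
    "mcp__" <> server <> "__" <> tool
  end
end

IO.puts(McpToolNameSketch.qualified("harness", "dispatch__task"))
# prints mcp__harness__dispatch__task
```

Renaming a key in .mcp.json therefore renames every tool under it, so the driver skill and the config need to agree on the server names.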

3. Import the driver skill from myapp/CLAUDE.md so the AI agent in myapp knows how to use the surface:

@~/_DATA/code/harness/skills/harness-driver/SKILL.md

Restart the Claude Code session in myapp to pick up the new .mcp.json entries. After that, the agent dispatches via the flat mcp__harness__dispatch__task tool (and observes with mcp__harness__dispatch__status / dispatch__verdict_detail) against :4018; harness manages isolated worktrees of myapp, runs myapp's check stack, and reports the verified verdict back.

Full driver contract (entry points, two-eval pattern for ephemeral MCP eval processes, cross-checkout sharp edges, secret scrubbing): skills/harness-driver/SKILL.md § "Context A — Driving harness from another repo".

Development

# First time
mix deps.get
mix compile

# Fast local gate (hook-bound, ~180s)
mix check.fast

# Pre-commit gate (no dialyzer — dialyzer lives in precommit.full)
mix precommit

# Full hand-off gate — mirrors CI, includes dialyzer
mix precommit.full

# Focused checks
mix test
mix credo --strict        # includes TODO/FIXME debt visibility by design
mix sobelow --exit --skip
mix sobelow.baseline      # refresh Sobelow skip baseline intentionally

# AI-friendly output
mix test.json
mix dialyzer.json

All tooling is wired per the global Elixir setup conventions (Styler first, Reach for OTP analysis, etc.).

License

MIT (or your preferred license).
