feat(orchestrator): multi-agent goal execution via session_policy spawn by dimakis · Pull Request #377 · dimakis/mitzo

dimakis · 2026-06-09T23:22:14Z

Summary

Enable the TaskOrchestrator to spawn dedicated headless sessions for individual tasks, unlocking multi-agent goal coordination. This is Phase 1 of 4 in the multi-agent orchestration initiative.

What changed

task-orchestrator.ts: Implement session_policy: 'spawn' path in tick(). When a task has sessionPolicy: 'spawn', the orchestrator creates a dedicated headless session instead of routing to the pinned session. Spawned tasks don't set activeTaskId, enabling parallel execution. Falls back to pinned session on failure.
index.ts: Wire spawnSession dependency — creates worktrees, registers in EventStore, starts headless session with NullTransport.
app.ts: Add POST /api/signals/resolve endpoint. External agents (e.g. Centaur) can resolve wait_for_signal tasks by gate metadata (type + repo + PR) without knowing task IDs. Authenticated via internal token.
api-schemas.ts: Add SignalResolveBody zod schema for the new endpoint.
task-store.ts: Add findActiveSignalTasks(gateType) query method for gate-metadata-based task lookup.
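A minimal sketch of the dispatch rule described above, assuming a simplified synchronous model. `Task`, `Deps`, `MiniOrchestrator`, the `"reuse"` policy name, and a `spawnSession` that returns a client ID or `null` are hypothetical stand-ins, not the actual task-orchestrator.ts API. A flat loop also stands in for the orchestrator's own traversal (`getNextExecutable`).

```typescript
// Hypothetical, simplified model of the spawn-vs-pinned dispatch rule in tick().
type SessionPolicy = "reuse" | "spawn"; // "reuse" is an assumed name for the pinned path

interface Task {
  id: string;
  sessionPolicy: SessionPolicy;
  status: "pending" | "active" | "done";
}

interface Deps {
  // Returns the new session's client ID, or null if spawning failed.
  spawnSession(task: Task): string | null;
  sendToPinned(task: Task): void;
}

class MiniOrchestrator {
  activeTaskId: string | null = null;

  constructor(private tasks: Task[], private deps: Deps) {}

  tick(): void {
    for (const task of this.tasks) {
      if (task.status !== "pending") continue;
      if (task.sessionPolicy === "spawn") {
        task.status = "active";
        // Spawned tasks run independently: activeTaskId is left untouched,
        // so the loop keeps looking for more parallel work.
        if (this.deps.spawnSession(task) !== null) continue;
        // Spawn failed: fall through to the pinned-session path below.
        task.status = "pending";
      }
      if (this.activeTaskId !== null) return; // pinned session is busy
      task.status = "active";
      this.activeTaskId = task.id;
      this.deps.sendToPinned(task);
    }
  }
}
```

In this model a failed spawn competes for the pinned session like any other task, so it is only sent there when no other task holds `activeTaskId`.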

Context: Multi-Agent Orchestration Roadmap

The session_policy column and wait_for_signal stage type already existed in the schema but were unused. This PR activates them.

Phase 1 (this PR): Multi-session tick() + signal resolve endpoint
Phase 2: Centaur signal bridge — Centaur POSTs ReviewCompleted to /api/signals/resolve
Phase 3: PR Shepherd creates Task Board goals instead of ad-hoc sessions
Phase 4: Scheduled PR discovery trigger

A PR lifecycle would look like:

Goal: "Shepherd PR #360"
  ├── [agent_work, spawn]         Request Centaur review
  ├── [wait_for_signal]           Wait for review completion
  ├── [agent_work, spawn]         Address findings
  ├── [wait_for_signal, gh_ci]    Wait for CI
  └── [human_review]              Approve merge
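The same lifecycle expressed as data, to make the per-task stage type and session policy explicit. The field names (`stageType`, `sessionPolicy`, `gate`), the gate shape, the choice of `centaur_review` for the first wait, and the `"owner/repo"` value are illustrative assumptions; the real schema lives in task-store.ts.

```typescript
// Hypothetical shapes; the real task schema in task-store.ts may differ.
type StageType = "agent_work" | "wait_for_signal" | "human_review";
type SessionPolicy = "reuse" | "spawn";

interface TaskSpec {
  title: string;
  stageType: StageType;
  sessionPolicy?: SessionPolicy; // only meaningful for agent_work stages
  gate?: { type: string; repo?: string; pr?: number };
}

// "owner/repo" is a placeholder, not the repository PR #360 belongs to.
const shepherdPr360: { goal: string; tasks: TaskSpec[] } = {
  goal: "Shepherd PR #360",
  tasks: [
    { title: "Request Centaur review", stageType: "agent_work", sessionPolicy: "spawn" },
    {
      title: "Wait for review completion",
      stageType: "wait_for_signal",
      gate: { type: "centaur_review", repo: "owner/repo", pr: 360 },
    },
    { title: "Address findings", stageType: "agent_work", sessionPolicy: "spawn" },
    {
      title: "Wait for CI",
      stageType: "wait_for_signal",
      gate: { type: "gh_ci", repo: "owner/repo", pr: 360 },
    },
    { title: "Approve merge", stageType: "human_review" },
  ],
};

// Tasks that would each get their own headless session.
const spawned = shepherdPr360.tasks
  .filter((t) => t.sessionPolicy === "spawn")
  .map((t) => t.title);
```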

Key files

server/task-orchestrator.ts — spawn path in tick()
server/index.ts — spawnSession wiring
server/app.ts — POST /api/signals/resolve
server/task-store.ts — findActiveSignalTasks()

Test plan

7 new tests: spawn dispatch, activeTaskId semantics, fallback, signal lookup
All 48 existing tests pass
TypeScript compilation clean
E2E: create goal with spawn tasks, verify sessions are created
E2E: POST to /api/signals/resolve, verify matching tasks resolve

🤖 Generated with Claude Code

dimakis

Centaur Review

Found 8 issues (5 warnings).

`server/task-orchestrator.ts`

Solid feature addition with clean separation between spawn/reuse paths; main concerns are race conditions in the async spawn-failure callback (stale this.goalId/this.pinnedClientId after stop()) and a missing human_approval case in the signal resolve endpoint.

🟡 bugs (L355): In the spawn failure callback (.then handler), this.goalId! is used with a non-null assertion, but stop() could have been called between spawn initiation and callback resolution, setting goalId to null. The ! assertion would then pass a null to setTaskContext. Guard with if (!this.goalId) return; before the fallback logic. [fixable]
🟡 bugs (L356): The spawn-failure fallback sets task context and sends prompt to the pinned session, but does NOT set this.activeTaskId = next.id. This means the orchestrator's status reports no active task, yet a task is being worked on by the pinned session. If another tick() fires (e.g., from a concurrent task completing), getNextExecutable won't return this task (it's already active), but the missing activeTaskId breaks the status reporting contract. Additionally, this.pinnedClientId could also be null by the time the callback fires if stop() was called. [fixable]
🔵 unsafe_assumptions (L368): The recursive this.tick() after spawning is safe (bounded by tasks transitioning from pending to active), but it runs synchronously on the call stack. With many spawn-policy tasks (e.g., 50+ parallel subtasks), this creates deep recursion. Consider using queueMicrotask(() => this.tick()) or setImmediate to avoid stack overflow in large task trees. [fixable]

`server/app.ts`

🟡 bugs (L1006): SignalResolveBody accepts type: 'human_approval' but the switch statement in the resolve endpoint has no case 'human_approval' branch. A human_approval signal will match the gate type via findActiveSignalTasks but isMatch will remain false since no case handles it — the signal will silently be ignored. Either add a matching case or remove human_approval from the schema. [fixable]
🟡 missing_tests (L980): The new POST /api/signals/resolve endpoint has no integration tests. The existing /api/tasks/:id/signal endpoint also lacks tests, but adding a new gate-matching endpoint with per-type dispatch logic (centaur_review, gh_ci, gh_review) without tests is risky — especially the matching logic with optional repo/pr/pr_url fields. [fixable]
🔵 style (L1008): The gate config property access pattern (gc as Record<string, unknown>).repo as string | undefined is repeated 6 times across 3 switch cases. Consider destructuring the gate config once before the switch: const { repo: taskRepo, pr: taskPr, pr_url: taskPrUrl } = gc as Record<string, unknown>. This reduces duplication and makes the matching logic easier to read. [fixable]

`server/index.ts`

🟡 unsafe_assumptions (L240): createSessionWorktrees(wtId, BASE_REPO, config.repos) is called synchronously and can throw (e.g., git failures). The try/catch handles this, but if worktree creation fails, the function returns null and the orchestrator falls back to the pinned session — yet the task is already marked active in the orchestrator (line 342). The fallback in the orchestrator's .then(null) path sends to the pinned session but the task was spawned for isolation. Consider whether worktree failure should revert the task to pending instead. [fixable]

`server/tests/task-orchestrator.test.ts`

🔵 missing_tests (L576): The spawn tests verify the happy path and fallback, but don't test the rejection path (lines 359-363 in task-orchestrator.ts) where spawnSession throws — the task should revert to pending. Also missing: a test for multiple spawn tasks being dispatched in parallel (verifying the recursive tick() spawns all pending tasks in sequence). [fixable]

dimakis · 2026-06-09T23:26:53Z

+                log.info('spawned session for task', { taskId: next.id, clientId });
+              } else {
+                log.error('failed to spawn session, falling back to pinned', { taskId: next.id });
+                this.deps.setTaskContext(next.id, this.goalId!);


🟡 bugs: In the spawn failure callback (.then handler), this.goalId! is used with a non-null assertion, but stop() could have been called between spawn initiation and callback resolution, setting goalId to null. The ! assertion would then pass a null to setTaskContext. Guard with if (!this.goalId) return; before the fallback logic. [fixable]
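A minimal sketch of the reviewer's suggestion: capture `goalId` before the async boundary, then compare it against the live value inside the callback. The `Orchestrator` class, `SpawnFn` type, and `contexts` array are hypothetical simplifications, not the PR's code.

```typescript
// Hypothetical minimal orchestrator showing the capture-then-compare guard.
type SpawnFn = (taskId: string, goalId: string) => Promise<string | null>;

class Orchestrator {
  goalId: string | null = null;
  readonly contexts: Array<[string, string]> = [];

  constructor(private readonly spawnSession: SpawnFn) {}

  start(goalId: string): void {
    this.goalId = goalId;
  }

  stop(): void {
    this.goalId = null;
  }

  dispatchSpawn(taskId: string): Promise<void> {
    const goalId = this.goalId; // captured before the async boundary
    if (goalId === null) return Promise.resolve();
    return this.spawnSession(taskId, goalId).then((clientId) => {
      // stop(), or a restart on a different goal, may have happened while spawning.
      if (this.goalId !== goalId) return;
      if (clientId === null) this.setTaskContext(taskId, goalId); // fallback path; no `!` needed
    });
  }

  private setTaskContext(taskId: string, goalId: string): void {
    this.contexts.push([taskId, goalId]);
  }
}
```

Because the callback uses the captured `goalId` (typed `string`), the non-null assertion disappears, and the comparison turns a stale callback into a no-op.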

dimakis · 2026-06-09T23:26:53Z

+              } else {
+                log.error('failed to spawn session, falling back to pinned', { taskId: next.id });
+                this.deps.setTaskContext(next.id, this.goalId!);
+                if (this.pinnedClientId) sendToChat(this.pinnedClientId, prompt);


🟡 bugs: The spawn-failure fallback sets task context and sends prompt to the pinned session, but does NOT set this.activeTaskId = next.id. This means the orchestrator's status reports no active task, yet a task is being worked on by the pinned session. If another tick() fires (e.g., from a concurrent task completing), getNextExecutable won't return this task (it's already active), but the missing activeTaskId breaks the status reporting contract. Additionally, this.pinnedClientId could also be null by the time the callback fires if stop() was called. [fixable]
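One way to keep `activeTaskId` consistent in the fallback, sketched as a pure helper. It follows the approach the author describes later in this PR (claim the pinned session only when `activeTaskId` is null, otherwise mark the task blocked); returning `"pending"` when there is no pinned client is an added assumption, and `fallbackToPinned` / `FallbackState` are hypothetical names.

```typescript
type TaskStatus = "pending" | "active" | "blocked";

interface FallbackState {
  activeTaskId: string | null;
  pinnedClientId: string | null;
}

// Hypothetical helper: decide what happens when spawnSession resolved to null.
// It mutates state only when it actually claims the pinned session.
function fallbackToPinned(
  state: FallbackState,
  taskId: string,
  send: (clientId: string) => void,
): TaskStatus {
  const pinned = state.pinnedClientId;
  if (pinned === null) return "pending"; // nothing to fall back to; retry on a later tick
  if (state.activeTaskId !== null) return "blocked"; // pinned session busy with another task
  state.activeTaskId = taskId; // status reporting now matches the work in flight
  send(pinned);
  return "active";
}
```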

dimakis · 2026-06-09T23:26:53Z

+
+          // Don't set activeTaskId — spawned tasks run independently.
+          // Continue ticking to find more parallel work.
+          this.tick();


🔵 unsafe_assumptions: The recursive this.tick() after spawning is safe (bounded by tasks transitioning from pending to active), but it runs synchronously on the call stack. With many spawn-policy tasks (e.g., 50+ parallel subtasks), this creates deep recursion. Consider using queueMicrotask(() => this.tick()) or setImmediate to avoid stack overflow in large task trees. [fixable]
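A sketch of the loop-based alternative: dispatch ready spawn tasks iteratively, so stack depth stays constant, and cap the work done per tick. `dispatchSpawnTasks`, `nextSpawnable`, and the default cap are hypothetical; the cap of 50 simply mirrors the MAX_SPAWN_DEPTH value the author later adopts.

```typescript
// Hypothetical: dispatch all ready spawn tasks with a loop instead of recursive tick() calls.
function dispatchSpawnTasks(
  nextSpawnable: () => string | null, // next pending spawn-policy task ID, or null when none remain
  spawn: (taskId: string) => void, // marks the task active and starts its session
  maxPerTick = 50, // bounds the work per tick; leftovers are picked up on the next tick
): number {
  let dispatched = 0;
  while (dispatched < maxPerTick) {
    const taskId = nextSpawnable();
    if (taskId === null) break;
    spawn(taskId);
    dispatched++;
  }
  return dispatched;
}
```

With a loop the cap no longer protects the stack; it only limits how many sessions one tick starts at once.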

dimakis · 2026-06-09T23:26:53Z

+    if (!gc) continue;
+
+    let isMatch = false;
+    switch (type) {


🟡 bugs: SignalResolveBody accepts type: 'human_approval' but the switch statement in the resolve endpoint has no case 'human_approval' branch. A human_approval signal will match the gate type via findActiveSignalTasks but isMatch will remain false since no case handles it — the signal will silently be ignored. Either add a matching case or remove human_approval from the schema. [fixable]
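A sketch of an exhaustive matcher: every gate type gets a case, and a `never`-typed default makes the compiler reject a new schema member that lacks one. `gateMatches` and `Gate` are hypothetical; matching `human_approval` on repo + PR, like the GitHub gates, is an assumption. The `same` helper also guards against two missing fields comparing equal (`undefined === undefined`).

```typescript
type GateType = "centaur_review" | "gh_ci" | "gh_review" | "human_approval";

interface Gate {
  repo?: string;
  pr?: number;
  pr_url?: string;
}

// Two fields match only if both are present and equal.
const same = <T>(a: T | undefined, b: T | undefined): boolean => a !== undefined && a === b;

// Hypothetical matcher for the resolve endpoint's switch.
function gateMatches(type: GateType, task: Gate, signal: Gate): boolean {
  switch (type) {
    case "centaur_review":
      if (same(task.pr_url, signal.pr_url)) return true;
      return same(task.repo, signal.repo) && same(task.pr, signal.pr);
    case "gh_ci":
    case "gh_review":
    case "human_approval": // assumed to match on repo + PR like the GitHub gates
      return same(task.repo, signal.repo) && same(task.pr, signal.pr);
    default: {
      // Compile-time exhaustiveness check: unreachable if every GateType has a case.
      const unhandled: never = type;
      throw new Error(`unhandled gate type: ${String(unhandled)}`);
    }
  }
}
```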

dimakis · 2026-06-09T23:26:53Z

+ * External agents (e.g. Centaur) POST here after completing work —
+ * they don't need to know task IDs, just the gate parameters.
+ */
+app.post('/api/signals/resolve', (req, res) => {


🟡 missing_tests: The new POST /api/signals/resolve endpoint has no integration tests. The existing /api/tasks/:id/signal endpoint also lacks tests, but adding a new gate-matching endpoint with per-type dispatch logic (centaur_review, gh_ci, gh_review) without tests is risky — especially the matching logic with optional repo/pr/pr_url fields. [fixable]

dimakis · 2026-06-09T23:26:53Z

+    let isMatch = false;
+    switch (type) {
+      case 'centaur_review': {
+        const taskPrUrl = (gc as Record<string, unknown>).pr_url as string | undefined;


🔵 style: The gate config property access pattern (gc as Record<string, unknown>).repo as string | undefined is repeated 6 times across 3 switch cases. Consider destructuring the gate config once before the switch: const { repo: taskRepo, pr: taskPr, pr_url: taskPrUrl } = gc as Record<string, unknown>. This reduces duplication and makes the matching logic easier to read. [fixable]
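A sketch of the suggested refactor: narrow the untyped gate config once into a typed view, then let the switch cases read plain fields. `GateConfig` and `readGateConfig` are hypothetical names, and treating `pr` as a JSON number (not a numeric string) is an assumption about how the config is stored.

```typescript
// Hypothetical typed view of a task's gate config; field names follow the review comment.
interface GateConfig {
  repo?: string;
  pr?: number;
  pr_url?: string;
}

// Narrow an untyped JSON blob once, instead of casting at every use site.
function readGateConfig(gc: unknown): GateConfig | null {
  if (typeof gc !== "object" || gc === null) return null;
  const { repo, pr, pr_url } = gc as Record<string, unknown>;
  return {
    repo: typeof repo === "string" ? repo : undefined,
    pr: typeof pr === "number" ? pr : undefined,
    pr_url: typeof pr_url === "string" ? pr_url : undefined,
  };
}
```

Unlike a bare `as string` cast, the `typeof` checks drop fields of the wrong type instead of letting them flow into comparisons.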

dimakis · 2026-06-09T23:26:53Z

+    const config = getRepoConfig();
+
+    try {
+      createSessionWorktrees(wtId, BASE_REPO, config.repos);


🟡 unsafe_assumptions: createSessionWorktrees(wtId, BASE_REPO, config.repos) is called synchronously and can throw (e.g., git failures). The try/catch handles this, but if worktree creation fails, the function returns null and the orchestrator falls back to the pinned session — yet the task is already marked active in the orchestrator (line 342). The fallback in the orchestrator's .then(null) path sends to the pinned session but the task was spawned for isolation. Consider whether worktree failure should revert the task to pending instead. [fixable]
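One way to let the orchestrator distinguish failure causes: have the spawn wrapper return a discriminated result instead of a bare `null`. `SpawnResult`, `spawnWithReason`, `afterSpawn`, and the `"fallback_to_pinned"` outcome are hypothetical; reverting to pending only on worktree failure is one possible policy, not the PR's decision.

```typescript
// Hypothetical result type so callers can tell why a spawn failed.
type SpawnResult =
  | { ok: true; clientId: string }
  | { ok: false; reason: "worktree" | "session" };

type NextStatus = "active" | "pending" | "fallback_to_pinned";

// Hypothetical wrapper around the worktree + session steps in index.ts.
function spawnWithReason(
  createWorktrees: () => void, // may throw, like createSessionWorktrees
  startSession: () => string, // may throw
): SpawnResult {
  try {
    createWorktrees();
  } catch {
    return { ok: false, reason: "worktree" };
  }
  try {
    return { ok: true, clientId: startSession() };
  } catch {
    return { ok: false, reason: "session" };
  }
}

function afterSpawn(result: SpawnResult): NextStatus {
  if (result.ok) return "active";
  // The task asked for isolation, so don't run it in the shared pinned session.
  if (result.reason === "worktree") return "pending";
  return "fallback_to_pinned";
}
```

Whether a session-start failure should also revert to pending, rather than fall back, is a separate judgment call.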

dimakis · 2026-06-09T23:26:53Z

+
+      const [, prompt, goalArg] = spawnSession.mock.calls[0];
+      expect(goalArg).toBe(goal.id);
+      expect(prompt).toContain('Spawn task');


🔵 missing_tests: The spawn tests verify the happy path and fallback, but don't test the rejection path (lines 359-363 in task-orchestrator.ts) where spawnSession throws — the task should revert to pending. Also missing: a test for multiple spawn tasks being dispatched in parallel (verifying the recursive tick() spawns all pending tasks in sequence). [fixable]
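A sketch of the missing rejection-path test, written with plain assertions rather than the jest/vitest-style mocks the suite appears to use (`spawnSession.mock.calls`), so it runs standalone. `dispatchWithRevert` is a hypothetical slice of the orchestrator, not the real method.

```typescript
interface SpawnTask {
  id: string;
  status: "pending" | "active";
}

// Hypothetical slice of the orchestrator: mark active, revert to pending if spawnSession rejects.
function dispatchWithRevert(
  task: SpawnTask,
  spawnSession: (taskId: string) => Promise<string | null>,
): Promise<void> {
  task.status = "active";
  return spawnSession(task.id).then(
    () => undefined,
    () => {
      task.status = "pending"; // rejection: hand the task back to a later tick
    },
  );
}

function testRejectionRevertsToPending(): Promise<void> {
  const task: SpawnTask = { id: "t1", status: "pending" };
  const p = dispatchWithRevert(task, () => Promise.reject(new Error("boom")));
  if (task.status !== "active") throw new Error("task should be active while spawning");
  return p.then(() => {
    if (task.status !== "pending") throw new Error("task should revert to pending");
  });
}
```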

Enable the TaskOrchestrator to spawn dedicated headless sessions for individual tasks, unlocking multi-agent goal coordination.

Changes:
- task-orchestrator: implement session_policy 'spawn' path in tick(). Spawned tasks get independent sessions and don't block activeTaskId, allowing parallel execution. Falls back to pinned session on failure.
- index.ts: wire spawnSession dep; creates worktrees, registers in event store, starts headless session with NullTransport.
- app.ts: add POST /api/signals/resolve endpoint for external agents (e.g. Centaur) to resolve wait_for_signal tasks by gate metadata (type + repo + PR) without knowing task IDs.
- api-schemas: add SignalResolveBody zod schema.
- task-store: add findActiveSignalTasks(gateType) query method.
- Tests: 7 new tests covering spawn dispatch, activeTaskId semantics, fallback behavior, and signal task lookup.

Part of the multi-agent orchestration initiative (Phase 1 of 4). Next: Centaur signal bridge (Phase 2), PR Shepherd goals (Phase 3).

Co-Authored-By: Claude Opus 4.6 <[email protected]>

dimakis · 2026-06-20T17:39:38Z

@centaur-review Please review this PR.

This is Phase 1 of the multi-agent orchestration initiative — enables session_policy: 'spawn' in the TaskOrchestrator and adds /api/signals/resolve for external signal resolution.

Key areas to review:

task-orchestrator.ts — spawn path in tick(), parallel execution semantics
app.ts — new /api/signals/resolve endpoint, auth model
task-store.ts — findActiveSignalTasks() query
index.ts — spawnSession wiring with NullTransport

dimakis

Centaur Review

Found 7 issues (4 warnings).

`server/task-orchestrator.ts`

Sound multi-agent spawn design with correct DFS dispatch and orphan recovery, but the async .then() callbacks read this.goalId/this.pinnedClientId live — creating a race with stop() that can pass null to setTaskContext. The /api/signals/resolve endpoint needs test coverage and has an unhandled human_approval gate type.

🟡 bugs (L355): Race condition: this.goalId! is read inside an async .then() callback, but stop() nullifies this.goalId (line 141). If the orchestrator is stopped before spawnSession resolves and returns null, this.goalId! evaluates to null — passed to setTaskContext as a null goalId. Capture goalId before the async call (e.g. const goalId = this.goalId;) and use the captured value in the callback, with a guard for the null case. [fixable]
🟡 bugs (L356): Same race: this.pinnedClientId is read inside the async .then() callback, but stop() nullifies it (line 145). If the orchestrator stops before the spawn-failure fallback fires, the fallback silently does nothing — the task stays active with no session and no pinned client to handle it, effectively orphaned until the next tick reclaims it. Consider guarding with if (this.state !== 'running') return; at the top of both .then() callbacks. [fixable]
🔵 unsafe_assumptions (L368): The recursive this.tick() call after spawning enables parallel task dispatch, but there's no depth guard. If a goal has many spawn-policy leaf tasks, tick() recurses once per task before any stack frame returns. For a goal with ~100 spawn tasks this could approach stack limits. Consider using a loop or queueMicrotask(()=> this.tick()) to avoid unbounded recursion. [fixable]

`server/app.ts`

🟡 bugs (L1027): human_approval is accepted by the SignalResolveBody schema (api-schemas.ts:188) but has no case in the switch statement (lines 1027-1043). Tasks with type: 'human_approval' will be fetched from the DB but never matched, silently returning { ok: true, matched: [] }. Either add a matching case or remove human_approval from the schema's enum to avoid misleading callers into thinking their resolve request succeeded. [fixable]
🟡 missing_tests (L1001): The /api/signals/resolve endpoint has no test coverage. The gate-matching logic (centaur_review by pr_url/repo+pr, gh_ci/gh_review by repo+pr) is non-trivial and untested at the HTTP layer. At minimum, add tests for: successful match by repo+pr, match by pr_url, no-match when gate metadata doesn't align, and auth rejection. [fixable]
🔵 style (L1029): Repeated (gc as Record<string, unknown>).fieldName as Type casts across the switch cases are noisy. Consider extracting a typed interface for gate config metadata (e.g. { repo?: string; pr?: number; pr_url?: string }) or a helper function to reduce the casting boilerplate. [fixable]

`server/tests/task-orchestrator.test.ts`

🔵 missing_tests: No test covers the spawn-failure fallback path (spawnSession returning null or rejecting). The existing tests verify the happy path and the missing-spawnSession fallback, but not the case where spawnSession is provided and fails. This is where the this.goalId! race condition lives. [fixable]

dimakis · 2026-06-20T17:42:27Z

+                log.info('spawned session for task', { taskId: next.id, clientId });
+              } else {
+                log.error('failed to spawn session, falling back to pinned', { taskId: next.id });
+                this.deps.setTaskContext(next.id, this.goalId!);


🟡 bugs: Race condition: this.goalId! is read inside an async .then() callback, but stop() nullifies this.goalId (line 141). If the orchestrator is stopped before spawnSession resolves and returns null, this.goalId! evaluates to null — passed to setTaskContext as a null goalId. Capture goalId before the async call (e.g. const goalId = this.goalId;) and use the captured value in the callback, with a guard for the null case. [fixable]

dimakis · 2026-06-20T17:42:27Z

+              } else {
+                log.error('failed to spawn session, falling back to pinned', { taskId: next.id });
+                this.deps.setTaskContext(next.id, this.goalId!);
+                if (this.pinnedClientId) sendToChat(this.pinnedClientId, prompt);


🟡 bugs: Same race: this.pinnedClientId is read inside the async .then() callback, but stop() nullifies it (line 145). If the orchestrator stops before the spawn-failure fallback fires, the fallback silently does nothing — the task stays active with no session and no pinned client to handle it, effectively orphaned until the next tick reclaims it. Consider guarding with if (this.state !== 'running') return; at the top of both .then() callbacks. [fixable]

dimakis · 2026-06-20T17:42:27Z

+
+          // Don't set activeTaskId — spawned tasks run independently.
+          // Continue ticking to find more parallel work.
+          this.tick();


🔵 unsafe_assumptions: The recursive this.tick() call after spawning enables parallel task dispatch, but there's no depth guard. If a goal has many spawn-policy leaf tasks, tick() recurses once per task before any stack frame returns. For a goal with ~100 spawn tasks this could approach stack limits. Consider using a loop or queueMicrotask(()=> this.tick()) to avoid unbounded recursion. [fixable]

dimakis · 2026-06-20T17:42:27Z

+    if (!gc) continue;
+
+    let isMatch = false;
+    switch (type) {


🟡 bugs: human_approval is accepted by the SignalResolveBody schema (api-schemas.ts:188) but has no case in the switch statement (lines 1027-1043). Tasks with type: 'human_approval' will be fetched from the DB but never matched, silently returning { ok: true, matched: [] }. Either add a matching case or remove human_approval from the schema's enum to avoid misleading callers into thinking their resolve request succeeded. [fixable]

dimakis · 2026-06-20T17:42:27Z

+ * External agents (e.g. Centaur) POST here after completing work —
+ * they don't need to know task IDs, just the gate parameters.
+ */
+app.post('/api/signals/resolve', (req, res) => {


🟡 missing_tests: The /api/signals/resolve endpoint has no test coverage. The gate-matching logic (centaur_review by pr_url/repo+pr, gh_ci/gh_review by repo+pr) is non-trivial and untested at the HTTP layer. At minimum, add tests for: successful match by repo+pr, match by pr_url, no-match when gate metadata doesn't align, and auth rejection. [fixable]

dimakis · 2026-06-20T17:42:27Z

+    let isMatch = false;
+    switch (type) {
+      case 'centaur_review': {
+        const taskPrUrl = (gc as Record<string, unknown>).pr_url as string | undefined;


🔵 style: Repeated (gc as Record<string, unknown>).fieldName as Type casts across the switch cases are noisy. Consider extracting a typed interface for gate config metadata (e.g. { repo?: string; pr?: number; pr_url?: string }) or a helper function to reduce the casting boilerplate. [fixable]

- Fix race condition: capture goalId/pinnedClientId before the async boundary and compare against current state in callbacks. Handles stop()+start() with a different goal, not just stop() alone.
- Add spawn depth guard (MAX_SPAWN_DEPTH=50) to prevent runaway recursive tick dispatch for goals with many spawn tasks.
- Fix fallback conflict: when spawn returns null, only claim the pinned session if activeTaskId is null; otherwise mark the task as blocked.
- Add human_approval case to the /api/signals/resolve switch statement.
- Extract typed destructuring for gate config in the resolve endpoint.
- Add tests: goal-restart race, pinned-session-busy fallback, gate matching logic for all signal types, endpoint resolve flow.

All 63 tests pass. Addresses all findings from Centaur reviews.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

dimakis · 2026-06-20T18:25:08Z

@centaur-review Please re-review — all findings from the previous two reviews have been addressed:

Race condition fixed — captured goalId/pinnedClientId as locals before async boundary; guard now checks this.goalId !== capturedGoalId (handles stop+restart, not just stop)
Spawn depth guard — MAX_SPAWN_DEPTH=50 prevents runaway recursive tick dispatch
Fallback conflict fixed — only claims pinned session if activeTaskId is null; otherwise marks task as blocked
human_approval case added — switch now handles all gate types
Typed destructuring — eliminated repeated (gc as Record<string, unknown>) casts
Tests added — goal-restart race, pinned-session-busy fallback, gate matching for all signal types, endpoint resolve flow (63 tests total, all passing)
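If same-goal restarts matter, note that a `this.goalId !== capturedGoalId` check cannot tell `stop()` followed by `start()` on the *same* goal apart from an uninterrupted run, so a stale callback could still act. A run counter (epoch) is one common alternative; `EpochGuard` below is a hypothetical sketch, not code from this PR.

```typescript
// Hypothetical: an epoch that increments on every start() and stop(), so callbacks
// from an earlier run are ignored even when the goal ID is the same.
class EpochGuard {
  goalId: string | null = null;
  private epoch = 0;

  start(goalId: string): void {
    this.goalId = goalId;
    this.epoch++;
  }

  stop(): void {
    this.goalId = null;
    this.epoch++;
  }

  // Capture a token before an async boundary; check it when the callback fires.
  capture(): number {
    return this.epoch;
  }

  isCurrent(token: number): boolean {
    return this.goalId !== null && token === this.epoch;
  }
}
```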

dimakis force-pushed the feat/multi-agent-orchestration branch from 615d0d3 to c000d4f on June 20, 2026 at 17:39.