
feat: parallel tool execution + agent teams#2831

Open
TheArchitectit wants to merge 64 commits into ultraworkers:main from TheArchitectit:feat/multi-tool-exec

Conversation


@TheArchitectit TheArchitectit commented Apr 28, 2026

Summary

This PR implements parallel tool execution and a full agent teams system that mimics Claude Code's agent teams architecture (reference: Claude Code Agent Teams Guide).

Parallel Tool Execution

When the model returns multiple tool_use blocks, read-only tools execute concurrently via std::thread::scope instead of sequentially. The run_turn loop is split into three phases:

  1. Pre-hooks + permission checks (sequential — must check permissions before execution)
  2. Tool execution (batch, parallel for read-only tools via PARALLEL_SAFE_TOOLS)
  3. Post-hooks + session updates (sequential — preserves ordering for session history)
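The phase-2 batch step can be sketched with scoped threads. This is an illustrative re-implementation under simplified assumptions, not the PR's exact code: the `is_parallel_safe` helper and the `(name, input)` call shape are stand-ins.

```rust
use std::thread;

// Hypothetical stand-in for PARALLEL_SAFE_TOOLS membership.
fn is_parallel_safe(tool_name: &str) -> bool {
    matches!(tool_name, "read_file" | "glob_search" | "grep_search")
}

// Phase 2 sketch: run parallel-safe calls concurrently via scoped
// threads, run the rest sequentially, and return results in the
// original model order (phase 3 depends on that ordering).
fn execute_batch(calls: &[(String, String)]) -> Vec<String> {
    let mut results: Vec<Option<String>> = vec![None; calls.len()];
    let (safe, rest): (Vec<_>, Vec<_>) = calls
        .iter()
        .enumerate()
        .partition(|(_, (name, _))| is_parallel_safe(name));

    thread::scope(|s| {
        let handles: Vec<_> = safe
            .into_iter()
            .map(|(i, (name, input))| {
                // Stand-in for the real tool dispatch.
                (i, s.spawn(move || format!("{name}({input})")))
            })
            .collect();
        for (i, h) in handles {
            results[i] = Some(h.join().expect("tool thread panicked"));
        }
    });
    for (i, (name, input)) in rest {
        results[i] = Some(format!("{name}({input})"));
    }
    results.into_iter().map(|r| r.unwrap()).collect()
}
```

Single-tool batches would skip the scope entirely, matching the sequential fallback noted in the commits.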

Agent Teams System

Full multi-agent coordination system with inbox-based messaging, task claiming, and automatic progress reporting.

Team Creation & Modes

  • TeamCreate tool spawns N agents in parallel OS threads with isolated contexts
  • Team mode presets: tiny/1x (4 agents), small/2x (8), medium/3x (12), large/4x (16), xlarge/5x (20), mega/6x (24)
  • Each agent gets role: Explore, Plan, Verification, or Reviewer (read-only)
  • Toggle teams with /team on|off|status or Ctrl+T (off by default)
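The preset arithmetic (N builders per role plus 1 reviewer per 3 builders, minimum 1) can be sketched as follows; this is a hypothetical re-derivation of expand_team_mode() from the counts above, not the actual implementation.

```rust
// Map a mode preset to (builders, reviewers). Builders are split evenly
// across Explore/Plan/Verification; reviewers are 1 per 3 builders,
// minimum 1. Totals match the presets above (e.g. tiny = 3 + 1 = 4).
fn expand_team_mode(mode: &str) -> Option<(usize, usize)> {
    let multiplier = match mode {
        "tiny" | "1x" => 1,
        "small" | "2x" => 2,
        "medium" | "3x" => 3,
        "large" | "4x" => 4,
        "xlarge" | "5x" => 5,
        "mega" | "6x" => 6,
        _ => return None,
    };
    let builders = multiplier * 3; // Explore + Plan + Verification
    let reviewers = (builders / 3).max(1);
    Some((builders, reviewers))
}
```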

Mailbox Messaging

  • AgentMessage tool: send, read, broadcast between agents
  • Team inbox directory: .clawd-agents/mailbox/team/{team_id}/
  • Completion events: agent_completed / agent_failed posted automatically
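A minimal sketch of posting an event into the team inbox. The directory layout follows the PR (.clawd-agents/mailbox/team/{team_id}/); the file naming and JSON shape here are illustrative assumptions.

```rust
use std::fs;
use std::path::PathBuf;

// Write one inbox event as a small JSON file; readers scan the
// directory, so each message is its own file (no shared-file locking).
fn post_to_inbox(
    root: &str,
    team_id: &str,
    agent_id: &str,
    event: &str,
    ts: u64,
) -> std::io::Result<PathBuf> {
    let dir: PathBuf = [root, "mailbox", "team", team_id].iter().collect();
    fs::create_dir_all(&dir)?;
    let path = dir.join(format!("{agent_id}-{ts}.json"));
    let body = format!(
        "{{\"agent_id\":\"{agent_id}\",\"event\":\"{event}\",\"ts\":{ts}}}"
    );
    fs::write(&path, body)?;
    Ok(path)
}
```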

Task Claiming

  • TaskClaim tool: claim, release, list — atomic lock files prevent duplicate work
  • Lock files: .clawd-agents/claims/{task_id}.lock with agent_id, team_id, timestamp
  • Team-scoped task_ids prevent cross-team claim collisions
  • Claims released on success, failure, and panic
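The atomic claim behavior can be sketched with `OpenOptions::create_new`, which fails if the lock file already exists, so exactly one agent wins. The file layout mirrors the PR (.clawd-agents/claims/{task_id}.lock); the lock body is an illustrative assumption.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

// Returns true if this agent won the claim, false if already claimed.
fn claim_task(claims_dir: &Path, task_id: &str, agent_id: &str) -> bool {
    let _ = std::fs::create_dir_all(claims_dir);
    let lock = claims_dir.join(format!("{task_id}.lock"));
    match OpenOptions::new().write(true).create_new(true).open(&lock) {
        Ok(mut f) => {
            // Record the claimant for debugging; shape is illustrative.
            let _ = writeln!(f, "{{\"agent_id\":\"{agent_id}\"}}");
            true
        }
        Err(_) => false, // lock file exists: another agent holds the claim
    }
}

// Release by deleting the lock file (called on success, failure, panic).
fn release_claim(claims_dir: &Path, task_id: &str) {
    let _ = std::fs::remove_file(claims_dir.join(format!("{task_id}.lock")));
}
```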

Progress Reporting

  • TurnProgressReporter trait: callback after each tool call in run_turn
  • TeamInboxReporter: implements the trait, writes tool_progress events to team inbox
  • Per-tool-call progress: tool_name, input_preview, result_preview, iteration, timestamp
  • Periodic git commits every 5 tool calls (preserves work-in-progress)
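The reporter hook can be sketched as below. The trait name comes from the PR summary; the method signature and the test reporter are illustrative assumptions.

```rust
// Callback invoked after each tool call in run_turn. TeamInboxReporter
// would implement this to write tool_progress events (and trigger a git
// commit every 5th call) without run_turn knowing about teams.
trait TurnProgressReporter {
    fn on_tool_call(&mut self, tool_name: &str, result_preview: &str, iteration: u32);
}

// Toy implementation that just records what it saw.
struct CountingReporter {
    calls: Vec<String>,
}

impl TurnProgressReporter for CountingReporter {
    fn on_tool_call(&mut self, tool_name: &str, _result_preview: &str, iteration: u32) {
        self.calls.push(format!("{iteration}:{tool_name}"));
    }
}

// Sketch of the loop side: invoke the reporter after every tool call.
fn run_tools(reporter: &mut dyn TurnProgressReporter, tools: &[&str]) {
    for (i, name) in tools.iter().enumerate() {
        reporter.on_tool_call(name, "ok", i as u32 + 1);
    }
}
```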

Reviewer Agents

  • Reviewer subagent_type with read-only tools (no bash, no write)
  • 1 reviewer per 3 builders (minimum 1)
  • Can use AgentMessage, TaskClaim, TeamStatus for coordination

AGENTS.md System

  • AgentSuggestion tool: agents propose additions to AGENTS.md (pattern, pitfall, style)
  • Suggestions saved as .clawd-agents/suggestions/ JSON files for human review
  • TeamStatus action=suggestions lists pending suggestions
  • AGENTS.md auto-loaded into sub-agent system prompts as "Shared Team Learnings"

Context Management

  • ContextRequest tool: iterative retrieval with 3-cycle budget for sub-agents
  • Context-window-aware auto-compaction: compacts at 70% of model's context limit
  • Model token limits for qwen, glm, and generic fallback (131K context, 8K output)
  • Prevents the "1M tokens to 200K context model" overflow bug
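The 70% trigger is simple threshold arithmetic; a minimal sketch, assuming the 131K-context fallback from the PR and hypothetical helper names:

```rust
// Context limit lookup; the real table has qwen/glm entries, this
// sketch only models the 131K generic fallback from the PR.
fn context_limit_for_model(model: &str) -> u64 {
    let _ = model;
    131_072
}

// Compact once estimated input tokens cross 70% of the context limit.
// Integer math avoids float comparison: tokens/limit >= 0.7 is
// equivalent to tokens * 10 >= limit * 7.
fn should_compact(model: &str, estimated_input_tokens: u64) -> bool {
    let limit = context_limit_for_model(model);
    estimated_input_tokens * 10 >= limit * 7
}
```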

Team Monitoring

  • TeamStatus tool with 5 actions: status, summary, events, inbox, kill
  • Background team watcher prints [team] progress to stderr
  • Kill signal: TeamStatus action=kill writes kill file that agent checks
  • TeamDelete: disk-based deletion with inbox/claims cleanup

SubAgent → Agent Evolution

  • Removed SubAgent tool, consolidated into Agent with subagentModel config
  • subagentModel field in settings.json for fast model configuration
  • Config validation on REPL start with setup wizard suggestions

Files Changed

Core Runtime (rust/crates/runtime/)

  • conversation.rs — 3-phase run_turn loop, TurnProgressReporter trait, with_auto_compaction_input_tokens_threshold, with_turn_progress_reporter builder
  • config.rs — RuntimeProviderConfig struct, ApiTimeoutConfig, provider/subagent_model fields on RuntimeFeatureConfig
  • config_validate.rs — Validation for provider, lsp, lspAutoStart, apiTimeout, subagentModel fields
  • lib.rs — Exports for TurnProgressReporter, RuntimeProviderConfig, ApiTimeoutConfig

API Layer (rust/crates/api/)

  • providers/mod.rs — model_token_limit(), ModelTokenLimit struct, qwen/glm/generic fallback entries, max_tokens_for_model_with_override()
  • lib.rs — Exports for model_token_limit, ModelTokenLimit
  • error.rs — ContextWindowExceeded variant with detailed token budget info

Tools (rust/crates/tools/)

  • lib.rs — All team tools, agent system, inbox architecture:
    • TeamCreate — spawns agents with role, team_id, task_id
    • TeamStatus — status/summary/events/inbox/kill/suggestions
    • TeamDelete — disk-based deletion with cleanup
    • AgentMessage — send/read/broadcast
    • TaskClaim — claim/release/list with atomic lock files
    • AgentSuggestion — propose AGENTS.md additions
    • ContextRequest — iterative retrieval with 3-cycle budget
    • TeamInboxReporter — per-tool-call progress to team inbox
    • expand_team_mode() — 1x-6x / tiny-small-medium-large-xlarge-mega presets
    • run_agent_job() — context-window-aware auto-compaction, CLAWD_AGENT_STORE fix
    • spawn_agent_job() — claim release on failure/panic
    • build_agent_system_prompt() — loads AGENTS.md
    • claims_dir(), claim_task(), release_claim(), list_claims()
    • post_agent_completion_to_team_inbox() with error field
    • AgentInput/AgentOutput/AgentJob with team_id, task_id, max_tokens

CLI (rust/crates/rusty-claude-cli/)

  • main.rs — PARALLEL_SAFE_TOOLS, /team slash command handler, Ctrl+T toggle, config validation on startup, TeamStatus/TaskClaim/AgentSuggestion/ContextRequest in safe tools
  • input.rs — ReadOutcome::TeamToggle, Ctrl+T binding (\x02 sentinel)
  • setup_wizard.rs — Fast model prompt saves as subagentModel

Commands (rust/crates/commands/)

  • lib.rs — SlashCommand::Team { action } variant, /team parse mapping and slash_name

Architecture Diagram

┌─────────────────────────────────────────────┐
│         Team Lead (Main Session)            │
│  - Creates team via TeamCreate              │
│  - Monitors via TeamStatus                  │
│  - Coordinates via AgentMessage             │
│  - Reviews AGENTS.md suggestions            │
│  - Toggles teams with /team or Ctrl+T       │
└─────────────────┬───────────────────────────┘
                  │
        ┌─────────┴─────────┐
        │                   │
┌───────▼─────────┐  ┌──────▼──────────┐
│  Teammate 1     │  │  Teammate 2     │
│  (Explore)      │  │  (Reviewer)     │
│                 │  │                 │
│ - Claims task   │  │ - Read-only     │
│ - Reports       │  │ - Reviews code  │
│   progress      │  │ - Suggests      │
│ - Commits       │  │   improvements  │
│   periodically  │  │                 │
└─────────────────┘  └─────────────────┘
        │                   │
        └───── Inbox ───────┘
              .clawd-agents/mailbox/team/{team_id}/
              • tool_progress events
              • agent_completed/failed events
              • kill signals
              • AGENTS.md suggestions

Test Plan

  • TeamCreate spawns agents with correct roles and mode presets
  • TaskClaim: claim → list shows it → release → list empty
  • AgentMessage: send, read, broadcast work correctly
  • AgentSuggestion: pattern and pitfall proposals saved as pending_review
  • TeamStatus: status, summary, inbox, kill, suggestions all functional
  • TeamDelete: deletes manifest, cleans inbox, releases claims
  • Context-window-aware auto-compaction compacts before overflow
  • /team on/off/status and Ctrl+T toggle team mode
  • Stale claims from prior runs don't block new teams (team-scoped task_ids)
  • Claims released on failure/panic
  • Full 8-agent team (2x mode) completes with all agents reporting results
  • Verify parallel tool execution timing with multiple read_file calls
  • Verify mixed read+write calls preserve ordering
  • Test ContextRequest tool with missing files/symbols
  • Test kill signal terminates stuck agent
  • Test AGENTS.md is loaded into sub-agent system prompt

TheArchitectit and others added 30 commits April 23, 2026 14:20
When the model API returns a context_window_blocked error (because the request
exceeds the model's context window), the CLI now automatically:

1. Compacts the session (removing old messages to free up space)
2. Retries the original request with the compacted session
3. Reports results to the user

This eliminates the need for users to manually run /compact when they
hit context limits - the recovery happens automatically.

## Technical Details

- Detection: Looks for 'context_window' or 'Context window' in error message
- Uses runtime::compact_session() to aggressively compact (max_estimated_tokens=0)
- Creates new runtime with compacted session and retries the turn
- Reports compaction results and final status to user

## Testing

Tested successfully with a request that exceeded model's context:
- Auto-compact triggered: 'Messages removed 19, Messages kept 5'
- Successfully retried and completed after compaction
…t-window-error

feat: auto-compact and retry on context window errors
…rl+P

Adds an interactive setup wizard that lets users configure their provider,
API key, base URL, and model without setting environment variables.
Configuration is persisted to ~/.claw/settings.json (with 0600 permissions).

New features:
- `claw setup` CLI subcommand runs the wizard from the terminal
- `/setup` slash command runs the wizard inside the REPL (hot-swaps model)
- Ctrl+P hotkey in the REPL triggers /setup for in-session provider swap
- Stored provider config used as fallback when env vars are absent
- Three-tier auth resolution: env var > .env file > stored config
- RuntimeProviderConfig struct and validation in settings schema

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Ctrl+P now inserts a sentinel char (\x01) that the highlighter renders
as a cyan "[Provider Swap]" prompt. User presses Enter to confirm and
launch the setup wizard. Returns ReadOutcome::ProviderSwap so the REPL
loop can hot-swap the model and reprint the connection line.

Also fixes clippy warnings: merged duplicate match arms in
provider_config_value, doc_markdown on ProviderKind, map_unwrap_or
idioms in setup_wizard.rs, and pre-existing clippy issues in main.rs
and commands/lib.rs.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Previously /resume latest only searched the current workspace's
fingerprinted session directory. If you started claw from a different
directory, it found zero sessions even though sessions existed
elsewhere on disk.

Changes:
- Add global_sessions_root() pointing to ~/.claw/sessions/
- Add scan_global_sessions() to scan all workspace namespaces
- Modify latest_session() to fall back to global scan when no
  workspace-local sessions are found
- Add load_session_loose() that skips workspace validation for
  alias references (latest/last/recent) so cross-workspace resume
  works while still enforcing workspace check for explicit IDs
- Wire load_session_loose() into CLI's load_session_reference()
- Add provider field to config validation schema (needed because
  user's settings.json already has the provider key)

Co-Authored-By: Claude Opus 4.7 <[email protected]>
The previous implementation only scanned ~/.claw/sessions/ for the
global fallback, but sessions are actually stored in the project-local
<cwd>/.claw/sessions/<fingerprint>/ by SessionStore::from_cwd().
Now scans both the global root and the project-local parent directory
(checking all fingerprint subdirs) so /resume latest finds sessions
regardless of where they're stored.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Previously /resume latest returned the most recently created session,
which was always the empty one just created on startup. Now it skips
sessions with 0 messages and excludes the current session ID, so it
finds the previous session with actual conversation history.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Implement complete LSP support for code intelligence tools:

- lsp_transport.rs: JSON-RPC 2.0 transport over stdio with Content-Length
  framing, async request/response handling, and graceful shutdown

- lsp_process.rs: LSP process manager with initialize handshake, and methods
  for hover, goto_definition, references, document_symbols, completion, format

- lsp_discovery.rs: Auto-discovery of installed LSP servers (rust-analyzer,
  clangd, gopls, pyright, typescript-language-server, etc.) with PATH lookup

- lsp_client.rs: Rewired LspRegistry to use real LSP processes instead of
  placeholder JSON, with lazy-start on first dispatch call

- config.rs: Added LspServerConfig for user-configured LSP servers

- config_validate.rs: Validation for lsp config section

- main.rs: CLI integration with server discovery at startup, /lsp slash
  command for status/start/stop/restart, and graceful shutdown on exit

- commands/src/lib.rs: Added SlashCommand::Lsp variant

The LSP tool is now available to the agent for hover, definition, references,
symbols, completion, and diagnostics queries. Servers are auto-discovered at
REPL startup and lazily started on first use.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
rust-analyzer installed through rustup exits non-zero on --version
("Unknown binary in official toolchain"), which caused discovery
to skip it. Changed command_exists_on_path to treat any successful
spawn as "found", regardless of exit code — only a failure to
spawn (command not found) means the server isn't available.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
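The discovery fix above amounts to checking "can the binary spawn at all" rather than "does it exit zero". A minimal sketch with an assumed helper name matching the commit:

```rust
use std::process::{Command, Stdio};

// A server counts as installed if the binary spawns at all; a non-zero
// exit (e.g. rust-analyzer under rustup) is still "found". Only a spawn
// error (command not found) means the server is unavailable.
fn command_exists_on_path(bin: &str) -> bool {
    Command::new(bin)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .is_ok() // exit code deliberately ignored
}
```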
…chment

Wire LSP into the Read/Edit/Write tool flow so the agent automatically
gets diagnostics after file operations:

- lsp_transport: Add LspServerMessage enum, read_message() for handling
  both responses and server-initiated notifications, notification queue
  with drain_notifications(), send_request now handles interleaved
  publishDiagnostics without breaking

- lsp_process: Add did_open(), did_change(), drain_diagnostics(),
  open file tracking (HashSet) and version counters for didChange,
  language_id_for_path() and severity_name() helpers

- lsp_client: Add notify_file_open(), notify_file_change(),
  fetch_diagnostics_for_file() with best-effort graceful fallback,
  registry-level open file tracking, diagnostic caching

- tools: Enrich run_read_file with didOpen + diagnostics, run_write_file
  and run_edit_file with didChange + diagnostics, format_diagnostic_appendix()
  for readable diagnostic output appended to tool results

All enrichment is non-blocking: if no LSP server is available, tools work
exactly as before. No errors propagate from the LSP layer.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Split the three large LSP files into module directories with sub-files:

lsp_transport/ (was 560 lines):
  - mod.rs (425) — types + LspTransport impl
  - tests.rs (134) — test module

lsp_process/ (was 929 lines):
  - mod.rs (436) — LspProcess struct + public methods + error types
  - parse.rs (311) — helper functions and LSP response parsers
  - tests.rs (194) — test module

lsp_client/ (was 1338 lines):
  - mod.rs (466) — LspRegistry struct + impl, re-exports from types
  - types.rs (103) — LspAction, LspDiagnostic, LspServerStatus, etc.
  - dispatch.rs (224) — LspRegistry::dispatch() method
  - tests.rs (273) — core registry tests
  - tests_lifecycle.rs (294) — lifecycle and integration tests

All files under 500 lines. All 501 runtime tests pass. Clippy clean.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
…transport modules

- Add lsp_auto_start field to RuntimeFeatureConfig (default: true)
- Add lspAutoStart bool field validation in config_validate
- Parse lspAutoStart from config JSON
- Auto-start discovered LSP servers on REPL init when enabled
- Add /lsp toggle command to enable/disable auto-start at runtime
- Remove lsp_client.rs, lsp_process.rs, lsp_transport.rs (2831 lines)
  — functionality consolidated into discovery-based auto-start
- Show auto-start status in /lsp status output
Remove SlashCommand::Setup (provider wizard), PROVIDER_FIELDS
(provider config), and stale imports that leaked in from the
feat/lsp-integration branch which included other PRs. Also fix
pre-existing clippy findings (Duration::from_hours, is_ok_and).
Add the 3-stage Trident compaction strategy from R.A.D.1.C.A.L,
adapted for the Rust CLI session model:

Stage 1 - SUPERSEDE: Zero-cost factual pruning. If a file was read
and then later written/edited, the earlier read is obsolete and
removed. Earlier writes superseded by later writes are also dropped.

Stage 2 - COLLAPSE: Buffer short chatty exchanges (under 200 chars,
no tool calls) and collapse them into dense summary blocks when the
threshold is exceeded.

Stage 3 - CLUSTER: Group semantically similar messages (same tool
names, same file paths, similar lengths) using Jaccard-based
fingerprinting and collapse clusters into summary blocks.

All three stages run before the existing summary-based compaction,
so less data needs to be summarized. Wired into both /compact and
the auto-compact retry on context window errors.
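Stage 3's Jaccard-based grouping can be sketched as below. Feature extraction (tool names, file paths) and the threshold are illustrative assumptions; only the similarity measure itself is standard.

```rust
use std::collections::HashSet;

// Jaccard similarity: |A ∩ B| / |A ∪ B| over message feature sets.
fn jaccard(a: &HashSet<&str>, b: &HashSet<&str>) -> f64 {
    let inter = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    if union == 0.0 { 0.0 } else { inter / union }
}

// Two messages join the same cluster when their fingerprints (here,
// flat lists of features like tool names and paths) are similar enough.
fn same_cluster(a: &[&str], b: &[&str], threshold: f64) -> bool {
    let sa: HashSet<&str> = a.iter().copied().collect();
    let sb: HashSet<&str> = b.iter().copied().collect();
    jaccard(&sa, &sb) >= threshold
}
```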
…e retry

- Add TimeoutConfig to HTTP client builder with connect_timeout (30s)
  and request_timeout (5min) defaults, configurable via
  CLAW_API_CONNECT_TIMEOUT and CLAW_API_REQUEST_TIMEOUT env vars
- Add with_timeout() builder to both AnthropicClient and
  OpenAiCompatClient for per-client timeout configuration
- Parse Retry-After header on 429 responses and use it to override
  exponential backoff delay when present
- Add ApiTimeoutConfig to runtime config with apiTimeout settings
  in ~/.claw/settings.json (connectTimeout, requestTimeout, maxRetries)
- Add retry_after field to ApiError::Api for propagating rate limit
  backoff hints through the retry pipeline
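The Retry-After override can be sketched as: prefer a numeric header value, fall back to exponential backoff. The backoff base and cap here are illustrative, not the crate's actual constants.

```rust
use std::time::Duration;

// On 429, use the server's Retry-After (seconds form) when present and
// parseable; otherwise fall back to capped exponential backoff.
fn backoff_delay(attempt: u32, retry_after: Option<&str>) -> Duration {
    if let Some(secs) = retry_after.and_then(|v| v.trim().parse::<u64>().ok()) {
        return Duration::from_secs(secs);
    }
    // Illustrative fallback: 1s, 2s, 4s, ... capped at 64s.
    Duration::from_secs(1u64 << attempt.min(6))
}
```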
Some providers/proxies return HTTP 400 with bodies like "no parseable
body" or "connection reset" during transient network blips. These are
not real bad requests — they're gateway errors wearing a 400 mask.
Detect known gateway error phrases in 400 response bodies and mark
them as retryable so the existing exponential backoff handles them.
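The classification step reduces to matching known phrases in the 400 body. The two markers come from the commit message; treating the list as exhaustive is an assumption.

```rust
// A 400 whose body matches a known transient-gateway phrase is marked
// retryable so the existing exponential backoff handles it.
fn is_retryable_400(body: &str) -> bool {
    const GATEWAY_MARKERS: &[&str] = &["no parseable body", "connection reset"];
    let lower = body.to_lowercase();
    GATEWAY_MARKERS.iter().any(|m| lower.contains(m))
}
```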
- compact.rs: fix panic when preserve_recent_messages=0
- main.rs: progressive 4-round auto-compact retry with session_mut fix
- main.rs: detect "no parseable body" as context window overflow
- anthropic.rs: remove debug eprintln
- error.rs: add "no parseable body" to CONTEXT_WINDOW_ERROR_MARKERS
- config.rs, lib.rs: conflict resolution fixes from merge

💘 Generated with Crush

Assisted-by: GLM 5.1 FP8 via Crush <[email protected]>
TheArchitectit and others added 19 commits April 28, 2026 11:42
When the model returns multiple tool_use blocks in a single response,
read-only tools (read_file, glob_search, grep_search, WebFetch,
WebSearch, LSP, Git*) now execute concurrently via std::thread::scope
instead of sequentially.

Architecture:
- Add ToolCall/ToolResult types to ToolExecutor trait
- Add execute_batch() with default sequential fallback
- CliToolExecutor overrides execute_batch to classify tools as
  parallel-safe or sequential, then runs parallel-safe tools
  concurrently via thread scopes
- run_turn refactored into 3 phases:
  1. Pre-hooks + permission checks (sequential, hooks may mutate state)
  2. Tool execution (batch, parallel for read-only tools)
  3. Post-hooks + session updates (sequential, preserves ordering)

Safety guarantees:
- Pre/post hooks always run sequentially
- Permission checks complete before any tool executes
- Tool results pushed to session in original model order
- Fallback to sequential for single-tool batches
Add a SubAgent tool that lets the main model delegate multi-step read
and search tasks to a fast, inexpensive sub-agent. The sub-agent runs
autonomously with its own ConversationRuntime, making multiple tool
calls without round-tripping through the main model.

- SubAgent tool with prompt, task_type (Explore/Plan/Verify), and
  optional model override
- Explore: read_file, glob_search, grep_search, WebFetch, WebSearch
- Plan: Explore + TodoWrite
- Verify: Plan + bash
- subagentModel config field in settings.json (falls back to main model)
- Uses ProviderRuntimeClient + SubagentToolExecutor (same as Agent tool)

This dramatically reduces token usage for information-gathering tasks.
Example: "find all Rust files that import X and summarize their usage"
→ one SubAgent call instead of 10 sequential main-model round trips.
Add an optional "Fast Model" prompt to the setup wizard for configuring
the SubAgent's fast/cheap model. This is saved as `subagentModel` in
settings.json. Press Enter to skip — SubAgent falls back to the main
model if not configured.
Match Claude Code's architecture: a single Agent tool with subagent_type
(Explore/Plan/Verification) that uses the subagentModel setting when
no explicit model override is provided. The separate SubAgent tool is
removed since it duplicated existing functionality.
- Agent tool is now parallel-safe: multiple Agent calls execute concurrently
- AgentMessage tool: send/read/broadcast between agents via shared mailbox
- TeamCreate rewired to spawn real Agent threads instead of TaskRegistry
- Agents set CLAWD_AGENT_ID env var so they can identify themselves for messaging
- AgentMessage added to Explore/Plan/Verification sub-agent tool lists
- Team manifest persisted to .clawd-agents/teams/{team_id}.json
- Add missing AgentMessage dispatch entry in execute_tool_with_enforcer
  (tool spec existed but couldn't actually be called)
- Add AgentMessage to Explore/Plan/Verification/general-purpose
  allowed_tools lists so sub-agents can communicate
- Add debug logging to resolve_agent_model and load_subagent_model_from_config
  to diagnose subagentModel config chain

Co-authored-by: GLM 5.1 FP8 via Crush <[email protected]>
Add 'mode' field to TeamCreate that auto-generates agent teams:
- "2x" = 2 Explore + 2 Plan + 2 Verification = 6 agents
- "4x" = 4 Explore + 4 Plan + 4 Verification = 12 agents
- "6x" = 6 Explore + 6 Plan + 6 Verification = 18 agents

When mode is set, 'prompt' provides the shared task description and
'tasks' is ignored. Also add 'subagent_type' and 'model' fields to
manual task items for per-task role and model control.

Co-authored-by: GLM 5.1 FP8 via Crush <[email protected]>
Instead of erroring when neither mode nor tasks are specified,
default to "2x" (2 Explore + 2 Plan + 2 Verification = 6 agents).

Co-authored-by: GLM 5.1 FP8 via Crush <[email protected]>
The combined branch had the old setup_wizard without prompt_fast_model()
and save_settings_field(), so claw setup never asked for the subagent
model. Restore the provider-wizard version that includes the fast model
prompt and writes subagentModel to settings.json.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
The fast model prompt (prompt_fast_model, subagentModel) was lost during
the merge into feat/all-prs-combined. This adds it back so claw setup
asks for a smaller/cheaper model for Agent subtasks and writes
subagentModel to ~/.claw/settings.json.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
- TeamStatus tool with 3 actions:
  - status: live snapshot (running/completed/failed counts, agent details)
  - summary: final results when agents finish (includes result content)
  - events: timeline from append-only event log
- Background team watcher thread spawned by TeamCreate:
  - Polls agent .json files every 2s
  - Prints [team] progress to stderr on agent completion/failure
  - Updates team manifest status when all agents finish
  - Writes events to .clawd-agents/teams/{team_id}-events.jsonl
- TeamStatus added to PARALLEL_SAFE_TOOLS and all agent allowed_tools

Co-authored-by: GLM 5.1 FP8 via Crush <[email protected]>
On REPL start, check for missing provider.apiKey, provider.baseUrl,
and subagentModel. Print a warning with instructions to run `claw setup`
or `/setup` if any are absent.

Co-authored-by: GLM 5.1 FP8 via Crush <[email protected]>
- Agents post completion/failure to team inbox on termination
  (.clawd-agents/mailbox/team/{team_id}/{agent_id}-{ts}.json)
- Team watcher reads from inbox instead of polling .json files
- New TeamStatus action=inbox reads team messages from the inbox
- AgentOutput carries team_id, persisted in manifest
- AgentInput accepts team_id from TeamCreate
- TeamCreate passes team_id to each spawned agent
- Inbox cleaned up when all agents finish

Co-authored-by: GLM 5.1 FP8 via Crush <[email protected]>
…nitoring

- TeamInboxReporter: per-tool-call progress reporting to team inbox
- TaskClaim tool: atomic claim/release/list with .clawd-agents/claims/ lock files
- Team-scoped task_ids to prevent cross-team claim collisions
- AgentSuggestion tool: propose AGENTS.md additions (human review required)
- ContextRequest tool: iterative retrieval with 3-cycle budget for sub-agents
- Context-window-aware auto-compaction (70% threshold) prevents overflow
- Model token limits for qwen/glm/generic models with 131K fallback
- Reviewer subagent_type: read-only tools, no bash/write
- Team mode presets: 1x-6x (tiny/small/medium/large/xlarge/mega)
- /team slash command + Ctrl+T toggle (off by default, CLAWD_AGENT_TEAMS=1)
- TeamDelete: disk-based deletion with inbox/claims cleanup
- TeamStatus: kill stuck agents, list AGENTS.md suggestions
- AGENTS.md: auto-loaded shared learnings in sub-agent system prompt
- Periodic git commits every 5 tool calls via TeamInboxReporter
- Claims released on failure/panic in spawn_agent_job
- Fixed doubled .clawd-agents/.clawd-agents/ paths (set CLAWD_AGENT_STORE abs)
- Fixed "unknown error" in team watcher (added error field to inbox messages)

💘 Generated with Crush

Assisted-by: GLM 5.1 FP8 via Crush <[email protected]>
@TheArchitectit TheArchitectit changed the title from "feat: parallel tool execution & SubAgent delegation" to "feat: parallel tool execution + agent teams" on Apr 29, 2026
@TheArchitectit TheArchitectit marked this pull request as ready for review April 29, 2026 20:22
TheArchitectit and others added 7 commits April 29, 2026 21:11
Some OpenAI-compatible providers (e.g., GLM-5) omit the `id` field in
streaming and non-streaming responses. Adding #[serde(default)] allows
the parser to accept these responses instead of failing with
"missing field `id`".

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Adds scripts/install.sh that builds the release binary and links it
to ~/.local/bin/claw. Run after code changes to update the CLI.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
When a provider returns HTML (e.g., error page, wrong endpoint) instead
of JSON in an SSE stream, provide a clear error message instead of
hanging or failing with a cryptic parse error.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
When a provider returns a JSON error (e.g., {"error":{"message":"..."}})
without SSE framing (no "data:" prefix), the SSE parser was silently
ignoring it and hanging. Now detects and surfaces these errors.

Also handles HTML responses that lack SSE framing.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
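The detection logic amounts to classifying raw stream lines before JSON parsing. A rough sketch (names and heuristics are illustrative; the PR's actual parser differs):

```rust
/// Rough classification of one raw line from an SSE byte stream.
#[derive(Debug, PartialEq)]
pub enum SseLine<'a> {
    Data(&'a str),          // normal `data: {...}` frame
    BareJsonError(&'a str), // JSON body with no SSE framing, e.g. {"error":...}
    Html,                   // provider returned an HTML error page
    Other,
}

pub fn classify(line: &str) -> SseLine<'_> {
    let t = line.trim_start();
    if let Some(payload) = t.strip_prefix("data:") {
        SseLine::Data(payload.trim_start())
    } else if t.starts_with('{') {
        // Surface this as an error instead of silently skipping it.
        SseLine::BareJsonError(t)
    } else if t.starts_with("<!DOCTYPE") || t.starts_with("<html") {
        SseLine::Html
    } else {
        SseLine::Other
    }
}
```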
Some providers (GLM, DeepSeek) emit reasoning tokens in `reasoning_content`
or nested `thinking.content` fields instead of `content`. Added support
for these fields so reasoning models work correctly.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
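The fallback order can be sketched as a small extraction helper (struct and field layout are illustrative):

```rust
// Illustrative delta shape: some providers (GLM, DeepSeek) put reasoning
// tokens in `reasoning_content` or a nested `thinking.content` field.
pub struct Thinking {
    pub content: Option<String>,
}

pub struct Delta {
    pub content: Option<String>,
    pub reasoning_content: Option<String>,
    pub thinking: Option<Thinking>,
}

/// Prefer normal content, then the provider-specific reasoning fields.
pub fn extract_text(d: &Delta) -> Option<&str> {
    d.content
        .as_deref()
        .or(d.reasoning_content.as_deref())
        .or_else(|| d.thinking.as_ref().and_then(|t| t.content.as_deref()))
}
```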
The final streaming chunk from some providers contains only finish_reason
and usage, with no delta field. Made it optional to prevent parse errors.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
When preserve_recent_messages == 0, raw_keep_from equals messages.len(),
causing index out of bounds when accessing session.messages[k].

Added k >= session.messages.len() check to prevent panic.

Reason: Compaction with preserve_recent_messages=0 triggered OOB access
when checking for tool-use/tool-result pair preservation at boundary.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
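The boundary condition and the guard can be sketched as follows (function names are illustrative, not the PR's actual code):

```rust
/// Index of the first "recent" message kept verbatim during compaction.
/// With preserve_recent == 0 this equals messages_len.
pub fn raw_keep_from(messages_len: usize, preserve_recent: usize) -> usize {
    messages_len.saturating_sub(preserve_recent)
}

/// The fix: treat an out-of-range boundary index as "no pair to preserve"
/// instead of indexing messages[k] and panicking.
pub fn boundary_is_tool_result(messages: &[&str], k: usize) -> bool {
    if k >= messages.len() {
        return false;
    }
    messages[k] == "tool_result"
}
```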
@TheArchitectit
Author

Review Checklist — Parallel Tool Execution + Agent Teams

User-facing behavior added

  • Parallel tool execution: multiple tool_use blocks execute concurrently for read-only tools via std::thread::scope
  • Agent teams system: TeamCreate spawns N agents in parallel OS threads with isolated contexts
  • Team mode presets: tiny/1x (4 agents), small/2x (8), medium/3x (12), large/4x (16), xlarge/5x (20), mega/6x (24)
  • Agent roles: Explore, Plan, Verification, Reviewer (read-only)
  • Team toggle: /team on|off|status or Ctrl+T (off by default)
  • Mailbox messaging: AgentMessage tool — send, read, broadcast between agents
  • Task claiming: TaskClaim tool — claim/release/list with atomic lock files
  • Progress reporting: per-tool-call progress events written to team inbox
  • Reviewer agents: 1 reviewer per 3 builders (minimum 1), read-only tools only
  • AGENTS.md system: agents propose additions (pattern, pitfall, style) for human review
  • Context management: ContextRequest tool with 3-cycle budget for sub-agents
  • Context-window-aware auto-compaction: compacts at 70% of model's context limit
  • Team monitoring: TeamStatus tool — status, summary, events, inbox, kill, suggestions

Config keys / CLI flags changed

  • subagentModel (string) — fast model for sub-agents (saved by setup wizard)
  • provider / subagent_model fields on RuntimeFeatureConfig
  • apiTimeout / ApiTimeoutConfig — provider timeout configuration
  • /team on|off|status — toggle team mode at runtime
  • Ctrl+T keybinding — toggle team mode from REPL

Migration or compatibility notes

  • Breaking: SubAgent tool removed — consolidated into Agent with subagentModel config
  • Breaking: New SlashCommand::Team variant — any exhaustive matches need updating
  • New tools: TeamCreate, TeamStatus, TeamDelete, AgentMessage, TaskClaim, AgentSuggestion, ContextRequest
  • New trait: TurnProgressReporter — callback after each tool call in run_turn
  • Disk footprint: .clawd-agents/ directory created for mailbox, claims, suggestions

Tests run locally

  • TeamCreate spawns agents with correct roles and mode presets
  • TaskClaim: claim → list shows it → release → list empty
  • AgentMessage: send, read, broadcast work correctly
  • AgentSuggestion: pattern and pitfall proposals saved
  • TeamStatus: status, summary, inbox, kill, suggestions all functional
  • TeamDelete: deletes manifest, cleans inbox, releases claims
  • Context-window-aware auto-compaction compacts before overflow
  • /team on/off/status and Ctrl+T toggle team mode
  • Stale claims from prior runs don't block new teams
  • Claims released on failure/panic
  • Full 8-agent team (2x mode) completes with all agents reporting
  • Verify parallel tool execution timing with multiple read_file calls
  • Verify mixed read+write calls preserve ordering
  • Test ContextRequest tool with missing files/symbols
  • Test kill signal terminates stuck agent
  • Test AGENTS.md loaded into sub-agent system prompt

Known risks / non-goals

  • Risk: Parallel execution only for PARALLEL_SAFE_TOOLS (read_file, list_directory, grep_search, web_fetch, web_search, lsp, context_request) — write tools still sequential
  • Risk: Agents run in OS threads, not isolated processes — panic in one agent could crash the main process
  • Risk: Claims are file-based lock files — NFS/network filesystems may not support atomic locking
  • Risk: Auto-compaction triggers at 70% of model context — may compact mid-task if context fills quickly
  • Non-goal: No distributed agent support — all agents run locally in same process
  • Non-goal: No persistent team state across sessions — teams deleted on REPL exit
  • Non-goal: No built-in retry/resume for failed agents — humans must re-spawn
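
The claim mechanism can be sketched with `OpenOptions::create_new`, which maps to O_CREAT|O_EXCL and is atomic on local filesystems (hence the NFS caveat above). Paths and the lock-file contents here are illustrative, not the PR's exact schema:

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Try to claim a task by atomically creating its lock file.
/// Returns Ok(true) on success, Ok(false) if another agent holds it.
pub fn try_claim(dir: &Path, task_id: &str, agent_id: &str) -> std::io::Result<bool> {
    let lock = dir.join(format!("{task_id}.lock"));
    match OpenOptions::new().write(true).create_new(true).open(&lock) {
        Ok(mut f) => {
            writeln!(f, "agent_id={agent_id}")?;
            Ok(true)
        }
        Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```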
