[FEATURE]: Multi-LLM structured team debate — design reference from working implementation #25766

@adndvlp

Description


Feature hasn't been suggested before.

  • I have verified this feature I'm about to request hasn't been suggested before.

Describe the enhancement you want to request

Background

I built Conclave, a fork of OpenCode that
adds structured multi-LLM debate. It's been running for a few days and I wanted
to share the design decisions in case they're useful for a native implementation.

Not proposing a merge — the fork is heavily modified. Just sharing what worked
(and what didn't).

Demo: https://adndvlp.github.io/conclave/


What it adds

5 new files under packages/opencode/src/team/:

File            Lines  Role
debate.ts         701  Debate engine: runDebate() (flat) and runBreakingTeams() (sub-teams)
team.ts           207  Effect service that orchestrates, resolves participants, live streaming
prompts.ts        203  System prompts per phase: round 1, round 2+, sub-teams, global coordination
cli-adapter.ts    359  Adapters for Gemini CLI, Claude Code, Codex as team members
schema.ts          43  Types: TeamConfig, TeamMember, SubTeam, CrossTeamMessage

Modified OpenCode files:

File                  Change
session/prompt.ts     If team.enabled and 2+ members, calls Team.Service.run() before normal processing
session/processor.ts  If winner is a CLI participant, routes to agent-mode CLI instead of streamText
session/status.ts     Adds team.breaking state with globalRound, subTeams, participantStreams
config/config.ts      Adds team field to schema

How the debate works

  1. User configures a team via /team or conclave.json
  2. Every prompt automatically triggers a debate (minimum 2 rounds)
  3. Models emit structured signals in their last 5 lines:
    LEAD, SUPPORT:<name>, ALIGN, BUILD, CHALLENGE:<specific>,
    SYNTHESIZE:<X and Y>, EXTEND, PASS
  4. Signals are parsed with a regex. With explicit prompt instructions this is
    reliable, so no extra LLM call is needed as a judge (see the sketch after
    this list)
  5. Rounds run between minRounds and maxRounds, but models can vote EXTEND
    to add rounds autonomously — debate ends when consensus is reached or
    maxRounds is hit, whichever comes first
  6. Scoring: endorsements * 2 + leads (being endorsed is worth more than self-promoting)
  7. Only the winner gets the full tool suite and implements
  8. Each model's thread is truncated to 25% of its own context window
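
For concreteness, here's a minimal sketch of the parsing and scoring steps.
The Signal type, function names, and exact regexes are illustrative
assumptions, not the code in debate.ts:

// Sketch of signal extraction + scoring, assuming the signal grammar above.
// Names (Signal, parseSignals, scoreRound) are illustrative, not from debate.ts.
type Signal =
  | { kind: "LEAD" }
  | { kind: "SUPPORT"; target: string }
  | { kind: "ALIGN" | "BUILD" | "EXTEND" | "PASS" }
  | { kind: "CHALLENGE" | "SYNTHESIZE"; detail: string };

// Only the last 5 lines of a model's reply are scanned for signals.
function parseSignals(reply: string): Signal[] {
  const tail = reply.trim().split("\n").slice(-5);
  const signals: Signal[] = [];
  for (const line of tail) {
    let m: RegExpMatchArray | null;
    if (/^LEAD\b/.test(line)) signals.push({ kind: "LEAD" });
    else if ((m = line.match(/^SUPPORT:(\S+)/)))
      signals.push({ kind: "SUPPORT", target: m[1] });
    else if ((m = line.match(/^(CHALLENGE|SYNTHESIZE):(.+)/)))
      signals.push({ kind: m[1] as "CHALLENGE" | "SYNTHESIZE", detail: m[2].trim() });
    else if ((m = line.match(/^(ALIGN|BUILD|EXTEND|PASS)\b/)))
      signals.push({ kind: m[1] as "ALIGN" | "BUILD" | "EXTEND" | "PASS" });
  }
  return signals;
}

// Scoring: endorsements * 2 + leads. An endorsement from a peer is worth
// twice a self-nominated lead, so models can't win by self-promotion alone.
function scoreRound(replies: Map<string, string>): Map<string, number> {
  const scores = new Map<string, number>();
  for (const [member, reply] of replies) {
    for (const sig of parseSignals(reply)) {
      if (sig.kind === "LEAD") scores.set(member, (scores.get(member) ?? 0) + 1);
      if (sig.kind === "SUPPORT") scores.set(sig.target, (scores.get(sig.target) ?? 0) + 2);
    }
  }
  return scores;
}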

Breaking Teams (the interesting differentiator)

Models can vote to split into sub-teams:

BREAK:<team_name>:<focus>[:<invites>]

Three phases:

  1. Decision round — models vote to break or stay flat
  2. Formation — invite resolution, solo merging, sub-team assignment
  3. Parallel internal debates + global coordination rounds — sub-teams broadcast
    dependencies to each other via BROADCAST

Result: simultaneous implementation across different files. Same-file parallel
edits would be the next step (conflict detection + merge of non-overlapping hunks).
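
A minimal sketch of how the BREAK votes and the formation phase could work;
BreakVote, formSubTeams, and the comma-separated invite list are assumptions
based on the signal format above, not the fork's exact code:

// Sketch: parsing BREAK:<team_name>:<focus>[:<invites>] votes and grouping
// them into sub-teams. Shapes and names are illustrative.
interface BreakVote {
  team: string;       // proposed sub-team name
  focus: string;      // what the sub-team should work on
  invites: string[];  // members invited to join (may be empty)
}

function parseBreakVote(line: string): BreakVote | null {
  const m = line.match(/^BREAK:([^:]+):([^:]+)(?::(.+))?$/);
  if (!m) return null;
  return {
    team: m[1].trim(),
    focus: m[2].trim(),
    invites: m[3] ? m[3].split(",").map((s) => s.trim()) : [],
  };
}

// Formation phase: group voters (plus their invitees) by team name.
// Solo merging (folding a one-member team into another) would follow.
function formSubTeams(votes: Map<string, BreakVote>): Map<string, string[]> {
  const teams = new Map<string, string[]>();
  for (const [member, vote] of votes) {
    const roster = teams.get(vote.team) ?? [];
    teams.set(vote.team, [...new Set([...roster, member, ...vote.invites])]);
  }
  return teams;
}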


CLI Bridging

Special provider IDs: cli-gemini, cli-claude, cli-codex

  • Auto-detected via which
  • Gemini CLI free tier: 60 req/min, 1000/day — no API key needed
  • When a CLI model wins the debate, the processor routes to agent-mode subprocess
    instead of streamText

This is the most practical unlock for users who have CLI subscriptions but no
API access.
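
As a sketch, detection via which could look like the following; the binary
names are assumptions and may not match what cli-adapter.ts actually probes
for:

// Sketch: auto-detecting bridged CLIs via `which`. Binary names are
// assumptions; the actual names used in cli-adapter.ts may differ.
import { spawnSync } from "node:child_process";

const CLI_PROVIDERS: Record<string, string> = {
  "cli-gemini": "gemini",
  "cli-claude": "claude",
  "cli-codex": "codex",
};

function detectCLIs(): string[] {
  return Object.entries(CLI_PROVIDERS)
    .filter(([, bin]) => spawnSync("which", [bin]).status === 0)
    .map(([providerID]) => providerID);
}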


Context-aware team assignment

The system prompt includes each member's capabilities. Auto-assignment rules:

  • Large context → analysis tasks
  • Fast/cheap → execution
  • Reasoning-focused → design decisions

This prevents small-context models from attempting to read an entire codebase.
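
A sketch of that heuristic, with hypothetical capability fields and thresholds
(the fork's actual rules may differ):

// Sketch of the auto-assignment rules above. The Capabilities fields and
// the thresholds are assumptions, not values from the fork.
interface Capabilities {
  contextWindow: number;    // tokens
  costPerMTok: number;      // rough cost per million tokens
  reasoningFocused: boolean;
}

type Role = "analysis" | "execution" | "design";

function assignRole(c: Capabilities): Role {
  if (c.reasoningFocused) return "design";            // reasoning-focused -> design decisions
  if (c.contextWindow >= 500_000) return "analysis";  // large context -> analysis tasks
  return "execution";                                 // fast/cheap -> execution
}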


Sample config

{
  "team": {
    "enabled": true,
    "members": [
      { "providerID": "deepseek", "modelID": "deepseek-chat" },
      { "providerID": "cli-gemini", "modelID": "gemini-2.5-flash" }
    ],
    "maxRounds": 3,
    "minRounds": 2,
    "breakingTeams": {
      "maxSubTeams": 3,
      "globalRoundInterval": 1
    }
  }
}
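
For reference, the shape this config implies, inferred from the sample above
and the type names listed for schema.ts (the actual schema may differ):

// Sketch of the types behind conclave.json's "team" field.
// Shape inferred from the sample config; schema.ts may differ.
interface TeamMember {
  providerID: string; // e.g. "deepseek" or a cli-* bridge
  modelID: string;
}

interface BreakingTeamsConfig {
  maxSubTeams: number;
  globalRoundInterval: number; // a global coordination round every N internal rounds
}

interface TeamConfig {
  enabled: boolean;
  members: TeamMember[];
  minRounds: number;
  maxRounds: number;
  breakingTeams?: BreakingTeamsConfig;
}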

Known pain points

TUI streaming with 4+ models is the hardest part. Showing live per-model
reasoning without killing performance required a 250ms debounce on participant
text chunks — and it still gets janky. This is probably the trickiest thing to
get right if implemented natively.
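
The debounce itself is simple; roughly like this (names and structure are
illustrative, not the fork's code):

// Sketch: buffer per-participant text chunks and flush at most every 250 ms,
// so the TUI redraws once per interval instead of on every token.
class StreamDebouncer {
  private buffers = new Map<string, string>();
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private flush: (chunks: Map<string, string>) => void,
    private intervalMs = 250,
  ) {}

  push(participant: string, chunk: string) {
    this.buffers.set(participant, (this.buffers.get(participant) ?? "") + chunk);
    if (this.timer === null) {
      this.timer = setTimeout(() => {
        this.timer = null;
        const out = this.buffers;
        this.buffers = new Map();
        this.flush(out);
      }, this.intervalMs);
    }
  }
}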

Latency is additive — 3 models × 3 rounds = 9 API calls per message.
Worth it for complex tasks, overkill for simple ones. An auto-mode that skips
debate for short/simple prompts would help a lot.


Related — and why this is different

The related proposals are about orchestration: a lead agent assigns work to
subagents that execute in parallel.

Conclave is about deliberation: multiple LLMs debate the same task before
anyone touches the codebase. The winner implements alone.

They're complementary — debate could decide the approach, then orchestration
could execute it in parallel.


Hope this is a useful reference. Happy to answer questions about any of the
implementation details.
