feat(run): concurrent matrix execution + Docker build cleanup by Colinho22 · Pull Request #70 · Colinho22/maestro

Colinho22 · 2026-06-19T20:32:54Z

What

Two independent efficiency changes, one per commit:

- Concurrent matrix execution with a per-provider concurrency cap. The
  runner ran cells sequentially, so a full matrix was ~30h of mostly idle
  network wait. Cells now run on a thread pool.
- .dockerignore so the build context stops shipping ~1.1 GB of .venv,
  .git, and tool caches to the daemon on every build.

Why

The matrix is I/O-bound: each cell is an independent LLM call (10-44s) followed
by a ~1ms scoring + DB write. Running them one at a time wastes almost all the
wall clock waiting on the network. Parallelising the cells is the single
largest speedup available and cuts an estimated ~30h run to ~5-6h, with
identical recorded numbers (cells are independent).

The Docker build had no .dockerignore, so even though the Dockerfile only
copies src/ and two metadata files, Docker tarred and shipped the whole
context, dominated by the 1.1 GB virtualenv, on every build.

How

Concurrency is built around the project's non-negotiables:

- SQLite stays single-writer. Workers do only the LLM call + scoring; every
  insert_* runs on the main thread, which drains finished cells via
  as_completed(). No worker ever touches the DB, so there is no lock
  contention and no risk of a silently dropped write.
- duration_ms stays a true latency. Concurrency is capped per provider
  with a semaphore (claude / gpt / mistral / gemini / deepseek), so one
  provider's calls can't burst past its rate limit and inflate measured latency
  with server-side queue time. The cap also keeps us clear of 429s.
- Per-cell error isolation is preserved. A strategy/provider exception still
  becomes a failed RunResult row rather than crashing the pool.
- New flag --provider-concurrency N (default 4): a free-tier key drops it
  to 1, a high-limit account raises it. It changes only speed, never results.
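
The invariants above can be sketched as follows. This is a minimal, self-contained illustration, not the code in src/maestro/run.py: `execute_cell`, `run_matrix`, `fake_llm_call`, and the in-memory `rows` list are hypothetical stand-ins, and starting the timer only after the permit is acquired is an assumption about how local wait time is kept out of duration_ms.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Provider names taken from the PR description; everything else is illustrative.
PROVIDERS = ("claude", "gpt", "mistral", "gemini", "deepseek")


def build_semaphores(limit: int) -> dict[str, threading.Semaphore]:
    """One semaphore per provider, each holding `limit` permits."""
    return {name: threading.Semaphore(limit) for name in PROVIDERS}


def fake_llm_call(cell: dict) -> str:
    """Stand-in for the real network call (hypothetical)."""
    if cell.get("fail"):
        raise RuntimeError("provider error")
    time.sleep(0.01)
    return f"answer-{cell['id']}"


def execute_cell(cell: dict, semaphores: dict[str, threading.Semaphore]) -> dict:
    """Worker: LLM call plus timing only. It never touches the database."""
    try:
        with semaphores[cell["provider"]]:
            # Assumption: the clock starts once a permit is held, so time spent
            # waiting for the local semaphore is not counted as latency.
            start = time.perf_counter()
            output = fake_llm_call(cell)
            duration_ms = (time.perf_counter() - start) * 1000
        return {"cell": cell, "ok": True, "output": output, "duration_ms": duration_ms}
    except Exception as exc:
        # Per-cell isolation: a failure becomes a failed row, not a crashed pool.
        return {"cell": cell, "ok": False, "error": str(exc)}


def run_matrix(cells: list[dict], limit: int = 4) -> list[dict]:
    semaphores = build_semaphores(limit)
    rows = []  # stand-in for the SQLite inserts
    with ThreadPoolExecutor(max_workers=len(PROVIDERS) * limit) as pool:
        futures = [pool.submit(execute_cell, cell, semaphores) for cell in cells]
        for future in as_completed(futures):
            # Only the main thread reaches this line, so writes stay serialized.
            rows.append(future.result())
    return rows
```

Because `execute_cell` catches every exception and returns a row, `future.result()` never raises in the drain loop, and the order of `rows` reflects completion order rather than submission order.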

Testing

- New tests/test_run_concurrency.py pins the semaphore wiring (one per
  provider, permit count matches the flag, keys match the provider dispatch).
- Full suite: 246 passed, ruff check + format clean.
- Live smoke run (10 single-agent cells) and a 6-cell LangGraph/Mistral run at
  concurrency 4: interleaved finish order confirms real parallelism; all rows,
  sub-results, and metrics persisted correctly with 0 retries and 0 errors.
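
For readers who want a feel for what "pinning the semaphore wiring" can look like, here is a small sketch. It is not the contents of tests/test_run_concurrency.py: `build_provider_semaphores` and `PROVIDER_KEYS` are illustrative stand-ins defined inline, and the real dispatch keys may differ.

```python
import threading

# Assumed provider keys, taken from the list in the PR description.
PROVIDER_KEYS = ("claude", "gpt", "mistral", "gemini", "deepseek")


def build_provider_semaphores(limit: int) -> dict[str, threading.Semaphore]:
    """Hypothetical helper: one semaphore per provider key."""
    return {key: threading.Semaphore(limit) for key in PROVIDER_KEYS}


def test_one_semaphore_per_provider():
    sems = build_provider_semaphores(4)
    # Keys match the provider list, and no two providers share a semaphore.
    assert set(sems) == set(PROVIDER_KEYS)
    assert len({id(s) for s in sems.values()}) == len(PROVIDER_KEYS)


def test_permit_count_matches_flag():
    limit = 3
    sem = build_provider_semaphores(limit)["mistral"]
    # Exactly `limit` non-blocking acquires succeed; the next one is refused.
    assert all(sem.acquire(blocking=False) for _ in range(limit))
    assert not sem.acquire(blocking=False)
    for _ in range(limit):
        sem.release()
```

Using `acquire(blocking=False)` lets the test observe permit exhaustion without spawning threads or sleeping.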

Notes

No behaviour change to the science: results are independent of execution
order, and the published default (4) is unchanged for anyone who doesn't pass
the new flag.

Summary by CodeRabbit

New Features
- Experiments now execute in parallel using thread pools for improved performance
- Added --provider-concurrency CLI option to control concurrent executions per provider
Tests
- Added concurrency enforcement tests
Chores
- Optimized Docker build context with ignore patterns

The build had no .dockerignore, so Docker shipped the entire context to the daemon on every build, including the 1.1 GB .venv, .git, and tool caches, none of which the Dockerfile copies. Exclude them so context transfer is fast and a future COPY . can never bake .venv into the image.

The matrix ran sequentially, so a full run was ~30h of mostly network wait. Execute cells on a thread pool instead. Two invariants are kept: workers do only the LLM call and scoring while the main thread remains the sole DB writer (SQLite single-writer preserved), and concurrency is capped per provider by a semaphore so one provider's calls cannot burst past its rate limit and a cell's duration_ms stays a true latency. Add --provider-concurrency N (default 4) so a free-tier key can drop to 1 and a high-limit account can raise it; it changes only speed, never the recorded numbers, since cells are independent.

coderabbitai · 2026-06-19T20:33:05Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 288ec809-0cd0-440c-b096-0d3e271f6e21

📥 Commits

Reviewing files that changed from the base of the PR and between bd6c7ec and 80aa480.

📒 Files selected for processing (3)

.dockerignore
src/maestro/run.py
tests/test_run_concurrency.py

📝 Walkthrough

The runner in src/maestro/run.py is refactored from sequential per-cell execution to a ThreadPoolExecutor model with per-provider threading.Semaphore caps. A new _execute_cell() worker function handles strategy construction, semaphore-gated LLM calls, and metric evaluation; main() serializes all DB writes on the main thread. A --provider-concurrency CLI flag controls the cap. Concurrency tests are added, and a .dockerignore file is included.

Changes

Parallel Experiment Runner with Per-Provider Concurrency

| Layer / File(s) | Summary |
| --- | --- |
| Semaphore infrastructure, CLI config, and tests `src/maestro/run.py`, `tests/test_run_concurrency.py` | Adds `threading`/`ThreadPoolExecutor` imports, `DEFAULT_PROVIDER_CONCURRENCY`, `_build_provider_semaphores()` to create one semaphore per provider dispatch needle, a `--provider-concurrency` CLI argument with `>= 1` validation, and three unit tests asserting semaphore count, permit exhaustion behavior, and dispatch key coverage. |
| `_execute_cell` worker function `src/maestro/run.py` | Adds `_execute_cell()` which builds the control or LLM strategy for a matrix cell, acquires the per-provider semaphore around the LLM network call, returns a failed `RunResult` on any exception, and evaluates metrics on success with errors caught and logged. |
| `main()` thread pool orchestration `src/maestro/run.py` | Refactors `main()` to filter out unimplemented strategies before scheduling, build provider semaphores, create a `ThreadPoolExecutor` sized by `len(_PROVIDER_DISPATCH) × provider_concurrency`, submit all runnable cells as futures, and process completions on the main thread for all DB writes (config, result, sub-results, metrics), with final counts reported against `total_runnable`. |
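
The CLI validation and pool sizing described in the table could look roughly like this. It is a sketch under assumptions: only the flag name, the default of 4, the `>= 1` rule, and the sizing formula come from the walkthrough; `positive_int`, `build_parser`, `pool_size`, the `prog` name, and the `PROVIDER_KEYS` tuple standing in for `_PROVIDER_DISPATCH` are hypothetical.

```python
import argparse

DEFAULT_PROVIDER_CONCURRENCY = 4
# Stand-in for the keys of _PROVIDER_DISPATCH (assumed from the PR description).
PROVIDER_KEYS = ("claude", "gpt", "mistral", "gemini", "deepseek")


def positive_int(value: str) -> int:
    """argparse type that rejects anything below 1."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("must be >= 1")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="maestro-run")  # prog name is hypothetical
    parser.add_argument(
        "--provider-concurrency",
        type=positive_int,
        default=DEFAULT_PROVIDER_CONCURRENCY,
        metavar="N",
        help="maximum in-flight calls per provider (default: %(default)s)",
    )
    return parser


def pool_size(provider_concurrency: int) -> int:
    """Enough workers for every provider to use all of its permits at once."""
    return len(PROVIDER_KEYS) * provider_concurrency
```

Sizing the pool as providers × cap means the semaphores, not the executor, are the binding limit: no provider is starved of threads while another still has free permits.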

Docker Build Context

| Layer / File(s) | Summary |
| --- | --- |
| `.dockerignore` patterns `.dockerignore` | Adds patterns excluding Python virtual environments, build/cache outputs, tool caches, VCS/IDE metadata, local secrets, Claude Code files, OS cruft, and logs. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 Hoppity-hop, no more waiting in line,
Each provider gets semaphores — oh, so fine!
The thread pool spins up, cells fly through the air,
The main thread writes DB with the greatest of care.
.dockerignore trims the context down slim,
A parallel meadow, built right on a whim! 🌿

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately summarizes the two main changes: concurrent matrix execution and Docker build cleanup via .dockerignore, matching the substantial refactoring in run.py and new .dockerignore file. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Docstring Coverage (Src Only) | ✅ Passed | All 4 public functions in src/maestro/run.py have docstrings (100% coverage exceeds 80% threshold). .dockerignore is not source code. Tests are excluded. |

Colinho22 added 2 commits June 19, 2026 22:28

Colinho22 added this to the 🧪 Experimental Artifact milestone Jun 19, 2026

Colinho22 self-assigned this Jun 19, 2026

Colinho22 added the enhancement New feature or request label Jun 19, 2026

Colinho22 merged commit e3ffba1 into main Jun 19, 2026
2 checks passed

Colinho22 deleted the feat-concurrent-llm-calls branch June 19, 2026 20:42
