feat(run): concurrent matrix execution + Docker build cleanup#70
Conversation
The build had no .dockerignore, so Docker shipped the entire context to the daemon on every build, including the 1.1 GB .venv, .git, and tool caches, none of which the Dockerfile copies. Exclude them so context transfer is fast and a future COPY . can never bake .venv into the image.
The matrix ran sequentially, so a full run was ~30h of mostly network wait. Execute cells on a thread pool instead. Two invariants are kept: workers do only the LLM call and scoring while the main thread remains the sole DB writer (SQLite single-writer preserved), and concurrency is capped per provider by a semaphore so one provider's calls cannot burst past its rate limit and a cell's duration_ms stays a true latency. Add --provider-concurrency N (default 4) so a free-tier key can drop to 1 and a high-limit account can raise it; it changes only speed, never the recorded numbers, since cells are independent.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThe runner in ChangesParallel Experiment Runner with Per-Provider Concurrency
Docker Build Context
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
What
Two independent efficiency changes, one per commit:
runner ran cells sequentially, so a full matrix was ~30h of mostly idle
network wait. Cells now run on a thread pool.
.dockerignoreso the build context stops shipping ~1.1 GB of.venv,.git, and tool caches to the daemon on every build.Why
The matrix is I/O-bound: each cell is an independent LLM call (10-44s) followed
by a ~1ms scoring + DB write. Running them one at a time wastes almost all the
wall clock waiting on the network. Parallelising the cells is the single
largest speedup available and cuts an estimated ~30h run to ~5-6h, with
identical recorded numbers (cells are independent).
The Docker build had no
.dockerignore, so even though the Dockerfile onlycopies
src/and two metadata files, Docker tarred and shipped the wholecontext, dominated by the 1.1 GB virtualenv, on every build.
How
Concurrency is built around the project's non-negotiables:
insert_*runs on the main thread, which drains finished cells viaas_completed(). No worker ever touches the DB, so there is no lockcontention and no risk of a silently dropped write.
duration_msstays a true latency. Concurrency is capped per providerwith a semaphore (claude / gpt / mistral / gemini / deepseek), so one
provider's calls can't burst past its rate limit and inflate measured latency
with server-side queue time. The cap also keeps us clear of 429s.
becomes a failed
RunResultrow rather than crashing the pool.--provider-concurrency N(default 4): a free-tier key drops itto 1, a high-limit account raises it. It changes only speed, never results.
Testing
tests/test_run_concurrency.pypins the semaphore wiring (one perprovider, permit count matches the flag, keys match the provider dispatch).
concurrency 4: interleaved finish order confirms real parallelism; all rows,
sub-results, and metrics persisted correctly with 0 retries and 0 errors.
Notes
order, and the published default (4) is unchanged for anyone who doesn't pass
the new flag.
Summary by CodeRabbit
New Features
--provider-concurrencyCLI option to control concurrent executions per providerTests
Chores