Skip to content

Add osprey-stress CLI (closes #324)#367

Open
julietshen wants to merge 6 commits into
mainfrom
add-stress-cli
Open

Add osprey-stress CLI (closes #324)#367
julietshen wants to merge 6 commits into
mainfrom
add-stress-cli

Conversation

@julietshen

@julietshen julietshen commented Jun 17, 2026

Copy link
Copy Markdown
Member

Summary

Wires the stress reporter (#328), producer (#329), and consumer (#330) into a CLI that produces synthetic events at a configurable rate, observes their ExecutionResults, and reports drop rate + p50/p95/p99 latency. Exits non-zero on threshold breach so it can gate CI on pipeline health.

osprey-stress run \
  --events 10000 --rate 1000 \
  --threshold-drop-rate 0.01 --threshold-p95-ms 500 \
  --report json

Closes #324.

What's in this PR

osprey_worker/src/osprey/worker/stress/cli.py argparse + orchestration + ProgressReporter for --verbose runs + measure stub
osprey_worker/src/osprey/worker/stress/tests/test_cli.py 26 tests: arg parsing, topic-head probes, ProgressReporter behavior under failure
osprey_worker/pyproject.toml adds osprey-stress = "osprey.worker.stress.cli:main" console script
osprey_ui/src/components/event_stream/EventStream.module.css forces a visible scrollbar on the virtualized list (surfaced during dogfooding)
CHANGELOG.md new-features entry citing the four parent PRs

Test plan

  • pytest osprey_worker/src/osprey/worker/stress/tests/test_cli.py — 26/26 pass locally in the docker test runner
  • uv run ruff check osprey_worker/src/osprey/worker/stress/ — clean
  • uv run mypy osprey_worker/src/osprey/worker/stress/ — clean
  • Threaded ProgressReporter tests rewritten with deadline-based polling instead of fixed time.sleep() windows to avoid CI flakes
  • CI green (pending push)

Review feedback addressed

  • CodeRabbit + @chimosky on probe close: bare except/pass around consumer.close(autocommit=False) and inside ProgressReporter._run() now log at debug with exc_info=True so repeated transient failures are diagnosable.
  • CodeRabbit on ProgressReporter.start(): the initial input/output probes are now guarded the same way _run() is — a transient broker failure falls back to a zero baseline rather than aborting cmd_run after the consumer has already started.
  • CodeRabbit on test flakiness: the two threaded ProgressReporter tests no longer sleep on fixed wall-clock windows; they use a _wait_until(predicate, deadline=5s) helper that polls for the specific output state under test.
  • CodeRabbit on CSS Modules :global(...): no stylelint configured in osprey_ui (only ESLint + Prettier), so there's no warning to suppress; Prettier is happy.

Stack reminder

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a new stress-testing CLI that can generate synthetic events at a configurable rate, track delivery progress, report latency percentiles and drop rate, and return a non-zero exit code when thresholds are exceeded.
    • Added optional verbose progress updates during longer runs.
  • Bug Fixes

    • Improved the event stream list so the scrollbar is always visible, making scrolling more discoverable and consistent across systems.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds the osprey-stress CLI with a run subcommand that probes Kafka topic heads, runs a filtered consumer and producer flow, reports latency and drop metrics, and exits non-zero on threshold failure. The UI event-stream list also now forces a visible scrollbar.

Changes

osprey-stress CLI

Layer / File(s) Summary
CLI contracts and parser
osprey_worker/src/osprey/worker/stress/cli.py
Defines the stress CLI constants, topic snapshot type, topic probing helper, progress reporter, argument validators, Kafka and threshold option wiring, parser subcommands, and bootstrap server parsing.
Run command and dispatch
osprey_worker/src/osprey/worker/stress/cli.py
Implements cmd_run orchestration with pre/post topic-head probing, consumer start and drain, optional progress reporting, report computation and threshold-based exit codes, keeps cmd_measure as a stub, updates main() dispatch, and adds the module entry guard.
Tests, script entry point, and changelog
osprey_worker/src/osprey/worker/stress/tests/test_cli.py, osprey_worker/pyproject.toml, CHANGELOG.md
Adds parser and helper tests for the stress CLI, registers the osprey-stress console script, and documents the new CLI in the changelog.

Event-stream scrollbar

Layer / File(s) Summary
Scrollbar styling
osprey_ui/src/components/event_stream/EventStream.module.css, CHANGELOG.md
Adds Firefox and WebKit scrollbar styling for the event-stream virtualized list and records the visible-scrollbar behavior in the changelog.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The CLI covers installability, configurable rate/duration, thresholds, and latency reporting, but it does not show the full per-stage drop breakdown requested by #324. Add burst-size support and per-stage accounting for ack, drain, effects, and storage, then emit the requested structured rate-of-success report.
Out of Scope Changes check ⚠️ Warning The PR includes an unrelated UI scrollbar visibility fix in osprey_ui, which is outside the stress-CLI scope. Split the scrollbar/CSS change into a separate PR unless it is explicitly required for the stress-CLI work.
Docstring Coverage ⚠️ Warning Docstring coverage is 5.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title is concise and accurately describes the main addition, the new osprey-stress CLI.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch add-stress-cli

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@julietshen julietshen marked this pull request as ready for review June 17, 2026 13:58
@julietshen

Copy link
Copy Markdown
Member Author

#330 merged, rebased on main, dropped the import shim (try/except ImportError + # type: ignore[import-untyped] + TestRunBlocked). Net -32 lines.

  • Consumer / ConsumerConfig import is now at module level alongside Producer and Report.
  • cmd_run is fully wired end-to-end and unblocked.
  • osprey-stress --help works after uv sync --dev.
  • 6/6 cli tests pass, ruff + mypy clean.
  • Marked ready-for-review.

Closes #324.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
osprey_worker/src/osprey/worker/stress/tests/test_cli.py (1)

10-72: ⚡ Quick win

Add parser tests for invalid numeric inputs

Please add cases asserting parse failure for invalid run args (--rate 0, --rate -1, --events 0, --drain-seconds -1). This will lock in the non-crashing contract for cmd_run.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@osprey_worker/src/osprey/worker/stress/tests/test_cli.py` around lines 10 -
72, Add new test methods to the TestParser class to verify that the parser
rejects invalid numeric inputs and raises SystemExit. Create test methods for
each invalid case: rate equal to 0, rate as negative value, events equal to 0,
and drain-seconds as negative value. Each test should follow the pattern used in
test_no_command_requires_one by wrapping build_parser().parse_args() with
pytest.raises(SystemExit) and passing the run command with the respective
invalid argument to ensure the parser properly validates these numeric
constraints.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@osprey_worker/src/osprey/worker/stress/cli.py`:
- Around line 90-97: The CLI arguments `--events`, `--rate`, and
`--drain-seconds` being added with add_argument do not validate that their
values are positive, which causes a ZeroDivisionError when args.rate equals zero
at line 139. Add validation constraints to each argument definition to enforce
that events must be greater than 0, rate must be greater than 0, and
drain-seconds must be greater than or equal to 0. Use the appropriate argument
parameter (such as choices or a custom action/type) to validate these
constraints at parse time rather than at runtime.

---

Nitpick comments:
In `@osprey_worker/src/osprey/worker/stress/tests/test_cli.py`:
- Around line 10-72: Add new test methods to the TestParser class to verify that
the parser rejects invalid numeric inputs and raises SystemExit. Create test
methods for each invalid case: rate equal to 0, rate as negative value, events
equal to 0, and drain-seconds as negative value. Each test should follow the
pattern used in test_no_command_requires_one by wrapping
build_parser().parse_args() with pytest.raises(SystemExit) and passing the run
command with the respective invalid argument to ensure the parser properly
validates these numeric constraints.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 22bb7c6d-c7f0-4ca8-bece-83d660f341fd

📥 Commits

Reviewing files that changed from the base of the PR and between 6eba08e and f98dd4c.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • osprey_worker/pyproject.toml
  • osprey_worker/src/osprey/worker/stress/cli.py
  • osprey_worker/src/osprey/worker/stress/tests/test_cli.py

Comment thread osprey_worker/src/osprey/worker/stress/cli.py Outdated
julietshen added a commit that referenced this pull request Jun 17, 2026
Addresses CodeRabbit feedback on #367. Previously the parser accepted 0
or negatives for the numeric args; a `--rate 0` would then ZeroDivision
when computing `max_runtime_seconds = events / rate + drain_seconds`,
which is a worse failure mode than a clear argparse error.

Adds three small type-validator helpers (`_positive_int`,
`_positive_float`, `_nonneg_float`) that raise `ArgumentTypeError` at
parse time, plus a parametrized test covering the seven invalid-input
combinations (events ≤ 0, rate ≤ 0, drain-seconds < 0, duration ≤ 0).
@julietshen

Copy link
Copy Markdown
Member Author

Addressed CodeRabbit's two notes in ffca77c:

Validation (cli.py:97 / 139) — added _positive_int, _positive_float, _nonneg_float helpers and wired them as the type= for --events, --rate, --drain-seconds, and the measure subcommand's --duration. They raise ArgumentTypeError at parse time, so users get a clean argparse error message instead of a ZeroDivisionError later in max_runtime_seconds = events / rate + drain_seconds.

Parser tests (test_cli.py:10-72) — added a parametrized test_rejects_non_positive_numeric_args covering the seven invalid-input combinations (events ≤ 0, rate ≤ 0, drain-seconds < 0, duration ≤ 0). Each asserts SystemExit (argparse's behavior on type-validation failure).

13/13 cli tests pass (was 6), ruff + mypy clean.

julietshen added a commit that referenced this pull request Jun 17, 2026
Addresses CodeRabbit feedback on #367. Previously the parser accepted 0
or negatives for the numeric args; a `--rate 0` would then ZeroDivision
when computing `max_runtime_seconds = events / rate + drain_seconds`,
which is a worse failure mode than a clear argparse error.

Adds three small type-validator helpers (`_positive_int`,
`_positive_float`, `_nonneg_float`) that raise `ArgumentTypeError` at
parse time, plus a parametrized test covering the seven invalid-input
combinations (events ≤ 0, rate ≤ 0, drain-seconds < 0, duration ≤ 0).
@chimosky

Copy link
Copy Markdown
Contributor

Tested 993cde3, performance seems to degrade with each run.

osprey on  asc [$!?] via  v3.11.11 (osprey) 
 ➜   osprey-stress run --events 10000 --rate 1000 --threshold-drop-rate 0.01 --threshold-p95-ms 500 --report json
[osprey-stress] run_id=af099ace events=10000 rate=1000/s bootstrap=localhost:9092 input=osprey.actions_input output=osprey.execution_results
[osprey-stress] producer done, draining consumer...
{
  "consumed": 10000,
  "drop_count": 0,
  "drop_rate": 0.0,
  "duration_seconds": 120.37888309000004,
  "latency": {
    "count": 10000,
    "max_ms": 109213.1917476654,
    "min_ms": 88.16361427307129,
    "p50_ms": 57644.95253562927,
    "p95_ms": 103032.96446800232,
    "p99_ms": 108514.60933685303
  },
  "matched": 10000,
  "mode": "closed-loop",
  "produced": 10000,
  "threshold_breaches": [
    "p95_ms 103033.0 > 500.0"
  ],
  "thresholds": {
    "drop_rate": 0.01,
    "p95_ms": 500.0
  }
}

osprey on  asc [$!?] via  v3.11.11 (osprey) took 2m2s 
✗ osprey-stress run --events 10000 --rate 1000 --threshold-drop-rate 0.01 --threshold-p95-ms 500 --report json
[osprey-stress] run_id=d54bcaa9 events=10000 rate=1000/s bootstrap=localhost:9092 input=osprey.actions_input output=osprey.execution_results
[osprey-stress] producer done, draining consumer...
{
  "consumed": 9096,
  "drop_count": 904,
  "drop_rate": 0.0904,
  "duration_seconds": 131.1068548410001,
  "latency": {
    "count": 9096,
    "max_ms": 120701.101064682,
    "min_ms": 65.55795669555664,
    "p50_ms": 62529.24847602844,
    "p95_ms": 112622.1079826355,
    "p99_ms": 117529.26683425903
  },
  "matched": 9096,
  "mode": "closed-loop",
  "produced": 10000,
  "threshold_breaches": [
    "drop_rate 0.0904 > 0.0100",
    "p95_ms 112622.1 > 500.0"
  ],
  "thresholds": {
    "drop_rate": 0.01,
    "p95_ms": 500.0
  }
}

osprey on  asc [$!?] via  v3.11.11 (osprey) took 2m13s 
✗ osprey-stress run --events 10000 --rate 1000 --threshold-drop-rate 0.01 --threshold-p95-ms 500 --report json
[osprey-stress] run_id=8bb804f6 events=10000 rate=1000/s bootstrap=localhost:9092 input=osprey.actions_input output=osprey.execution_results
[osprey-stress] producer done, draining consumer...
{
  "consumed": 5829,
  "drop_count": 4171,
  "drop_rate": 0.4171,
  "duration_seconds": 89.49104550900006,
  "latency": {
    "count": 5829,
    "max_ms": 81125.71907043457,
    "min_ms": 62.23154067993164,
    "p50_ms": 37650.094509124756,
    "p95_ms": 77414.13927078247,
    "p99_ms": 80067.45100021362
  },
  "matched": 5829,
  "mode": "closed-loop",
  "produced": 10000,
  "threshold_breaches": [
    "drop_rate 0.4171 > 0.0100",
    "p95_ms 77414.1 > 500.0"
  ],
  "thresholds": {
    "drop_rate": 0.01,
    "p95_ms": 500.0
  }
}

osprey on  asc [$!?] via  v3.11.11 (osprey) took 1m31s 
✗ osprey-stress run --events 10000 --rate 1000 --threshold-drop-rate 0.01 --threshold-p95-ms 500 --report json
[osprey-stress] run_id=879e6925 events=10000 rate=1000/s bootstrap=localhost:9092 input=osprey.actions_input output=osprey.execution_results
[osprey-stress] producer done, draining consumer...
{
  "consumed": 3395,
  "drop_count": 6605,
  "drop_rate": 0.6605,
  "duration_seconds": 58.78333274199986,
  "latency": {
    "count": 3395,
    "max_ms": 49045.26400566101,
    "min_ms": 63.27676773071289,
    "p50_ms": 24665.666580200195,
    "p95_ms": 46235.65363883972,
    "p99_ms": 47933.637857437134
  },
  "matched": 3395,
  "mode": "closed-loop",
  "produced": 10000,
  "threshold_breaches": [
    "drop_rate 0.6605 > 0.0100",
    "p95_ms 46235.7 > 500.0"
  ],
  "thresholds": {
    "drop_rate": 0.01,
    "p95_ms": 500.0
  }
}

I also noticed the scrollbar in the event stream section is barely visible, which makes it impossible to move through events a lot faster.

I'm wondering if a verbose output for this might be a good option, rather than having to look at docker compose logs for debug output.

@julietshen

Copy link
Copy Markdown
Member Author

Thank you so much @chimosky for testing! fixes pushed in 16a0659 and 29e03ad.

(claude assisted comment):
for performance degradation, the harness was reporting the truth about a saturated pipeline, but the output didn't make that clear, so it read like a harness regression. Reading your numbers, p95 latencies are tens of seconds and the worker is processing ~83 events/sec vs the 1000/sec we're asking for; each run leaves ~7-9k events backed up in the inputz topic that the next run inherits, so things compound.

  • added a pre-run baseline + post-run delta on the input/output topic heads. If output_topic advanced by fewer events than we sent, the CLI now logs a warning that the next run will start with backlog.
  • added a --verbose flag that emits periodic stderr lines: t=Xs produced=Y/N matched=M match_rate=R/s in_topic_Δ=A out_topic_Δ=B. The new ProgressReporter is decoupled from Producer/Consumer and factory-injected so it's unit-testable without a Kafka cluster.

also fixed the scrollbar thing

julietshen added a commit that referenced this pull request Jun 24, 2026
Addresses chimosky's review on #367, where runs got progressively worse
across repeated invocations (0% → 9% → 41% → 66% drop rate over 4 runs).
The harness was reporting the truth — pipeline saturated, each run leaves
backlog the next inherits — but the framing made it look like a harness
regression rather than the SUT one.

- `probe_topic_head` snapshots an input/output topic's end-offsets via a
  groupless KafkaConsumer. Tolerant of missing topics (fresh clusters).
- Before producing, log baseline heads. After the run, log the deltas;
  if output_topic advanced by fewer than `--events`, warn that backlog
  will carry into the next run.
- `--verbose` (+ `--verbose-interval-seconds`) emits live stderr lines:
  produced/target, matched, match_rate, in/out topic deltas. Avoids
  having to tail `docker compose logs` to know what's happening.
- 13 new unit tests cover the verbose-flag parsing, topic-head probing
  edge cases (missing topic, close failures, partition assignment),
  and the ProgressReporter (periodic emission, double-start guard,
  probe-failure tolerance).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
julietshen added a commit that referenced this pull request Jun 24, 2026
Also from chimosky's review on #367: the event-stream scrollbar was
"barely visible," which made it slow to skim long result sets. We were
relying on the OS auto-hide default (essentially invisible on macOS
unless you're actively scrolling).

Style the `ReactVirtualized__List` scrolling div with a thin
always-visible bar in the existing `--text-light-secondary` /
`--background-secondary` theme variables, plus a hover state. WebKit
(Chrome/Safari) and Firefox both honored.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
osprey_worker/src/osprey/worker/stress/tests/test_cli.py (1)

188-190: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Make threaded progress tests less timing-sensitive.

These checks depend on fixed sleep windows, which can intermittently flake on busy CI runners. Prefer deadline-based polling for the expected condition (e.g., emitted line present).

Also applies to: 239-240

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@osprey_worker/src/osprey/worker/stress/tests/test_cli.py` around lines 188 -
190, The threaded progress assertions in the stress CLI tests are too dependent
on fixed sleep timing and can flake under load. Update the affected test flow
around tick-based progress checks to use deadline-based polling for the expected
emitted line or condition instead of sleeping for hardcoded intervals, and apply
the same change to the other marked section in the test file. Use the existing
test helpers and the relevant progress-output assertions to locate the impacted
logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@osprey_ui/src/components/event_stream/EventStream.module.css`:
- Around line 25-45: The scrollbar styling block in EventStream.module.css uses
CSS Modules :global(...) selectors that Stylelint flags as unknown, so update
the lint setup to understand CSS Modules selectors for this file or add a
narrowly scoped suppression with justification. Use the virtualizedList and
ReactVirtualized__List selector block as the target when applying the fix, and
prefer a config/plugin change over silencing the rule globally.

In `@osprey_worker/src/osprey/worker/stress/cli.py`:
- Around line 86-91: The probe/close exception handlers in the stress CLI are
swallowing failures without any trace, making repeated best-effort probe issues
impossible to diagnose. In the affected try/except blocks around
consumer.close(autocommit=False) in the CLI flow, keep the stress run alive but
replace the bare except/pass with debug logging that records the exception
details and context, then continue. Apply the same fix in both matching handlers
so the behavior is consistent and discoverable without becoming noisy.
- Around line 131-135: Make the initial reporter probes in start() best-effort
like _run() so a transient Kafka failure in --verbose mode does not abort
cmd_run after the consumer has already started. Update the startup path around
_probe_input() and _probe_output() in StressProgressReporter to catch probe
errors and fall back to zero baselines, matching the existing guarded behavior
in _run().

---

Nitpick comments:
In `@osprey_worker/src/osprey/worker/stress/tests/test_cli.py`:
- Around line 188-190: The threaded progress assertions in the stress CLI tests
are too dependent on fixed sleep timing and can flake under load. Update the
affected test flow around tick-based progress checks to use deadline-based
polling for the expected emitted line or condition instead of sleeping for
hardcoded intervals, and apply the same change to the other marked section in
the test file. Use the existing test helpers and the relevant progress-output
assertions to locate the impacted logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7371df68-ba7d-4561-865e-b69d9cfbab75

📥 Commits

Reviewing files that changed from the base of the PR and between 993cde3 and 29e03ad.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • osprey_ui/src/components/event_stream/EventStream.module.css
  • osprey_worker/src/osprey/worker/stress/cli.py
  • osprey_worker/src/osprey/worker/stress/tests/test_cli.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

Comment on lines +25 to +45
.virtualizedList :global(.ReactVirtualized__List) {
scrollbar-width: thin;
scrollbar-color: var(--text-light-secondary) var(--background-secondary);
}

.virtualizedList :global(.ReactVirtualized__List)::-webkit-scrollbar {
width: 10px;
height: 10px;
}

.virtualizedList :global(.ReactVirtualized__List)::-webkit-scrollbar-track {
background: var(--background-secondary);
}

.virtualizedList :global(.ReactVirtualized__List)::-webkit-scrollbar-thumb {
background-color: var(--text-light-secondary);
border-radius: 5px;
border: 2px solid var(--background-secondary);
}

.virtualizedList :global(.ReactVirtualized__List)::-webkit-scrollbar-thumb:hover {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Stylelint currently rejects :global(...) selectors used in this block.

selector-pseudo-class-no-unknown is raised on each :global(...) selector here, so this likely fails CI unless stylelint is configured for CSS Modules syntax. Please add the appropriate stylelint config exception/plugin (preferred) or narrowly-scoped lint suppression with justification.

🧰 Tools
🪛 Stylelint (17.13.0)

[error] 25-25: Unknown pseudo-class selector ":global" (selector-pseudo-class-no-unknown)

(selector-pseudo-class-no-unknown)


[error] 30-30: Unknown pseudo-class selector ":global" (selector-pseudo-class-no-unknown)

(selector-pseudo-class-no-unknown)


[error] 35-35: Unknown pseudo-class selector ":global" (selector-pseudo-class-no-unknown)

(selector-pseudo-class-no-unknown)


[error] 39-39: Unknown pseudo-class selector ":global" (selector-pseudo-class-no-unknown)

(selector-pseudo-class-no-unknown)


[error] 45-45: Unknown pseudo-class selector ":global" (selector-pseudo-class-no-unknown)

(selector-pseudo-class-no-unknown)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@osprey_ui/src/components/event_stream/EventStream.module.css` around lines 25
- 45, The scrollbar styling block in EventStream.module.css uses CSS Modules
:global(...) selectors that Stylelint flags as unknown, so update the lint setup
to understand CSS Modules selectors for this file or add a narrowly scoped
suppression with justification. Use the virtualizedList and
ReactVirtualized__List selector block as the target when applying the fix, and
prefer a config/plugin change over silencing the rule globally.

Source: Linters/SAST tools

Comment on lines +86 to +91
try:
consumer.close(autocommit=False)
except Exception:
# Probe is best-effort. A noisy close() during teardown would mask
# the snapshot value the caller needs.
pass

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Log swallowed probe exceptions.

Both handlers intentionally keep the stress run alive, but they currently suppress failures with no trace. Use debug logging so repeated probe/close failures are diagnosable without making best-effort probing noisy.

Proposed fix
+import logging
 import threading
 import time
@@
 from kafka import KafkaConsumer
 from kafka.structs import TopicPartition
 
+logger = logging.getLogger(__name__)
+
@@
         except Exception:
             # Probe is best-effort. A noisy close() during teardown would mask
             # the snapshot value the caller needs.
-            pass
+            logger.debug('failed to close stress probe consumer', exc_info=True)
@@
             except Exception:
                 # A failed probe (broker hiccup, transient timeout) shouldn't
                 # tear down the in-flight stress run. Skip the tick.
-                pass
+                logger.debug('failed to emit stress progress tick', exc_info=True)

As per coding guidelines, “Do not silently swallow exceptions with except Exception: pass without logging or rethrowing.”

Also applies to: 147-152

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@osprey_worker/src/osprey/worker/stress/cli.py` around lines 86 - 91, The
probe/close exception handlers in the stress CLI are swallowing failures without
any trace, making repeated best-effort probe issues impossible to diagnose. In
the affected try/except blocks around consumer.close(autocommit=False) in the
CLI flow, keep the stress run alive but replace the bare except/pass with debug
logging that records the exception details and context, then continue. Apply the
same fix in both matching handlers so the behavior is consistent and
discoverable without becoming noisy.

Source: Coding guidelines

Comment on lines +131 to +135
# Probe baselines on the calling thread so the first periodic emit can
# diff against them without racing the worker's startup.
self._initial_input = self._probe_input()
self._initial_output = self._probe_output()
self._thread = threading.Thread(target=self._run, name='stress-progress', daemon=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make reporter baseline probes best-effort too.

start() runs the initial probes before the background thread exists, so a transient Kafka probe failure in --verbose mode can abort cmd_run after the consumer has already started. Match _run()’s best-effort behavior and fall back to zero baselines.

Proposed fix
         self._start_monotonic = time.monotonic()
         # Probe baselines on the calling thread so the first periodic emit can
         # diff against them without racing the worker's startup.
-        self._initial_input = self._probe_input()
-        self._initial_output = self._probe_output()
+        try:
+            self._initial_input = self._probe_input()
+            self._initial_output = self._probe_output()
+        except Exception:
+            logger.debug('failed to capture stress progress baselines', exc_info=True)
+            self._initial_input = 0
+            self._initial_output = 0
         self._thread = threading.Thread(target=self._run, name='stress-progress', daemon=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Probe baselines on the calling thread so the first periodic emit can
# diff against them without racing the worker's startup.
self._initial_input = self._probe_input()
self._initial_output = self._probe_output()
self._thread = threading.Thread(target=self._run, name='stress-progress', daemon=True)
# Probe baselines on the calling thread so the first periodic emit can
# diff against them without racing the worker's startup.
try:
self._initial_input = self._probe_input()
self._initial_output = self._probe_output()
except Exception:
logger.debug('failed to capture stress progress baselines', exc_info=True)
self._initial_input = 0
self._initial_output = 0
self._thread = threading.Thread(target=self._run, name='stress-progress', daemon=True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@osprey_worker/src/osprey/worker/stress/cli.py` around lines 131 - 135, Make
the initial reporter probes in start() best-effort like _run() so a transient
Kafka failure in --verbose mode does not abort cmd_run after the consumer has
already started. Update the startup path around _probe_input() and
_probe_output() in StressProgressReporter to catch probe errors and fall back to
zero baselines, matching the existing guarded behavior in _run().

Comment on lines +253 to +257
run.add_argument(
'--verbose',
action='store_true',
help='Emit periodic progress lines to stderr (live throughput, topic deltas).',
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be great as an argument to the main parser, and not just the run sub parser.

So verbose can be applied to both commands.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed your feedback in my latest commits! thank you so much for the review!

@chimosky

Copy link
Copy Markdown
Contributor

Reviewed 29e03ad, it'll be great to log errors and exceptions rather than pass on them.

julietshen and others added 6 commits June 30, 2026 11:42
Orchestrates the stress reporter (#328), producer (#329), and consumer
(#330) into a CLI that produces synthetic events at a configurable rate,
observes their ExecutionResults, and reports drop rate + p50/p95/p99
latency. Exits non-zero on threshold breach so it can gate CI on pipeline
health.

Subcommands:
* `run` — closed-loop synthetic testing. Blocked on #330; the body
  lazy-imports the consumer and short-circuits with a clear pointer
  message until that PR merges, then activates.
* `measure` — open-loop measurement against an external source. Stub;
  will activate once #236 (jetstream input stream plugin) lands.

Argparse + threshold gates + report dispatch + entry-point registration
are all live today, so the CLI structure (~240 lines) is reviewable
independently. When #330 merges, the only change needed here is dropping
the `try/except ImportError` wrapper around the consumer import — the
orchestration body below it already references Consumer/ConsumerConfig.

Adds:
* `osprey-stress` console script in `osprey_worker/pyproject.toml`
* CHANGELOG entry citing the full stack of PRs the harness landed across
* 7 unit tests covering arg parsing, the measure stub, and the
  #330-blocked-on-import path so a regression in either is loud
Promotes the consumer import to module level alongside producer and
reporter, removes the try/except ImportError stub in cmd_run, the
matching `# type: ignore[import-untyped]`, and the
`TestRunBlocked` test that pinned the stub behaviour. The rest of the
CLI (arg parsing, threshold gates, report dispatch, measure stub)
already exercised these code paths.
Addresses CodeRabbit feedback on #367. Previously the parser accepted 0
or negatives for the numeric args; a `--rate 0` would then ZeroDivision
when computing `max_runtime_seconds = events / rate + drain_seconds`,
which is a worse failure mode than a clear argparse error.

Adds three small type-validator helpers (`_positive_int`,
`_positive_float`, `_nonneg_float`) that raise `ArgumentTypeError` at
parse time, plus a parametrized test covering the seven invalid-input
combinations (events ≤ 0, rate ≤ 0, drain-seconds < 0, duration ≤ 0).
Addresses chimosky's review on #367, where runs got progressively worse
across repeated invocations (0% → 9% → 41% → 66% drop rate over 4 runs).
The harness was reporting the truth — pipeline saturated, each run leaves
backlog the next inherits — but the framing made it look like a harness
regression rather than the SUT one.

- `probe_topic_head` snapshots an input/output topic's end-offsets via a
  groupless KafkaConsumer. Tolerant of missing topics (fresh clusters).
- Before producing, log baseline heads. After the run, log the deltas;
  if output_topic advanced by fewer than `--events`, warn that backlog
  will carry into the next run.
- `--verbose` (+ `--verbose-interval-seconds`) emits live stderr lines:
  produced/target, matched, match_rate, in/out topic deltas. Avoids
  having to tail `docker compose logs` to know what's happening.
- 13 new unit tests cover the verbose-flag parsing, topic-head probing
  edge cases (missing topic, close failures, partition assignment),
  and the ProgressReporter (periodic emission, double-start guard,
  probe-failure tolerance).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Also from chimosky's review on #367: the event-stream scrollbar was
"barely visible," which made it slow to skim long result sets. We were
relying on the OS auto-hide default (essentially invisible on macOS
unless you're actively scrolling).

Style the `ReactVirtualized__List` scrolling div with a thin
always-visible bar in the existing `--text-light-secondary` /
`--background-secondary` theme variables, plus a hover state. WebKit
(Chrome/Safari) and Firefox both honored.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
CodeRabbit + chimosky review on #367:

- The bare except/pass around `consumer.close(autocommit=False)` and
  inside `ProgressReporter._run()` hid every transient broker hiccup.
  Replace with `logger.debug(..., exc_info=True)` so repeated failures
  are diagnosable without becoming user-visible noise.

- `ProgressReporter.start()` called `_probe_input()` and
  `_probe_output()` directly on the calling thread; a transient probe
  failure there aborted `cmd_run` even though the consumer was already
  running. Wrap each in try/except with a 0 baseline fallback to match
  the guarded behavior `_run()` already had.

- The two threaded ProgressReporter tests used fixed time.sleep()
  windows to wait for tick output, which flakes on busy CI runners.
  Replace with a `_wait_until(predicate, deadline=5s)` helper that
  polls the sink and waits for the specific state the test cares
  about (in_topic_Δ value, non-empty sink) before stopping.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
osprey_ui/src/components/event_stream/EventStream.module.css (1)

21-48: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Stylelint still flags valid CSS Modules :global(...) selectors.

This file uses the .module.css extension, confirming it is a CSS Module. The :global(...) pseudo-class is standard CSS Modules syntax for targeting external class names like ReactVirtualized__List, but Stylelint's selector-pseudo-class-no-unknown rule does not recognize it without additional configuration. This will break CI if stylelint runs in the pipeline.

Please add stylelint-config-css-modules (or equivalent) to the project's stylelint configuration, or add a narrowly-scoped disable comment with justification if a project-wide fix is not viable right now.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@osprey_ui/src/components/event_stream/EventStream.module.css` around lines 21
- 48, Stylelint is incorrectly flagging the CSS Modules :global(...) selectors
used in EventStream.module.css. Update the project’s Stylelint setup to
recognize CSS Modules syntax by adding stylelint-config-css-modules (or
equivalent) to the configuration used for selector-pseudo-class-no-unknown, so
the ReactVirtualized__List rules in this module pass CI. If a global config
change is not feasible, add a narrowly scoped disable comment around the
affected selectors with a clear justification.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@osprey_ui/src/components/event_stream/EventStream.module.css`:
- Around line 21-48: Stylelint is incorrectly flagging the CSS Modules
:global(...) selectors used in EventStream.module.css. Update the project’s
Stylelint setup to recognize CSS Modules syntax by adding
stylelint-config-css-modules (or equivalent) to the configuration used for
selector-pseudo-class-no-unknown, so the ReactVirtualized__List rules in this
module pass CI. If a global config change is not feasible, add a narrowly scoped
disable comment around the affected selectors with a clear justification.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 656ad55b-2a08-44a1-b00f-d85c35cbd7f7

📥 Commits

Reviewing files that changed from the base of the PR and between 29e03ad and 4b9ea61.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • osprey_ui/src/components/event_stream/EventStream.module.css
  • osprey_worker/pyproject.toml
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • osprey_worker/pyproject.toml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Build a stress-test CLI for input sinks + rate-of-success measurement

2 participants