
fix: resolve all 35 issues from adversarial code review #7

Merged
ejwhite7 merged 19 commits into main from fix/review-remediation
Apr 30, 2026

@ejwhite7 ejwhite7 commented Apr 30, 2026

Owner

Review Remediation: Adversarial Review Round 2

This PR addresses 12 defects and incomplete items identified during the second adversarial review of the ruby-pi gem.

New Defects Fixed

  1. String interpolation escaped in ProviderError (anthropic.rb:520-525)

    • A backslash-escaped #{} in the ProviderError message prevented the actual tool name and parser error from appearing
    • Fix: Removed backslash escaping so interpolation works correctly
  2. Thread-unsafe streaming instance variables (anthropic.rb)

    • @_stream_* instance variables leaked state across concurrent requests
    • Fix: Replaced with method-local variables via process_anthropic_stream_event helper
  3. Per-agent config threading (agent/core.rb, all providers)

    • config: kwarg was decorative -- providers still read RubyPi.configuration directly
    • Fix: BaseProvider now accepts config: kwarg; all providers use passed-in config
  4. Streaming error body recovery (all providers)

    • on_data callback consumed error response bodies, leaving ApiError with empty body
    • Fix: Detect error status in on_data, accumulate error body separately, pass to handle_error_response
  5. Consecutive user messages after compaction (compaction.rb)

    • Compaction summary as role: :user could produce consecutive user messages (rejected by Anthropic)
    • Fix: Changed to role: :assistant
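The escaping problem in item 1 is easy to reproduce in isolation. This is a minimal sketch with hypothetical variable names and message text; the actual message in anthropic.rb is not shown in this PR.

```ruby
# Hypothetical values standing in for the real tool name and parser error.
tool_name = "search"
parse_error = "unexpected token"

# Escaped: the backslash turns #{...} into literal text, so the message
# never contains the actual values.
escaped = "Invalid arguments for tool \#{tool_name}: \#{parse_error}"

# Unescaped: Ruby substitutes the variable values into the string.
interpolated = "Invalid arguments for tool #{tool_name}: #{parse_error}"

puts escaped
puts interpolated
```

The first string prints the placeholder text verbatim, which matches the symptom described in item 1.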

Issues Not Fully Addressed from V1

  1. OpenAI missing tool_call_id (openai.rb)

    • Still fell back to "unknown" instead of failing fast
    • Fix: Raises ProviderError with descriptive message (same as Anthropic)
  2. Gemini streaming finish_reason (gemini.rb)

    • Hardcoded "stop" instead of reading actual finishReason
    • Fix: Parses finishReason from candidate object with "STOP" -> "stop" normalization
  3. README incorrect event keys (README.md)

    • References to e[:iteration] and event[:iterations] (nonexistent keys)
    • Fix: Changed to e[:turn] and event[:result].turns
  4. CHANGELOG update (CHANGELOG.md)

    • Added comprehensive v0.1.4 entry documenting all fixes
  5. Dead parse_sse_events method (base_provider.rb)

    • Removed unused method (all providers now use real incremental streaming)
  6. faraday-net_http version cap (Gemfile, gemspec)

    • Removed arbitrary < 3.4 upper bound
  7. BufferedStreamProxy breaks streaming UX (fallback.rb)

    • Events were buffered until completion even on happy path
    • Fix: Stream events directly to consumer; emit :fallback_start event on primary failure
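For item 2, the finishReason handling could look roughly like this. It is a sketch that assumes the streamed chunk has already been parsed into a Hash with string keys. `normalize_finish_reason` is a hypothetical helper name, and downcasing every value (not only "STOP") is an assumption rather than the gem's confirmed behavior.

```ruby
# Hypothetical helper: reads finishReason from the first candidate and
# normalizes it to lowercase ("STOP" -> "stop"). Returns nil when the
# chunk carries no finish reason yet.
def normalize_finish_reason(chunk)
  candidate = (chunk["candidates"] || []).first
  reason = candidate && candidate["finishReason"]
  reason ? reason.downcase : nil
end

final_chunk = { "candidates" => [{ "finishReason" => "STOP" }] }
partial_chunk = { "candidates" => [{ "content" => { "parts" => [] } }] }

puts normalize_finish_reason(final_chunk).inspect
puts normalize_finish_reason(partial_chunk).inspect
```

Returning nil for intermediate chunks lets the caller keep the last non-nil value instead of hardcoding "stop".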

Test Results

All 444 tests pass with 0 failures.

ejwhite7 and others added 19 commits April 30, 2026 14:44
- Fix Agent constructor: remove stream:, add required system_prompt:
- Fix event payload keys: e[:data] -> e[:content], e[:name] -> e[:tool_name]
- Fix result.output -> result.content, result.iterations -> result.turns
- Remove result.stop_reason (does not exist)
- Fix context_compaction: -> compaction:, context_transform: -> transform_context:
- Fix extensions: [] -> agent.use(ExtensionClass) (takes class, not instance)
- Fix Compaction constructor: add required summary_model:, remove strategy:
- Fix Transform: module with factory methods, not a class
- Remove :before_tool_call/:after_tool_call from events lists (they are hooks)
- Add Tool vs Tools namespace clarification
- Update CI matrix reference in CLAUDE.md
- Remove unused ostruct ~> 0.6 dependency (#24)
- Remove arbitrary faraday-net_http < 3.4 upper bound (#25)
- Update placeholder email to maintainer's noreply address (#26)
- Add Ruby 3.4 to CI matrix and document supported versions (#38)
…g (#27, #33)

- Extract set_defaults method called by both initialize and reset! (#27)
- Support per-agent Configuration instances via config: kwarg on Agent::Core (#33)
- Add effective_config method that falls back to global config
- Add comprehensive configuration specs
- Add mutex protection to all read methods: find, subset, by_category,
  all, names, size, registered? (#29)
- Replace unconditional warn to stderr with logger.debug when a logger
  is configured; silent otherwise (#28)
- Add thread safety spec with concurrent read/write test
- Update overwrite test expectations for new logging behavior
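The mutex protection on read methods can be sketched as follows. This assumes a hash-backed registry; `SketchRegistry` and its storage are hypothetical, and only the method names `find`, `names`, `size`, and `registered?` come from the commit message.

```ruby
# Hypothetical registry; not the gem's implementation.
class SketchRegistry
  def initialize
    @tools = {}
    @mutex = Mutex.new
  end

  def register(name, tool)
    @mutex.synchronize { @tools[name] = tool }
  end

  def find(name)
    @mutex.synchronize { @tools[name] }
  end

  def names
    # Hash#keys returns a new array, so callers never iterate the live hash.
    @mutex.synchronize { @tools.keys }
  end

  def size
    @mutex.synchronize { @tools.size }
  end

  def registered?(name)
    @mutex.synchronize { @tools.key?(name) }
  end
end

# Concurrent writers and readers sharing one registry.
registry = SketchRegistry.new
writers = 4.times.map do |t|
  Thread.new { 25.times { |i| registry.register("tool_#{t}_#{i}", :stub) } }
end
readers = 2.times.map { Thread.new { 50.times { registry.names } } }
(writers + readers).each(&:join)

puts registry.size
```

Reads take the same lock as writes, so a reader never observes the hash in the middle of a mutation.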
…#37)

- Improve emit comment to accurately explain recursion guard behavior:
  errors in :error handlers are silently swallowed to prevent unbounded
  recursion (#30)
- Add :tool_call_delta to EVENTS constant for streaming tool call data (#37)
- Add specs for tool_call_delta event subscription and emission
…#31, #33, #36)

- Expose @extensions as attr_reader for introspection (#31)
- Accept config: kwarg for per-agent configuration override (#33)
- Add effective_config method falling back to global config
- Plumb execution_mode and tool_timeout through constructor to Loop (#36)
- Add specs for extensions reader, per-agent config, execution options
- Serialize error as { class: error.class.name, message: error.message }
  instead of just error.message, preserving diagnostic context
- Add spec for custom error class preservation
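The richer error serialization amounts to something like the sketch below. `serialize_error` and `ToolFailure` are hypothetical names used only for illustration.

```ruby
# Hypothetical custom error class used to show that the class name survives.
class ToolFailure < StandardError; end

# Keeps the error class alongside the message instead of the message alone.
def serialize_error(error)
  { class: error.class.name, message: error.message }
end

serialized = serialize_error(ToolFailure.new("disk full"))
puts serialized.inspect
```

With only `error.message`, a `ToolFailure` and a `RuntimeError` carrying the same text would be indistinguishable after serialization.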
… (#36, #37)

- Accept execution_mode (:parallel/:sequential) and tool_timeout params
  in Loop constructor, passed through from Agent::Core (#36)
- Emit :tool_call_delta event when provider yields tool-call streaming
  data during the think phase (#37)
- Add specs for configurable execution options and tool_call_delta events
- Clarify that the empty text block is needed when assistant turns contain
  only tool_use calls with no accompanying text, to satisfy Anthropic's
  non-empty content constraint
…poisoning

Compaction previously wrote the conversation summary as role: :system. When
Loop#build_llm_messages prepended the real system prompt (also role: :system),
two system messages existed in the array. The Anthropic provider extracts a
single system_message by overwriting on each match (last wins), so the
compaction summary silently replaced the actual system prompt.

Fix: Changed compaction summary from role: :system to role: :user with a
clear [Conversation Summary] prefix. The system prompt now always remains
the authoritative system message.

Tests updated to verify:
- Summary uses role :user, not :system
- No :system messages in compacted output
- Only one :system message when loop prepends the real prompt
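The "last wins" behavior described above can be reproduced with a toy extractor. This is a sketch under assumptions: `extract_system` and `build_messages` are hypothetical helpers that imitate the described overwrite, not the Anthropic provider's code.

```ruby
# Mimics a provider that keeps only the last :system message it sees.
def extract_system(messages)
  system_message = nil
  messages.each { |m| system_message = m[:content] if m[:role] == :system }
  system_message
end

# Builds a message array with the real prompt first, then the summary
# written with the given role.
def build_messages(summary_role)
  [
    { role: :system, content: "You are a careful assistant." },
    { role: summary_role, content: "[Conversation Summary] User asked about retries." },
    { role: :assistant, content: "Understood." }
  ]
end

before_fix = extract_system(build_messages(:system))
after_fix = extract_system(build_messages(:user))

puts before_fix
puts after_fix
```

With the summary as `:system`, the extractor returns the summary text; with `:user`, it returns the real prompt.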

The retry loop incremented 'attempt' before the call and used
'attempt < max_retries' as the retry condition. With max_retries: 3,
this gave only 2 retries (3 total attempts) instead of the expected
3 retries (4 total attempts).

Fix: Changed condition to 'attempt <= max_retries' so max_retries: N
means N retries after the initial failure, for N+1 total attempts.

Tests added:
- Verifies exactly 4 requests with max_retries: 3 (via WebMock count)
- Verifies success on the last retry attempt (4th of 4)
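The off-by-one is easy to see by counting attempts under both conditions. This is a minimal sketch; `with_retries` and `count_attempts` are hypothetical helpers, and the `inclusive:` flag exists only to switch between the old and new comparison.

```ruby
# Runs the block, retrying on failure. The counter is incremented before
# each call, as described in the commit message.
def with_retries(max_retries:, inclusive:)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    retry_allowed = inclusive ? attempt <= max_retries : attempt < max_retries
    retry if retry_allowed
    raise
  end
end

# Counts how many times an always-failing block is invoked.
def count_attempts(max_retries:, inclusive:)
  calls = 0
  begin
    with_retries(max_retries: max_retries, inclusive: inclusive) do
      calls += 1
      raise "simulated failure"
    end
  rescue RuntimeError
    nil # the final failure is expected; only the call count matters here
  end
  calls
end

old_count = count_attempts(max_retries: 3, inclusive: false)
new_count = count_attempts(max_retries: 3, inclusive: true)

puts "attempt < max_retries:  #{old_count} total attempts"
puts "attempt <= max_retries: #{new_count} total attempts"
```

With `max_retries: 3`, the strict comparison stops after 3 attempts and the inclusive one after 4, matching the numbers in the commit message.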

All three providers (Anthropic, OpenAI, Gemini) previously did:
  response = conn.post(...)
  parse_sse_events(response.body)
With the net_http adapter and no on_data callback, Faraday buffers the entire
response body before returning. No deltas reached the caller until the
model finished generating, so the streaming was fake.

Fix: Each provider's streaming method now uses req.options.on_data to
process SSE chunks incrementally as they arrive from the API. An
sse_buffer accumulates partial lines across chunks, and complete SSE
events are parsed and yielded to the caller's block immediately.

Tests added for all three providers verifying incremental event delivery.
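The buffering step can be sketched without any HTTP client. The class below is a hypothetical stand-in for the sse_buffer logic: it assumes LF line endings and a blank line between events, and it only extracts `data:` fields. In the providers, the chunks would arrive through Faraday's `on_data` callback instead of the array used here.

```ruby
# Hypothetical incremental SSE parser; not the gem's implementation.
class SseBuffer
  def initialize
    @buffer = String.new
  end

  # Appends a chunk and yields the data payload of every event that is
  # now complete. Partial events stay in the buffer for the next chunk.
  def feed(chunk)
    @buffer << chunk
    while (idx = @buffer.index("\n\n"))
      raw_event = @buffer.slice!(0, idx + 2)
      data_lines = raw_event.each_line.map(&:chomp).select { |l| l.start_with?("data:") }
      next if data_lines.empty?

      yield data_lines.map { |l| l.sub(/\Adata: ?/, "") }.join("\n")
    end
  end
end

# Chunk boundaries deliberately split an event in the middle.
events = []
parser = SseBuffer.new
['data: {"a":', "1}\n\ndata: {\"b\":2}\n", "\n"].each do |chunk|
  parser.feed(chunk) { |payload| events << payload }
end

puts events.inspect
```

Each payload is yielded as soon as its terminating blank line arrives, which is what lets deltas reach the caller before the response ends.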

Fallback < BaseProvider inherited the retry wrapper from BaseProvider#complete.
Inside perform_complete, it called @primary.complete (which retries internally)
then @fallback.complete (also retries). The retry layers composed
multiplicatively: outer_retries x (primary_retries + fallback_retries).

Fix: Override #complete in Fallback to call perform_complete directly,
skipping the outer retry wrapper. Each inner provider handles its own
retries independently.

Tests added:
- Verifies exact request counts (4 primary + 4 fallback, not multiplied)
- Verifies no outer retry loop re-triggers the primary+fallback cycle
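A toy model makes the multiplication visible. The class and method names below mirror the commit message, but the bodies are assumptions written for illustration, including the `skip_outer_retry:` flag and the shared default of 3 retries.

```ruby
# Toy model only; not the gem's code.
class BaseProvider
  def initialize(max_retries: 3)
    @max_retries = max_retries
  end

  # Retry wrapper: N retries after the initial failure (N + 1 attempts).
  def complete(request)
    attempt = 0
    begin
      attempt += 1
      perform_complete(request)
    rescue StandardError
      retry if attempt <= @max_retries
      raise
    end
  end
end

class AlwaysFailingProvider < BaseProvider
  attr_reader :calls

  def initialize(**opts)
    super
    @calls = 0
  end

  def perform_complete(_request)
    @calls += 1
    raise "simulated outage"
  end
end

class Fallback < BaseProvider
  def initialize(primary, fallback, skip_outer_retry:)
    super()
    @primary = primary
    @fallback = fallback
    @skip_outer_retry = skip_outer_retry
  end

  # With the flag set, bypass the inherited retry wrapper entirely.
  def complete(request)
    return perform_complete(request) if @skip_outer_retry

    super
  end

  def perform_complete(request)
    @primary.complete(request)
  rescue StandardError
    @fallback.complete(request)
  end
end

# Returns [primary request count, fallback request count].
def request_counts(skip_outer_retry:)
  primary = AlwaysFailingProvider.new
  fallback = AlwaysFailingProvider.new
  begin
    Fallback.new(primary, fallback, skip_outer_retry: skip_outer_retry).complete(:request)
  rescue StandardError
    nil # both providers always fail; only the counts matter
  end
  [primary.calls, fallback.calls]
end

puts request_counts(skip_outer_retry: false).inspect
puts request_counts(skip_outer_retry: true).inspect
```

In this model the inherited wrapper re-runs the whole primary-plus-fallback cycle on every outer attempt, while the override leaves each inner provider with its own 4 attempts.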

inject_datetime, inject_user_preferences, and inject_workspace_context
all used state.system_prompt += to append content. Since transforms run
before every LLM call in the loop, across N turns the system prompt
accumulated N timestamps, N preference blocks, and N workspace contexts
until it blew the context window.

Fix: Each injection type now uses unique HTML comment markers. Before
re-adding, the transform strips any existing injection matching its
markers. This makes all injections idempotent.

Tests added:
- Each inject method called 5x produces exactly 1 injection
- Base prompt preserved after multiple calls
- Composed transform called 10x produces exactly 1 of each injection
- nil preferences correctly strip previous injection
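The strip-then-append approach can be sketched as below. `inject_block` and the exact marker text are hypothetical; the commit message only states that each injection type has unique HTML comment markers.

```ruby
# Hypothetical helper: removes any previous block for `name`, then appends
# a fresh one. Passing nil content only strips the previous block.
def inject_block(prompt, name, content)
  open_marker = "<!-- #{name}:start -->"
  close_marker = "<!-- #{name}:end -->"
  pattern = /\n*#{Regexp.escape(open_marker)}.*?#{Regexp.escape(close_marker)}/m

  stripped = prompt.gsub(pattern, "")
  return stripped if content.nil?

  "#{stripped}\n\n#{open_marker}\n#{content}\n#{close_marker}"
end

# Simulates the transform running before each of five LLM calls.
prompt = "You are helpful."
5.times { prompt = inject_block(prompt, "datetime", "Current time: 2026-04-30") }

puts prompt
```

Because the previous block is removed before the new one is added, running the transform on every turn leaves the prompt length stable.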

Issues fixed:
- #9: Replace unsafe Timeout.timeout with thread+join for sequential mode
- #10: Fix Future#value(timeout) nil misinterpretation using wait+fulfilled?
- #11: Cancel/interrupt leaked future threads after timeout
- #12: Guard JSON.parse against empty-string arguments in OpenAI/Anthropic
- #13: Move Gemini API key from URL query string to x-goog-api-key header
- #14: Rename NotImplementedError to AbstractMethodError to avoid shadowing stdlib
- #15: Guard ToolCall#parse_arguments against non-string/non-hash inputs
- #16: Add State#reset_iteration! method; reset counter in run() and continue()
- #17: Guard against nil tools registry; raise NoToolsRegisteredError
- #18: Re-raise programming errors (NoMethodError, NameError, etc.) in Loop
- #19: Set stop_reason and error on max_iterations; success? returns false
- #20: Remove dead faraday-retry dependency; fix build_connection docstring
- #21: Add stream_options for OpenAI streaming usage data
- #22: Guard Anthropic streaming JSON.parse with rescue JSON::ParserError
- #23: Buffer streaming deltas in Fallback to prevent double-emit

All fixes include comprehensive tests. Full test suite passes (398 examples, 0 failures).
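As an illustration of #12, a guard around JSON.parse might look like this. It is a sketch: `parse_tool_arguments` is a hypothetical helper, and treating empty input as an empty Hash and re-raising as ArgumentError are assumptions, not the gem's documented behavior.

```ruby
require "json"

# Hypothetical helper: streamed tool calls can arrive with "" as their
# arguments, which JSON.parse rejects.
def parse_tool_arguments(raw)
  return {} if raw.nil? || raw.strip.empty?

  JSON.parse(raw)
rescue JSON::ParserError => e
  raise ArgumentError, "invalid tool arguments: #{e.message}"
end

puts parse_tool_arguments("").inspect
puts parse_tool_arguments('{"query":"ruby"}').inspect
```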
…remediation

Consolidates all three feature branches:
- fix/critical-correctness (Issues #2-#8)
- fix/major-reliability (Issues #9-#23)
- fix/docs-and-hygiene (Issues #24-#38)

Conflicts resolved by integrating both sets of changes:
- anthropic.rb: kept real streaming (critical) + tool_use_id validation & JSON guard (major)
- openai.rb: kept real streaming (critical) + Issue #21 streaming usage capture (major)
- fallback.rb: kept both comment blocks (no-extra-retry + streaming buffer docs)
- result.rb: kept richer error hash (docs) + stop_reason/truncated fields (critical)
- Gemfile/gemspec: kept stricter version pins from critical-correctness
- Fixed 3 Gemini test stubs using old ?key= query parameter URL pattern
  (provider now sends API key via X-Goog-Api-Key header)
- Synced Gemfile.lock with gemspec (removed phantom ostruct dependency)
@ejwhite7 ejwhite7 force-pushed the fix/review-remediation branch from 5da5e90 to d8ddc1c April 30, 2026 15:25
@ejwhite7 ejwhite7 merged commit 99f55fa into main Apr 30, 2026
3 checks passed
@ejwhite7 ejwhite7 deleted the fix/review-remediation branch April 30, 2026 15:27
@ejwhite7

Owner Author

This PR has been superseded by #8, which addresses 12 additional defects from adversarial review round 2.
