fix: resolve all 35 issues from adversarial code review#7
Merged
Conversation
- Fix Agent constructor: remove stream:, add required system_prompt: - Fix event payload keys: e[:data] -> e[:content], e[:name] -> e[:tool_name] - Fix result.output -> result.content, result.iterations -> result.turns - Remove result.stop_reason (does not exist) - Fix context_compaction: -> compaction:, context_transform: -> transform_context: - Fix extensions: [] -> agent.use(ExtensionClass) (takes class, not instance) - Fix Compaction constructor: add required summary_model:, remove strategy: - Fix Transform: module with factory methods, not a class - Remove :before_tool_call/:after_tool_call from events lists (they are hooks) - Add Tool vs Tools namespace clarification - Update CI matrix reference in CLAUDE.md
- Remove unused ostruct ~> 0.6 dependency (#24) - Remove arbitrary faraday-net_http < 3.4 upper bound (#25) - Update placeholder email to maintainer's noreply address (#26) - Add Ruby 3.4 to CI matrix and document supported versions (#38)
…g (#27, #33) - Extract set_defaults method called by both initialize and reset! (#27) - Support per-agent Configuration instances via config: kwarg on Agent::Core (#33) - Add effective_config method that falls back to global config - Add comprehensive configuration specs
- Add mutex protection to all read methods: find, subset, by_category, all, names, size, registered? (#29) - Replace unconditional warn to stderr with logger.debug when a logger is configured; silent otherwise (#28) - Add thread safety spec with concurrent read/write test - Update overwrite test expectations for new logging behavior
…#37) - Improve emit comment to accurately explain recursion guard behavior: errors in :error handlers are silently swallowed to prevent unbounded recursion (#30) - Add :tool_call_delta to EVENTS constant for streaming tool call data (#37) - Add specs for tool_call_delta event subscription and emission
…#31, #33, #36) - Expose @extensions as attr_reader for introspection (#31) - Accept config: kwarg for per-agent configuration override (#33) - Add effective_config method falling back to global config - Plumb execution_mode and tool_timeout through constructor to Loop (#36) - Add specs for extensions reader, per-agent config, execution options
- Serialize error as { class: error.class.name, message: error.message }
instead of just error.message, preserving diagnostic context
- Add spec for custom error class preservation
… (#36, #37) - Accept execution_mode (:parallel/:sequential) and tool_timeout params in Loop constructor, passed through from Agent::Core (#36) - Emit :tool_call_delta event when provider yields tool-call streaming data during the think phase (#37) - Add specs for configurable execution options and tool_call_delta events
- Clarify that the empty text block is needed when assistant turns contain only tool_use calls with no accompanying text, to satisfy Anthropic's non-empty content constraint
…poisoning Compaction previously wrote the conversation summary as role: :system. When Loop#build_llm_messages prepended the real system prompt (also role: :system), two system messages existed in the array. The Anthropic provider extracts a single system_message by overwriting on each match (last wins), so the compaction summary silently replaced the actual system prompt. Fix: Changed compaction summary from role: :system to role: :user with a clear [Conversation Summary] prefix. The system prompt now always remains the authoritative system message. Tests updated to verify: - Summary uses role :user, not :system - No :system messages in compacted output - Only one :system message when loop prepends the real prompt
The retry loop incremented 'attempt' before the call and used 'attempt < max_retries' as the retry condition. With max_retries: 3, this gave only 2 retries (3 total attempts) instead of the expected 3 retries (4 total attempts). Fix: Changed condition to 'attempt <= max_retries' so max_retries: N means N retries after the initial failure, for N+1 total attempts. Tests added: - Verifies exactly 4 requests with max_retries: 3 (via WebMock count) - Verifies success on the last retry attempt (4th of 4)
All three providers (Anthropic, OpenAI, Gemini) previously did: response = conn.post(...) parse_sse_events(response.body) With net_http adapter and no on_data callback, Faraday buffers the entire response body before returning. No deltas reached the caller until the model finished generating -- fake streaming. Fix: Each provider's streaming method now uses req.options.on_data to process SSE chunks incrementally as they arrive from the API. An sse_buffer accumulates partial lines across chunks, and complete SSE events are parsed and yielded to the caller's block immediately. Tests added for all three providers verifying incremental event delivery.
Fallback < BaseProvider inherited the retry wrapper from BaseProvider#complete. Inside perform_complete, it called @primary.complete (which retries internally) then @fallback.complete (also retries). The retry layers composed multiplicatively: outer_retries x (primary_retries + fallback_retries). Fix: Override #complete in Fallback to call perform_complete directly, skipping the outer retry wrapper. Each inner provider handles its own retries independently. Tests added: - Verifies exact request counts (4 primary + 4 fallback, not multiplied) - Verifies no outer retry loop re-triggers the primary+fallback cycle
inject_datetime, inject_user_preferences, and inject_workspace_context all used state.system_prompt += to append content. Since transforms run before every LLM call in the loop, across N turns the system prompt accumulated N timestamps, N preference blocks, and N workspace contexts until it blew the context window. Fix: Each injection type now uses unique HTML comment markers. Before re-adding, the transform strips any existing injection matching its markers. This makes all injections idempotent. Tests added: - Each inject method called 5x produces exactly 1 injection - Base prompt preserved after multiple calls - Composed transform called 10x produces exactly 1 of each injection - nil preferences correctly strip previous injection
Issues fixed: - #9: Replace unsafe Timeout.timeout with thread+join for sequential mode - #10: Fix Future#value(timeout) nil misinterpretation using wait+fulfilled? - #11: Cancel/interrupt leaked future threads after timeout - #12: Guard JSON.parse against empty-string arguments in OpenAI/Anthropic - #13: Move Gemini API key from URL query string to x-goog-api-key header - #14: Rename NotImplementedError to AbstractMethodError to avoid shadowing stdlib - #15: Guard ToolCall#parse_arguments against non-string/non-hash inputs - #16: Add State#reset_iteration! method; reset counter in run() and continue() - #17: Guard against nil tools registry; raise NoToolsRegisteredError - #18: Re-raise programming errors (NoMethodError, NameError, etc.) in Loop - #19: Set stop_reason and error on max_iterations; success? returns false - #20: Remove dead faraday-retry dependency; fix build_connection docstring - #21: Add stream_options for OpenAI streaming usage data - #22: Guard Anthropic streaming JSON.parse with rescue JSON::ParserError - #23: Buffer streaming deltas in Fallback to prevent double-emit All fixes include comprehensive tests. Full test suite passes (398 examples, 0 failures).
…remediation Consolidates all three feature branches: - fix/critical-correctness (Issues #2-#8) - fix/major-reliability (Issues #9-#23) - fix/docs-and-hygiene (Issues #24-#38) Conflicts resolved by integrating both sets of changes: - anthropic.rb: kept real streaming (critical) + tool_use_id validation & JSON guard (major) - openai.rb: kept real streaming (critical) + Issue #21 streaming usage capture (major) - fallback.rb: kept both comment blocks (no-extra-retry + streaming buffer docs) - result.rb: kept richer error hash (docs) + stop_reason/truncated fields (critical) - Gemfile/gemspec: kept stricter version pins from critical-correctness
- Fixed 3 Gemini test stubs using old ?key= query parameter URL pattern (provider now sends API key via X-Goog-Api-Key header) - Synced Gemfile.lock with gemspec (removed phantom ostruct dependency)
5da5e90 to
d8ddc1c
Compare
Owner
Author
|
This PR has been superseded by #8 which addresses 12 additional defects from adversarial review round 2. |
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Review Remediation: Adversarial Review Round 2
This PR addresses 12 defects and incomplete items identified during the second adversarial review of the ruby-pi gem.
New Defects Fixed
String interpolation escaped in ProviderError (
anthropic.rb:520-525)#{}in ProviderError message prevented actual tool name and parser error from appearingThread-unsafe streaming instance variables (
anthropic.rb)@_stream_*instance variables leaked state across concurrent requestsprocess_anthropic_stream_eventhelperPer-agent config threading (
agent/core.rb, all providers)config:kwarg was decorative -- providers still readRubyPi.configurationdirectlyBaseProvidernow acceptsconfig:kwarg; all providers use passed-in configStreaming error body recovery (all providers)
on_datacallback consumed error response bodies, leavingApiErrorwith empty bodyon_data, accumulate error body separately, pass tohandle_error_responseConsecutive user messages after compaction (
compaction.rb)role: :usercould produce consecutive user messages (rejected by Anthropic)role: :assistantIssues Not Fully Addressed from V1
OpenAI missing tool_call_id (
openai.rb)"unknown"instead of failing fastProviderErrorwith descriptive message (same as Anthropic)Gemini streaming finish_reason (
gemini.rb)"stop"instead of reading actualfinishReasonfinishReasonfrom candidate object with"STOP"->"stop"normalizationREADME incorrect event keys (
README.md)e[:iteration]andevent[:iterations](nonexistent keys)e[:turn]andevent[:result].turnsCHANGELOG update (
CHANGELOG.md)Dead
parse_sse_eventsmethod (base_provider.rb)faraday-net_httpversion cap (Gemfile,gemspec)< 3.4upper boundBufferedStreamProxy breaks streaming UX (
fallback.rb):fallback_startevent on primary failureTest Results
All 444 tests pass with 0 failures.