Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 23 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,29 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.8] - 2026-06-09

### Fixed (adversarial review round 6)

- **`Retry-After` header was parsed but never honored (High)**: On a 429, `handle_error_response` stored the server's `Retry-After` on `RateLimitError#retry_after`, but the retry loop in `BaseProvider#complete` always slept the local exponential backoff (capped at `retry_max_delay`) — hammering a server that asked for a longer cooldown until the retry budget burned out. The retry delay now prefers a positive `retry_after` (capped at `RETRY_AFTER_CEILING`, 60s); HTTP-date values (which parse to `0.0`) and absent headers fall through to the computed backoff
- **Parallel executor timeout/rejection Results reported `name: "unknown"` (High)**: `execute_parallel` hardcoded `"unknown"` in the timeout and rejected-future branches, so with several tools timing out concurrently, logs and `:tool_execution_end` subscribers could not tell which tool hung. Futures are now zipped with their originating calls and failure Results carry the real tool name; the timeout message matches sequential mode (`Tool 'x' timed out after Ns`). The rejected-future branch also no longer reports a misleading full-timeout `duration_ms` for what may have been an instant failure (now `0.0`)
- **Keyword-parameter tool blocks failed on every call (High)**: `Definition#call` passed a single positional Hash, so a block written `{ |content:, platform:| ... }` — the natural style given named schema parameters — raised `ArgumentError: missing keyword` on every invocation (surfacing as a confusing failed Result). `Definition` now detects keyword parameters at construction and splats the arguments hash to keywords; positional-Hash blocks are unchanged. Keyword blocks without `**rest` raise on unexpected keys — strict by design, since the keys come from the LLM
- **`:compaction` event was never emitted in production (Medium)**: `Compaction#emitter` defaults to nil and nothing ever assigned it, so the documented `agent.on(:compaction)` subscription silently never fired — the only place the emitter was set was the spec itself. `Loop#initialize` now wires its emitter into the compaction strategy (an explicitly preassigned emitter is left untouched)
- **Streaming chunks were never normalized to UTF-8 (Medium)**: Faraday delivers `on_data` chunks as ASCII-8BIT; appending a chunk to a UTF-8 SSE buffer already holding non-ASCII text raises `Encoding::CompatibilityError`, and yielded deltas could carry binary encoding into consumers' UTF-8 buffers. All three providers now buffer in BINARY and re-encode each complete SSE line to UTF-8 (with `scrub` guarding invalid bytes) before parsing, so `:text_delta` events are always valid UTF-8 — including multi-byte characters split across network chunks
- **Streaming fallback gave consumers no way to truncate partial primary output (Medium)**: If the primary streamed text and then died mid-stream, the fallback streamed a complete fresh response — a delta-appending consumer rendered `"<partial primary><full fallback>"` with no signal of how much to discard. The `:fallback_start` payload now includes `partial_output` (Boolean) and `partial_chars` (characters already yielded), so consumers can deterministically reset
- **Tool names were not validated against provider constraints (Medium)**: A tool named `send.email` registered fine and then 400'd on every Anthropic request with an opaque server error. `Definition` now validates names against `/\A[a-zA-Z0-9_-]{1,64}\z/` (the strictest provider constraint) and raises `ArgumentError` at definition time with a pointed message
- **`json` was used everywhere but never declared or required (Medium)**: `JSON.parse`/`JSON.generate` are called throughout the providers and agent loop, but the gem relied on Faraday's transitive `json` dependency and the entry point's single `require "json"` — loading `agent/loop.rb` in isolation raised `NameError`, contradicting the composability principle. The gemspec now declares `json >= 2.0` and every file referencing `JSON` requires it directly (pinned by a source-scan spec)
- **Configuration accepted negative retry/timeout values (Low)**: `max_retries = -1` silently disabled retries and a negative delay raised deep inside the retry loop's `sleep`. The numeric settings now have validated writers that raise `ArgumentError` at assignment time
- **Global configuration first-access race (Low)**: `@configuration ||= Configuration.new` was unsynchronized; two threads racing the first call could each construct a Configuration with one silently discarded. The configuration is now eagerly initialized at require time
- **`continue()` Result accounting documented (Docs)**: Each `run`/`continue` builds a fresh Loop, so the returned Result's `usage`/`tool_calls_made`/`turns` cover only that invocation while `messages` is cumulative — an undocumented asymmetry, now documented on `Core#continue`
- **Schema DSL documented as LLM-facing hints, not validation (Docs)**: Nothing validates model-supplied arguments against `tool.parameters` before invoking the block — `required`/`enum`/`minimum` constrain what the model is asked to produce, with no runtime enforcement or type coercion. This is deliberate (anti-framework), but the schema header now says so loudly and directs tool blocks to treat arguments as untrusted input
- **`State#add_message` unbounded growth documented (Docs)**: Long-lived agents calling `continue()` repeatedly accumulate messages linearly without compaction configured; documented on the method
- **CLAUDE.md module map corrected (Docs)**: The map referenced a nonexistent `agent/agent.rb`, omitted `core.rb`/`loop.rb`/`state.rb`/`events.rb`, hardcoded version `0.1.0`, and the extension example used the one-arg `|event|` block signature instead of the actual `|data, agent|`. All corrected

### Release-history note

- **`[0.1.4]` below was never actually released**: `lib/ruby_pi/version.rb` went from `0.1.3` directly to `0.1.5` — the round-2 fixes documented under 0.1.4 shipped without a version bump and were first published as part of 0.1.5. There is intentionally no `v0.1.4` git tag or gem. (Discovered during round 6; the entry is kept for historical accuracy of *what* changed.)

## [0.1.7] - 2026-05-28

### Fixed (adversarial review round 5)
Expand Down
24 changes: 17 additions & 7 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,11 @@ RubyPi::Tools Tool definition and execution
+-- Result Execution outcome (value or error + timing)

RubyPi::Agent Think-act-observe loop
+-- Result Agent run outcome (output, messages, iterations)
|-- Core Public API: run/continue, extensions, event subscription
|-- Loop The think-act-observe cycle itself
|-- State Mutable run state (messages, iteration, hooks)
|-- Events EVENTS list + EventEmitter
+-- Result Agent run outcome (output, messages, turns)

RubyPi::Context Conversation context management
|-- Compaction Truncate/summarize to fit token limits
Expand All @@ -59,7 +63,7 @@ RubyPi::Extensions Hook system for agent events
lib/
ruby_pi.rb # Entry point, requires everything, LLM.model factory
ruby_pi/
version.rb # RubyPi::VERSION = "0.1.0"
version.rb # RubyPi::VERSION (single source of truth; see CHANGELOG.md)
configuration.rb # RubyPi::Configuration (API keys, retries, defaults)
errors.rb # Error hierarchy (ApiError, AuthenticationError, etc.)
llm/
Expand All @@ -79,8 +83,11 @@ lib/
executor.rb # Parallel/sequential Executor
result.rb # Execution Result
agent/
agent.rb # RubyPi::Agent (think-act-observe loop)
result.rb # Agent::Result (output, messages, iterations)
core.rb # Agent::Core — public API (run, continue, use, on)
loop.rb # Agent::Loop — the think-act-observe cycle
state.rb # Agent::State — messages, iteration counter, hooks
events.rb # EVENTS constant + EventEmitter
result.rb # Agent::Result (output, messages, turns)
context/
compaction.rb # Context::Compaction
transform.rb # Context::Transform
Expand All @@ -90,8 +97,10 @@ lib/
spec/
spec_helper.rb # RSpec + WebMock setup
ruby_pi/
agent/ # Unit tests for core, loop, state, events, result
llm/ # Unit tests for each LLM provider
tools/ # Unit tests for tools, registry, executor
fixes/ # Regression specs pinned to review-round fixes
integration/
agent_integration_spec.rb # End-to-end agent loop tests
```
Expand Down Expand Up @@ -188,17 +197,18 @@ The callable receives the `Agent::State` object and can mutate it (typically app
## Adding a New Extension

1. Subclass `RubyPi::Extensions::Base`
2. Use `on_event :event_name do |event| ... end` to subscribe to events
2. Use `on_event :event_name do |data, agent| ... end` to subscribe to events
(handlers receive the event payload hash and the agent instance)
3. Available events: `:turn_start`, `:turn_end`, `:text_delta`, `:tool_call_delta`, `:tool_execution_start`, `:tool_execution_end`, `:agent_end`, `:error`, `:compaction`, `:provider_fallback`
Note: `:before_tool_call` and `:after_tool_call` are constructor hooks (Procs on Agent::Core), not subscribable events.
4. Register the extension: `agent.use(MyExtension)` (pass the CLASS, not an instance)

```ruby
class MyExtension < RubyPi::Extensions::Base
on_event :agent_end do |event|
on_event :agent_end do |data, agent|
# The :agent_end payload is { result:, success: }; the iteration count
# lives on the Result object as `turns`.
puts "Agent finished after #{event[:result].turns} turns"
puts "Agent finished after #{data[:result].turns} turns"
end
end
```
Expand Down
3 changes: 2 additions & 1 deletion Gemfile.lock
Original file line number Diff line number Diff line change
@@ -1,10 +1,11 @@
PATH
remote: .
specs:
ruby-pi (0.1.7)
ruby-pi (0.1.8)
concurrent-ruby (~> 1.2)
faraday (~> 2.0)
faraday-net_http (~> 3.0)
json (>= 2.0)

GEM
remote: https://rubygems.org/
Expand Down
7 changes: 7 additions & 0 deletions lib/ruby_pi.rb
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,13 @@ def reset_configuration!
end
end

# Eagerly initialize the global configuration at load time. The lazy
# `@configuration ||= ...` in .configuration is not synchronized; two
# threads hitting it concurrently on first access could each construct a
# Configuration, with one silently discarded. Initializing here (requires
# run single-threaded) removes the race without adding a mutex to every read.
@configuration = Configuration.new

# Namespace for large language model providers and related abstractions.
module LLM
class << self
Expand Down
6 changes: 6 additions & 0 deletions lib/ruby_pi/agent/core.rb
Original file line number Diff line number Diff line change
Expand Up @@ -140,12 +140,18 @@ def run(prompt)
# the existing conversation history and appends the new prompt before
# resuming the loop.
#
# NOTE on Result accounting: each run/continue builds a fresh Loop, so
# the returned Result's `usage`, `tool_calls_made`, and `turns` cover
# ONLY this invocation — while `messages` is cumulative across the whole
# conversation. Sum the per-call Results if you need session totals.
#
# Issue #16: Uses the encapsulated reset_iteration! method instead of
# the old approach that bypassed encapsulation
# and was fragile.
#
# @param prompt [String] the follow-up user message
# @return [RubyPi::Agent::Result] the outcome of the continued run
# (usage/tool_calls_made/turns are per-invocation; messages cumulative)
def continue(prompt)
@state.reset_iteration!
@state.add_message(role: :user, content: prompt)
Expand Down
10 changes: 10 additions & 0 deletions lib/ruby_pi/agent/loop.rb
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@
# is reached. It handles streaming, lifecycle events, compaction, and all
# pre/post tool call hooks.

require "json"

module RubyPi
module Agent
# Executes the think-act-observe cycle against a given State, emitting
Expand Down Expand Up @@ -55,6 +57,14 @@ def initialize(state:, emitter:, compaction: nil, execution_mode: :parallel, too
@state = state
@emitter = emitter
@compaction = compaction
# Wire the loop's emitter into the compaction strategy so the
# documented :compaction event actually reaches agent subscribers.
# Compaction#emitter defaults to nil and nothing else ever sets it —
# without this, `agent.on(:compaction)` never fires. An emitter that
# was already assigned explicitly is left untouched.
if @compaction.respond_to?(:emitter=) && @compaction.respond_to?(:emitter) && @compaction.emitter.nil?
@compaction.emitter = emitter
end
@execution_mode = execution_mode
@tool_timeout = tool_timeout
@tool_calls_made = []
Expand Down
6 changes: 6 additions & 0 deletions lib/ruby_pi/agent/state.rb
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,12 @@ def initialize(

# Appends a message to the conversation history.
#
# NOTE: history grows without bound — there is no built-in cap. Growth
# per run is limited by max_iterations, but long-lived agents that call
# continue() repeatedly (or use a high max_iterations with large tool
# outputs) accumulate messages linearly. Configure
# Agent.new(compaction: ...) to keep the context bounded.
#
# @param role [Symbol, String] the message role (:user, :assistant, :system, :tool)
# @param content [String, nil] the text content of the message
# @param options [Hash] additional fields (e.g., :tool_call_id, :tool_calls)
Expand Down
55 changes: 50 additions & 5 deletions lib/ruby_pi/configuration.rb
Original file line number Diff line number Diff line change
Expand Up @@ -37,19 +37,53 @@ class Configuration
attr_accessor :openai_api_key

# @return [Integer] Maximum number of retry attempts for transient errors (default: 3)
attr_accessor :max_retries
attr_reader :max_retries

# @return [Float] Base delay in seconds for exponential backoff (default: 1.0)
attr_accessor :retry_base_delay
attr_reader :retry_base_delay

# @return [Float] Maximum delay in seconds between retries (default: 30.0)
attr_accessor :retry_max_delay
attr_reader :retry_max_delay

# @return [Integer] HTTP request timeout in seconds (default: 120)
attr_accessor :request_timeout
attr_reader :request_timeout

# @return [Integer] HTTP connection open timeout in seconds (default: 10)
attr_accessor :open_timeout
attr_reader :open_timeout

# Validated writers for numeric settings. A negative max_retries silently
# disables retries and a negative delay raises deep inside the retry
# loop's sleep — fail fast at assignment time instead, where the typo is.

# @param value [Integer] must be a non-negative integer
def max_retries=(value)
validate_numeric!(:max_retries, value)
@max_retries = value
end

# @param value [Numeric] must be non-negative
def retry_base_delay=(value)
validate_numeric!(:retry_base_delay, value)
@retry_base_delay = value
end

# @param value [Numeric] must be non-negative
def retry_max_delay=(value)
validate_numeric!(:retry_max_delay, value)
@retry_max_delay = value
end

# @param value [Numeric] must be non-negative
def request_timeout=(value)
validate_numeric!(:request_timeout, value)
@request_timeout = value
end

# @param value [Numeric] must be non-negative
def open_timeout=(value)
validate_numeric!(:open_timeout, value)
@open_timeout = value
end

# @return [String] Default model name for Gemini provider
attr_accessor :default_gemini_model
Expand Down Expand Up @@ -78,6 +112,17 @@ def reset!

private

# Raises unless the value is a non-negative Numeric.
#
# @param name [Symbol] the setting name (for the error message)
# @param value [Object] the value being assigned
# @raise [ArgumentError] if value is not a Numeric or is negative
def validate_numeric!(name, value)
return if value.is_a?(Numeric) && value >= 0

raise ArgumentError, "#{name} must be a non-negative number, got #{value.inspect}"
end

# Sets all configuration ivars to their default values. Called by both
# initialize and reset! to ensure consistent defaults without the
# anti-pattern of calling initialize from reset!.
Expand Down
28 changes: 20 additions & 8 deletions lib/ruby_pi/llm/anthropic.rb
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@
# the Anthropic Messages API for both synchronous and streaming completions,
# including tool_use block support.

require "json"

module RubyPi
module LLM
# Anthropic Claude provider implementation. Communicates with the Anthropic
Expand Down Expand Up @@ -370,12 +372,19 @@ def perform_streaming_request(body, &block)
# process complete lines incrementally so that deltas reach the caller
# as soon as each SSE event is fully received — not after the entire
# response has been buffered.
sse_buffer = +""
#
# The buffer is BINARY because chunks arrive as ASCII-8BIT and may end
# mid-way through a multi-byte UTF-8 character; appending such a chunk
# to a UTF-8 buffer that already holds non-ASCII text raises
# Encoding::CompatibilityError. Each complete line is re-encoded to
# UTF-8 (and scrubbed) before parsing, so deltas reach the caller as
# valid UTF-8 strings.
sse_buffer = (+"").force_encoding(Encoding::BINARY)
response_status = nil

# Accumulate error response body separately so ApiError gets the
# full body even though on_data consumed the chunks.
error_body = +""
error_body = (+"").force_encoding(Encoding::BINARY)

response = with_transport_errors do
conn.post("/v1/messages") do |req|
Expand All @@ -394,14 +403,17 @@ def perform_streaming_request(body, &block)
# calls on_data for error responses too, which would otherwise
# consume the body and leave response.body empty.
if response_status && response_status >= 400
error_body << chunk
error_body << chunk.b
next
end

sse_buffer << chunk
# Process all complete lines in the buffer
sse_buffer << chunk.b
# Process all complete lines in the buffer. A complete line holds
# complete UTF-8 sequences (multi-byte characters split across
# chunks are repaired by the buffering), so re-encode it to UTF-8
# here; scrub guards against a server sending invalid bytes.
while (line_end = sse_buffer.index("\n"))
line = sse_buffer.slice!(0, line_end + 1).strip
line = sse_buffer.slice!(0, line_end + 1).force_encoding(Encoding::UTF_8).scrub.strip
next if line.empty?
next unless line.start_with?("data: ")

Expand Down Expand Up @@ -436,12 +448,12 @@ def perform_streaming_request(body, &block)
unless response.success?
# Reconstruct the response body from what on_data accumulated
error_response = response
error_body_str = error_body.empty? ? response.body : error_body
error_body_str = error_body.empty? ? response.body : error_body.force_encoding(Encoding::UTF_8).scrub
handle_error_response(error_response, override_body: error_body_str)
end

# Process any remaining data in the buffer after the connection closes
sse_buffer.each_line do |line|
sse_buffer.force_encoding(Encoding::UTF_8).scrub.each_line do |line|
line = line.strip
next if line.empty?
next unless line.start_with?("data: ")
Expand Down
Loading
Loading