From c50f8d1956e612a0a8319b2612d7c966d21b0386 Mon Sep 17 00:00:00 2001
From: EJ White <ejwhite@Mac.lan>
Date: Tue, 9 Jun 2026 14:26:09 -0400
Subject: [PATCH] =?UTF-8?q?fix:=20adversarial=20review=20round=206=20?=
 =?UTF-8?q?=E2=80=94=20release=20v0.1.8?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes all 13 findings from review round 6:

High:
- Honor server Retry-After on 429 (capped at 60s) instead of always
  using local exponential backoff (base_provider.rb)
- Parallel executor timeout/rejection Results carry the real tool name
  instead of "unknown"; rejected futures no longer report a misleading
  full-timeout duration (executor.rb)
- Tool blocks with keyword parameters now work — arguments are splatted
  to keywords when the block declares them (definition.rb)

Medium:
- Wire the Loop's emitter into Compaction so the documented :compaction
  event actually fires (loop.rb)
- Buffer SSE chunks as BINARY and re-encode complete lines to UTF-8 in
  all three providers; streaming deltas are now always valid UTF-8
- :fallback_start payload includes partial_output/partial_chars so
  consumers can truncate partial primary text (fallback.rb)
- Validate tool names against provider constraints at definition time
- Declare json as a runtime dependency and require it in every file
  that uses it (composability; pinned by a source-scan spec)

Low/docs:
- Configuration rejects negative numeric settings; eager-init the
  global configuration to remove the first-access race
- Document continue() per-invocation Result accounting, schema-as-hints
  (not validation), and unbounded message growth without compaction
- Correct CLAUDE.md module map (agent/agent.rb never existed), version
  reference, and extension handler signature
- CHANGELOG note: 0.1.4 was never actually released (version.rb went
  0.1.3 → 0.1.5); created the missing v0.1.7 tag locally

22 new regression specs in spec/ruby_pi/fixes/review_round6_spec.rb.
Suite: 492 examples, 0 failures.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 CHANGELOG.md                             |  23 ++
 CLAUDE.md                                |  24 +-
 Gemfile.lock                             |   3 +-
 lib/ruby_pi.rb                           |   7 +
 lib/ruby_pi/agent/core.rb                |   6 +
 lib/ruby_pi/agent/loop.rb                |  10 +
 lib/ruby_pi/agent/state.rb               |   6 +
 lib/ruby_pi/configuration.rb             |  55 ++++-
 lib/ruby_pi/llm/anthropic.rb             |  28 ++-
 lib/ruby_pi/llm/base_provider.rb         |  25 +-
 lib/ruby_pi/llm/fallback.rb              |  26 +-
 lib/ruby_pi/llm/gemini.rb                |  23 +-
 lib/ruby_pi/llm/openai.rb                |  24 +-
 lib/ruby_pi/llm/tool_call.rb             |   2 +
 lib/ruby_pi/tools/definition.rb          |  43 +++-
 lib/ruby_pi/tools/executor.rb            |  20 +-
 lib/ruby_pi/tools/schema.rb              |  10 +
 lib/ruby_pi/version.rb                   |   2 +-
 ruby-pi.gemspec                          |   4 +
 spec/ruby_pi/fixes/review_round6_spec.rb | 288 +++++++++++++++++++++++
 20 files changed, 578 insertions(+), 51 deletions(-)
 create mode 100644 spec/ruby_pi/fixes/review_round6_spec.rb

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d9e9e05..f706f28 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,29 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.1.8] - 2026-06-09
+
+### Fixed (adversarial review round 6)
+
+- **`Retry-After` header was parsed but never honored (High)**: On a 429, `handle_error_response` stored the server's `Retry-After` on `RateLimitError#retry_after`, but the retry loop in `BaseProvider#complete` always slept the local exponential backoff (capped at `retry_max_delay`) — hammering a server that asked for a longer cooldown until the retry budget burned out. The retry delay now prefers a positive `retry_after` (capped at `RETRY_AFTER_CEILING`, 60s); HTTP-date values (which parse to `0.0`) and absent headers fall through to the computed backoff
+- **Parallel executor timeout/rejection Results reported `name: "unknown"` (High)**: `execute_parallel` hardcoded `"unknown"` in the timeout and rejected-future branches, so with several tools timing out concurrently, logs and `:tool_execution_end` subscribers could not tell which tool hung. Futures are now zipped with their originating calls and failure Results carry the real tool name; the timeout message matches sequential mode (`Tool 'x' timed out after Ns`). The rejected-future branch also no longer reports a misleading full-timeout `duration_ms` for what may have been an instant failure (now `0.0`)
+- **Keyword-parameter tool blocks failed on every call (High)**: `Definition#call` passed a single positional Hash, so a block written `{ |content:, platform:| ... }` — the natural style given named schema parameters — raised `ArgumentError: missing keyword` on every invocation (surfacing as a confusing failed Result). `Definition` now detects keyword parameters at construction and splats the arguments hash to keywords; positional-Hash blocks are unchanged. Keyword blocks without `**rest` raise on unexpected keys — strict by design, since the keys come from the LLM
+- **`:compaction` event was never emitted in production (Medium)**: `Compaction#emitter` defaults to nil and nothing ever assigned it, so the documented `agent.on(:compaction)` subscription silently never fired — the only place the emitter was set was the spec itself. `Loop#initialize` now wires its emitter into the compaction strategy (an explicitly preassigned emitter is left untouched)
+- **Streaming chunks were never normalized to UTF-8 (Medium)**: Faraday delivers `on_data` chunks as ASCII-8BIT; appending a chunk to a UTF-8 SSE buffer already holding non-ASCII text raises `Encoding::CompatibilityError`, and yielded deltas could carry binary encoding into consumers' UTF-8 buffers. All three providers now buffer in BINARY and re-encode each complete SSE line to UTF-8 (with `scrub` guarding invalid bytes) before parsing, so `:text_delta` events are always valid UTF-8 — including multi-byte characters split across network chunks
+- **Streaming fallback gave consumers no way to truncate partial primary output (Medium)**: If the primary streamed text and then died mid-stream, the fallback streamed a complete fresh response — a delta-appending consumer rendered `"<partial primary><full fallback>"` with no signal of how much to discard. The `:fallback_start` payload now includes `partial_output` (Boolean) and `partial_chars` (characters already yielded), so consumers can deterministically reset
+- **Tool names were not validated against provider constraints (Medium)**: A tool named `send.email` registered fine and then 400'd on every Anthropic request with an opaque server error. `Definition` now validates names against `/\A[a-zA-Z0-9_-]{1,64}\z/` (the strictest provider constraint) and raises `ArgumentError` at definition time with a pointed message
+- **`json` was used everywhere but never declared or required (Medium)**: `JSON.parse`/`JSON.generate` are called throughout the providers and agent loop, but the gem relied on Faraday's transitive `json` dependency and the entry point's single `require "json"` — loading `agent/loop.rb` in isolation raised `NameError`, contradicting the composability principle. The gemspec now declares `json >= 2.0` and every file referencing `JSON` requires it directly (pinned by a source-scan spec)
+- **Configuration accepted negative retry/timeout values (Low)**: `max_retries = -1` silently disabled retries and a negative delay raised deep inside the retry loop's `sleep`. The numeric settings now have validated writers that raise `ArgumentError` at assignment time
+- **Global configuration first-access race (Low)**: `@configuration ||= Configuration.new` was unsynchronized; two threads racing the first call could each construct a Configuration with one silently discarded. The configuration is now eagerly initialized at require time
+- **`continue()` Result accounting documented (Docs)**: Each `run`/`continue` builds a fresh Loop, so the returned Result's `usage`/`tool_calls_made`/`turns` cover only that invocation while `messages` is cumulative — an undocumented asymmetry, now documented on `Core#continue`
+- **Schema DSL documented as LLM-facing hints, not validation (Docs)**: Nothing validates model-supplied arguments against `tool.parameters` before invoking the block — `required`/`enum`/`minimum` constrain what the model is asked to produce, with no runtime enforcement or type coercion. This is deliberate (anti-framework), but the schema header now says so loudly and directs tool blocks to treat arguments as untrusted input
+- **`State#add_message` unbounded growth documented (Docs)**: Long-lived agents calling `continue()` repeatedly accumulate messages linearly without compaction configured; documented on the method
+- **CLAUDE.md module map corrected (Docs)**: The map referenced a nonexistent `agent/agent.rb`, omitted `core.rb`/`loop.rb`/`state.rb`/`events.rb`, hardcoded version `0.1.0`, and the extension example used the one-arg `|event|` block signature instead of the actual `|data, agent|`. All corrected
+
+### Release-history note
+
+- **`[0.1.4]` below was never actually released**: `lib/ruby_pi/version.rb` went from `0.1.3` directly to `0.1.5` — the round-2 fixes documented under 0.1.4 shipped without a version bump and were first published as part of 0.1.5. There is intentionally no `v0.1.4` git tag or gem. (Discovered during round 6; the entry is kept for historical accuracy of *what* changed.)
+
 ## [0.1.7] - 2026-05-28
 
 ### Fixed (adversarial review round 5)
diff --git a/CLAUDE.md b/CLAUDE.md
index b228956..44bb354 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -41,7 +41,11 @@ RubyPi::Tools             Tool definition and execution
   +-- Result              Execution outcome (value or error + timing)
 
 RubyPi::Agent             Think-act-observe loop
-  +-- Result              Agent run outcome (output, messages, iterations)
+  |-- Core                Public API: run/continue, extensions, event subscription
+  |-- Loop                The think-act-observe cycle itself
+  |-- State               Mutable run state (messages, iteration, hooks)
+  |-- Events              EVENTS list + EventEmitter
+  +-- Result              Agent run outcome (output, messages, turns)
 
 RubyPi::Context           Conversation context management
   |-- Compaction          Truncate/summarize to fit token limits
@@ -59,7 +63,7 @@ RubyPi::Extensions        Hook system for agent events
 lib/
   ruby_pi.rb                          # Entry point, requires everything, LLM.model factory
   ruby_pi/
-    version.rb                        # RubyPi::VERSION = "0.1.0"
+    version.rb                        # RubyPi::VERSION (single source of truth; see CHANGELOG.md)
     configuration.rb                  # RubyPi::Configuration (API keys, retries, defaults)
     errors.rb                         # Error hierarchy (ApiError, AuthenticationError, etc.)
     llm/
@@ -79,8 +83,11 @@ lib/
       executor.rb                     # Parallel/sequential Executor
       result.rb                       # Execution Result
     agent/
-      agent.rb                        # RubyPi::Agent (think-act-observe loop)
-      result.rb                       # Agent::Result (output, messages, iterations)
+      core.rb                         # Agent::Core — public API (run, continue, use, on)
+      loop.rb                         # Agent::Loop — the think-act-observe cycle
+      state.rb                        # Agent::State — messages, iteration counter, hooks
+      events.rb                       # EVENTS constant + EventEmitter
+      result.rb                       # Agent::Result (output, messages, turns)
     context/
       compaction.rb                   # Context::Compaction
       transform.rb                    # Context::Transform
@@ -90,8 +97,10 @@ lib/
 spec/
   spec_helper.rb                      # RSpec + WebMock setup
   ruby_pi/
+    agent/                            # Unit tests for core, loop, state, events, result
     llm/                              # Unit tests for each LLM provider
     tools/                            # Unit tests for tools, registry, executor
+    fixes/                            # Regression specs pinned to review-round fixes
   integration/
     agent_integration_spec.rb         # End-to-end agent loop tests
 ```
@@ -188,17 +197,18 @@ The callable receives the `Agent::State` object and can mutate it (typically app
 ## Adding a New Extension
 
 1. Subclass `RubyPi::Extensions::Base`
-2. Use `on_event :event_name do |event| ... end` to subscribe to events
+2. Use `on_event :event_name do |data, agent| ... end` to subscribe to events
+   (handlers receive the event payload hash and the agent instance)
 3. Available events: `:turn_start`, `:turn_end`, `:text_delta`, `:tool_call_delta`, `:tool_execution_start`, `:tool_execution_end`, `:agent_end`, `:error`, `:compaction`, `:provider_fallback`
    Note: `:before_tool_call` and `:after_tool_call` are constructor hooks (Procs on Agent::Core), not subscribable events.
 4. Register the extension: `agent.use(MyExtension)` (pass the CLASS, not an instance)
 
 ```ruby
 class MyExtension < RubyPi::Extensions::Base
-  on_event :agent_end do |event|
+  on_event :agent_end do |data, agent|
     # The :agent_end payload is { result:, success: }; the iteration count
     # lives on the Result object as `turns`.
-    puts "Agent finished after #{event[:result].turns} turns"
+    puts "Agent finished after #{data[:result].turns} turns"
   end
 end
 ```
diff --git a/Gemfile.lock b/Gemfile.lock
index bb6e447..4d27ce2 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -1,10 +1,11 @@
 PATH
   remote: .
   specs:
-    ruby-pi (0.1.7)
+    ruby-pi (0.1.8)
       concurrent-ruby (~> 1.2)
       faraday (~> 2.0)
       faraday-net_http (~> 3.0)
+      json (>= 2.0)
 
 GEM
   remote: https://rubygems.org/
diff --git a/lib/ruby_pi.rb b/lib/ruby_pi.rb
index 0619a81..7f38367 100644
--- a/lib/ruby_pi.rb
+++ b/lib/ruby_pi.rb
@@ -82,6 +82,13 @@ def reset_configuration!
     end
   end
 
+  # Eagerly initialize the global configuration at load time. The lazy
+  # `@configuration ||= ...` in .configuration is not synchronized; two
+  # threads hitting it concurrently on first access could each construct a
+  # Configuration, with one silently discarded. Initializing here (requires
+  # run single-threaded) removes the race without adding a mutex to every read.
+  @configuration = Configuration.new
+
   # Namespace for large language model providers and related abstractions.
   module LLM
     class << self
diff --git a/lib/ruby_pi/agent/core.rb b/lib/ruby_pi/agent/core.rb
index 6e61ce5..30c647a 100644
--- a/lib/ruby_pi/agent/core.rb
+++ b/lib/ruby_pi/agent/core.rb
@@ -140,12 +140,18 @@ def run(prompt)
       # the existing conversation history and appends the new prompt before
       # resuming the loop.
       #
+      # NOTE on Result accounting: each run/continue builds a fresh Loop, so
+      # the returned Result's `usage`, `tool_calls_made`, and `turns` cover
+      # ONLY this invocation — while `messages` is cumulative across the whole
+      # conversation. Sum the per-call Results if you need session totals.
+      #
       # Issue #16: Uses the encapsulated reset_iteration! method instead of
       # the old approach that bypassed encapsulation
       # and was fragile.
       #
       # @param prompt [String] the follow-up user message
       # @return [RubyPi::Agent::Result] the outcome of the continued run
+      #   (usage/tool_calls_made/turns are per-invocation; messages cumulative)
       def continue(prompt)
         @state.reset_iteration!
         @state.add_message(role: :user, content: prompt)
diff --git a/lib/ruby_pi/agent/loop.rb b/lib/ruby_pi/agent/loop.rb
index b6f0d44..5e34506 100644
--- a/lib/ruby_pi/agent/loop.rb
+++ b/lib/ruby_pi/agent/loop.rb
@@ -10,6 +10,8 @@
 # is reached. It handles streaming, lifecycle events, compaction, and all
 # pre/post tool call hooks.
 
+require "json"
+
 module RubyPi
   module Agent
     # Executes the think-act-observe cycle against a given State, emitting
@@ -55,6 +57,14 @@ def initialize(state:, emitter:, compaction: nil, execution_mode: :parallel, too
         @state = state
         @emitter = emitter
         @compaction = compaction
+        # Wire the loop's emitter into the compaction strategy so the
+        # documented :compaction event actually reaches agent subscribers.
+        # Compaction#emitter defaults to nil and nothing else ever sets it —
+        # without this, `agent.on(:compaction)` never fires. An emitter that
+        # was already assigned explicitly is left untouched.
+        if @compaction.respond_to?(:emitter=) && @compaction.respond_to?(:emitter) && @compaction.emitter.nil?
+          @compaction.emitter = emitter
+        end
         @execution_mode = execution_mode
         @tool_timeout = tool_timeout
         @tool_calls_made = []
diff --git a/lib/ruby_pi/agent/state.rb b/lib/ruby_pi/agent/state.rb
index 01b804c..1dd42dd 100644
--- a/lib/ruby_pi/agent/state.rb
+++ b/lib/ruby_pi/agent/state.rb
@@ -91,6 +91,12 @@ def initialize(
 
       # Appends a message to the conversation history.
       #
+      # NOTE: history grows without bound — there is no built-in cap. Growth
+      # per run is limited by max_iterations, but long-lived agents that call
+      # continue() repeatedly (or use a high max_iterations with large tool
+      # outputs) accumulate messages linearly. Configure
+      # Agent.new(compaction: ...) to keep the context bounded.
+      #
       # @param role [Symbol, String] the message role (:user, :assistant, :system, :tool)
       # @param content [String, nil] the text content of the message
       # @param options [Hash] additional fields (e.g., :tool_call_id, :tool_calls)
diff --git a/lib/ruby_pi/configuration.rb b/lib/ruby_pi/configuration.rb
index 1460954..8a4b05f 100644
--- a/lib/ruby_pi/configuration.rb
+++ b/lib/ruby_pi/configuration.rb
@@ -37,19 +37,53 @@ class Configuration
     attr_accessor :openai_api_key
 
     # @return [Integer] Maximum number of retry attempts for transient errors (default: 3)
-    attr_accessor :max_retries
+    attr_reader :max_retries
 
     # @return [Float] Base delay in seconds for exponential backoff (default: 1.0)
-    attr_accessor :retry_base_delay
+    attr_reader :retry_base_delay
 
     # @return [Float] Maximum delay in seconds between retries (default: 30.0)
-    attr_accessor :retry_max_delay
+    attr_reader :retry_max_delay
 
     # @return [Integer] HTTP request timeout in seconds (default: 120)
-    attr_accessor :request_timeout
+    attr_reader :request_timeout
 
     # @return [Integer] HTTP connection open timeout in seconds (default: 10)
-    attr_accessor :open_timeout
+    attr_reader :open_timeout
+
+    # Validated writers for numeric settings. A negative max_retries silently
+    # disables retries and a negative delay raises deep inside the retry
+    # loop's sleep — fail fast at assignment time instead, where the typo is.
+
+    # @param value [Integer] must be a non-negative integer
+    def max_retries=(value)
+      validate_numeric!(:max_retries, value)
+      @max_retries = value
+    end
+
+    # @param value [Numeric] must be non-negative
+    def retry_base_delay=(value)
+      validate_numeric!(:retry_base_delay, value)
+      @retry_base_delay = value
+    end
+
+    # @param value [Numeric] must be non-negative
+    def retry_max_delay=(value)
+      validate_numeric!(:retry_max_delay, value)
+      @retry_max_delay = value
+    end
+
+    # @param value [Numeric] must be non-negative
+    def request_timeout=(value)
+      validate_numeric!(:request_timeout, value)
+      @request_timeout = value
+    end
+
+    # @param value [Numeric] must be non-negative
+    def open_timeout=(value)
+      validate_numeric!(:open_timeout, value)
+      @open_timeout = value
+    end
 
     # @return [String] Default model name for Gemini provider
     attr_accessor :default_gemini_model
@@ -78,6 +112,17 @@ def reset!
 
     private
 
+    # Raises unless the value is a non-negative Numeric.
+    #
+    # @param name [Symbol] the setting name (for the error message)
+    # @param value [Object] the value being assigned
+    # @raise [ArgumentError] if value is not a Numeric or is negative
+    def validate_numeric!(name, value)
+      return if value.is_a?(Numeric) && value >= 0
+
+      raise ArgumentError, "#{name} must be a non-negative number, got #{value.inspect}"
+    end
+
     # Sets all configuration ivars to their default values. Called by both
     # initialize and reset! to ensure consistent defaults without the
     # anti-pattern of calling initialize from reset!.
diff --git a/lib/ruby_pi/llm/anthropic.rb b/lib/ruby_pi/llm/anthropic.rb
index c272bdc..b161810 100644
--- a/lib/ruby_pi/llm/anthropic.rb
+++ b/lib/ruby_pi/llm/anthropic.rb
@@ -6,6 +6,8 @@
 # the Anthropic Messages API for both synchronous and streaming completions,
 # including tool_use block support.
 
+require "json"
+
 module RubyPi
   module LLM
     # Anthropic Claude provider implementation. Communicates with the Anthropic
@@ -370,12 +372,19 @@ def perform_streaming_request(body, &block)
         # process complete lines incrementally so that deltas reach the caller
         # as soon as each SSE event is fully received — not after the entire
         # response has been buffered.
-        sse_buffer = +""
+        #
+        # The buffer is BINARY because chunks arrive as ASCII-8BIT and may end
+        # mid-way through a multi-byte UTF-8 character; appending such a chunk
+        # to a UTF-8 buffer that already holds non-ASCII text raises
+        # Encoding::CompatibilityError. Each complete line is re-encoded to
+        # UTF-8 (and scrubbed) before parsing, so deltas reach the caller as
+        # valid UTF-8 strings.
+        sse_buffer = (+"").force_encoding(Encoding::BINARY)
         response_status = nil
 
         # Accumulate error response body separately so ApiError gets the
         # full body even though on_data consumed the chunks.
-        error_body = +""
+        error_body = (+"").force_encoding(Encoding::BINARY)
 
         response = with_transport_errors do
           conn.post("/v1/messages") do |req|
@@ -394,14 +403,17 @@ def perform_streaming_request(body, &block)
             # calls on_data for error responses too, which would otherwise
             # consume the body and leave response.body empty.
             if response_status && response_status >= 400
-              error_body << chunk
+              error_body << chunk.b
               next
             end
 
-            sse_buffer << chunk
-            # Process all complete lines in the buffer
+            sse_buffer << chunk.b
+            # Process all complete lines in the buffer. A complete line holds
+            # complete UTF-8 sequences (multi-byte characters split across
+            # chunks are repaired by the buffering), so re-encode it to UTF-8
+            # here; scrub guards against a server sending invalid bytes.
             while (line_end = sse_buffer.index("\n"))
-              line = sse_buffer.slice!(0, line_end + 1).strip
+              line = sse_buffer.slice!(0, line_end + 1).force_encoding(Encoding::UTF_8).scrub.strip
               next if line.empty?
               next unless line.start_with?("data: ")
 
@@ -436,12 +448,12 @@ def perform_streaming_request(body, &block)
         unless response.success?
           # Reconstruct the response body from what on_data accumulated
           error_response = response
-          error_body_str = error_body.empty? ? response.body : error_body
+          error_body_str = error_body.empty? ? response.body : error_body.force_encoding(Encoding::UTF_8).scrub
           handle_error_response(error_response, override_body: error_body_str)
         end
 
         # Process any remaining data in the buffer after the connection closes
-        sse_buffer.each_line do |line|
+        sse_buffer.force_encoding(Encoding::UTF_8).scrub.each_line do |line|
           line = line.strip
           next if line.empty?
           next unless line.start_with?("data: ")
diff --git a/lib/ruby_pi/llm/base_provider.rb b/lib/ruby_pi/llm/base_provider.rb
index 385ba2d..2afd179 100644
--- a/lib/ruby_pi/llm/base_provider.rb
+++ b/lib/ruby_pi/llm/base_provider.rb
@@ -94,7 +94,7 @@ def complete(messages:, tools: [], stream: false, &block)
           # 3 retries + 1 initial = 4 total attempts. Previously used `< @max_retries`
           # which was off-by-one (only 2 retries with max_retries: 3).
           if attempt <= @max_retries
-            delay = calculate_backoff(attempt)
+            delay = retry_delay_for(e, attempt)
             log_retry(attempt, delay, e)
             sleep(delay)
             retry
@@ -136,6 +136,29 @@ def perform_complete(messages:, tools:, stream:, &block)
         raise RubyPi::AbstractMethodError, :perform_complete
       end
 
+      # Maximum delay (seconds) honored from a server-provided Retry-After
+      # header. Caps pathological or misconfigured server values so a single
+      # 429 cannot stall the client indefinitely.
+      RETRY_AFTER_CEILING = 60.0
+
+      # Picks the delay before the next retry. A server-provided Retry-After
+      # on a 429 takes precedence over the local exponential backoff: the
+      # server knows its own cooldown window, and retrying earlier just burns
+      # the retry budget against guaranteed 429s. Retry-After parsed from an
+      # HTTP-date (rather than delta-seconds) arrives as 0.0 and falls through
+      # to the computed backoff.
+      #
+      # @param error [Exception] the error that triggered the retry
+      # @param attempt [Integer] the current attempt number (1-based)
+      # @return [Float] delay in seconds
+      def retry_delay_for(error, attempt)
+        if error.is_a?(RubyPi::RateLimitError) && error.retry_after&.positive?
+          [error.retry_after, RETRY_AFTER_CEILING].min
+        else
+          calculate_backoff(attempt)
+        end
+      end
+
       # Calculates the backoff delay for a given retry attempt using
       # exponential backoff with jitter.
       #
diff --git a/lib/ruby_pi/llm/fallback.rb b/lib/ruby_pi/llm/fallback.rb
index 0ab5638..7f36238 100644
--- a/lib/ruby_pi/llm/fallback.rb
+++ b/lib/ruby_pi/llm/fallback.rb
@@ -149,6 +149,19 @@ def perform_complete_without_streaming(messages:, tools:, stream:, &block)
       # @yield [event] the consumer's streaming block
       # @return [RubyPi::LLM::Response]
       def perform_complete_with_streaming_fallback(messages:, tools:, &block)
+        # Count the characters of text already delivered to the consumer from
+        # the primary. If the primary fails mid-stream AFTER yielding text,
+        # the fallback streams a complete fresh response — a consumer that
+        # merely appends deltas would render the primary's partial text
+        # followed by the full fallback text. The :fallback_start payload
+        # carries partial_output/partial_chars so consumers can deterministically
+        # truncate what they already rendered.
+        partial_chars = 0
+        counting_block = proc do |event|
+          partial_chars += event.data.to_s.length if event.text_delta?
+          block.call(event)
+        end
+
         begin
           # Stream primary events directly to the consumer for real-time UX.
           # No buffering — tokens appear immediately as they arrive.
@@ -156,7 +169,7 @@ def perform_complete_with_streaming_fallback(messages:, tools:, &block)
             messages: messages,
             tools: tools,
             stream: true,
-            &block
+            &counting_block
           )
 
           response
@@ -167,12 +180,17 @@ def perform_complete_with_streaming_fallback(messages:, tools:, &block)
           log_fallback(e)
 
           # Signal the consumer that the primary failed mid-stream and a
-          # fallback provider is taking over. Consumers should use this event
-          # to clear any partial output from the failed primary.
+          # fallback provider is taking over. Consumers MUST use this event
+          # to clear any partial output from the failed primary:
+          #   partial_output — true when the primary yielded any text deltas
+          #   partial_chars  — how many characters were yielded (truncate by
+          #                    this amount if appending to a shared buffer)
           block.call(StreamEvent.new(type: :fallback_start, data: {
             failed_provider: @primary.provider_name,
             error: e.message,
-            fallback_provider: @fallback.provider_name
+            fallback_provider: @fallback.provider_name,
+            partial_output: partial_chars.positive?,
+            partial_chars: partial_chars
           }))
 
           # Stream directly from the fallback to the consumer's block.
diff --git a/lib/ruby_pi/llm/gemini.rb b/lib/ruby_pi/llm/gemini.rb
index a5f6050..517ba3e 100644
--- a/lib/ruby_pi/llm/gemini.rb
+++ b/lib/ruby_pi/llm/gemini.rb
@@ -6,6 +6,7 @@
 # the Gemini REST API for both synchronous and streaming completions, including
 # tool/function calling support.
 
+require "json"
 require "securerandom"
 
 module RubyPi
@@ -305,9 +306,14 @@ def perform_streaming_request(body, &block)
         # which may split SSE events mid-line. We accumulate a line buffer and
         # process complete lines incrementally so that deltas reach the caller
         # as soon as each SSE event is fully received.
-        sse_buffer = +""
+        # BINARY buffer: chunks arrive as ASCII-8BIT and may end mid-way
+        # through a multi-byte UTF-8 character; appending such a chunk to a
+        # UTF-8 buffer holding non-ASCII text raises
+        # Encoding::CompatibilityError. Complete lines are re-encoded to
+        # UTF-8 (and scrubbed) before parsing.
+        sse_buffer = (+"").force_encoding(Encoding::BINARY)
         response_status = nil
-        error_body = +""
+        error_body = (+"").force_encoding(Encoding::BINARY)
 
         response = with_transport_errors do
           conn.post(url) do |req|
@@ -324,14 +330,17 @@ def perform_streaming_request(body, &block)
             # If the HTTP status indicates an error, accumulate the body for
             # the error handler instead of parsing it as SSE events.
             if response_status && response_status >= 400
-              error_body << chunk
+              error_body << chunk.b
               next
             end
 
-            sse_buffer << chunk
-            # Process all complete lines in the buffer
+            sse_buffer << chunk.b
+            # Process all complete lines in the buffer. A complete line holds
+            # complete UTF-8 sequences (multi-byte characters split across
+            # chunks are repaired by the buffering), so re-encode it to UTF-8
+            # here; scrub guards against a server sending invalid bytes.
             while (line_end = sse_buffer.index("\n"))
-              line = sse_buffer.slice!(0, line_end + 1).strip
+              line = sse_buffer.slice!(0, line_end + 1).force_encoding(Encoding::UTF_8).scrub.strip
               next if line.empty?
               next unless line.start_with?("data: ")
 
@@ -400,7 +409,7 @@ def perform_streaming_request(body, &block)
         # callback. Pass the accumulated error_body so ApiError carries the
         # full server message instead of an empty body.
         unless response.success?
-          error_body_str = error_body.empty? ? response.body : error_body
+          error_body_str = error_body.empty? ? response.body : error_body.force_encoding(Encoding::UTF_8).scrub
           handle_error_response(response, override_body: error_body_str)
         end
 
diff --git a/lib/ruby_pi/llm/openai.rb b/lib/ruby_pi/llm/openai.rb
index d90ccaf..d0e546c 100644
--- a/lib/ruby_pi/llm/openai.rb
+++ b/lib/ruby_pi/llm/openai.rb
@@ -6,6 +6,8 @@
 # OpenAI Chat Completions API for both synchronous and streaming completions,
 # including function/tool calling support.
 
+require "json"
+
 module RubyPi
   module LLM
     # OpenAI provider implementation. Communicates with the OpenAI Chat
@@ -318,9 +320,14 @@ def perform_streaming_request(body, &block)
         # which may split SSE events mid-line. We accumulate a line buffer and
         # process complete lines incrementally so that deltas reach the caller
         # as soon as each SSE event is fully received.
-        sse_buffer = +""
+        # BINARY buffer: chunks arrive as ASCII-8BIT and may end mid-way
+        # through a multi-byte UTF-8 character; appending such a chunk to a
+        # UTF-8 buffer holding non-ASCII text raises
+        # Encoding::CompatibilityError. Complete lines are re-encoded to
+        # UTF-8 (and scrubbed) before parsing.
+        sse_buffer = (+"").force_encoding(Encoding::BINARY)
         response_status = nil
-        error_body = +""
+        error_body = (+"").force_encoding(Encoding::BINARY)
 
         response = with_transport_errors do
           conn.post("/v1/chat/completions") do |req|
@@ -337,14 +344,17 @@ def perform_streaming_request(body, &block)
             # If the HTTP status indicates an error, accumulate the body for
             # the error handler instead of parsing it as SSE events.
             if response_status && response_status >= 400
-              error_body << chunk
+              error_body << chunk.b
               next
             end
 
-            sse_buffer << chunk
-            # Process all complete lines in the buffer
+            sse_buffer << chunk.b
+            # Process all complete lines in the buffer. A complete line holds
+            # complete UTF-8 sequences (multi-byte characters split across
+            # chunks are repaired by the buffering), so re-encode it to UTF-8
+            # here; scrub guards against a server sending invalid bytes.
             while (line_end = sse_buffer.index("\n"))
-              line = sse_buffer.slice!(0, line_end + 1).strip
+              line = sse_buffer.slice!(0, line_end + 1).force_encoding(Encoding::UTF_8).scrub.strip
               next if line.empty?
               next unless line.start_with?("data: ")
 
@@ -419,7 +429,7 @@ def perform_streaming_request(body, &block)
         # callback. Pass the accumulated error_body so ApiError carries the
         # full server message instead of an empty body.
         unless response.success?
-          error_body_str = error_body.empty? ? response.body : error_body
+          error_body_str = error_body.empty? ? response.body : error_body.force_encoding(Encoding::UTF_8).scrub
           handle_error_response(response, override_body: error_body_str)
         end
 
diff --git a/lib/ruby_pi/llm/tool_call.rb b/lib/ruby_pi/llm/tool_call.rb
index 93ea49c..ff46b6d 100644
--- a/lib/ruby_pi/llm/tool_call.rb
+++ b/lib/ruby_pi/llm/tool_call.rb
@@ -6,6 +6,8 @@
 # decides to invoke a tool, it returns one or more ToolCall objects describing
 # which function to call and with what arguments.
 
+require "json"
+
 module RubyPi
   module LLM
     # A tool call extracted from an LLM response. Contains the unique call ID,
diff --git a/lib/ruby_pi/tools/definition.rb b/lib/ruby_pi/tools/definition.rb
index 9ea0940..4e14fe0 100644
--- a/lib/ruby_pi/tools/definition.rb
+++ b/lib/ruby_pi/tools/definition.rb
@@ -37,16 +37,32 @@ class Definition
       # @return [Hash] A JSON Schema hash describing the tool's parameters.
       attr_reader :parameters
 
+      # Tool names must satisfy the strictest provider constraint (Anthropic's
+      # ^[a-zA-Z0-9_-]{1,64}$). Without this guard, a name like "send.email"
+      # registers fine and then 400s on every API request with an opaque
+      # server-side validation error that doesn't point back to the tool.
+      NAME_FORMAT = /\A[a-zA-Z0-9_-]{1,64}\z/
+
       # Creates a new tool definition.
       #
-      # @param name [String, Symbol] Unique identifier for the tool.
+      # @param name [String, Symbol] Unique identifier for the tool. Must match
+      #   NAME_FORMAT (letters, digits, underscore, hyphen; max 64 chars).
       # @param description [String] What the tool does (shown to the LLM).
       # @param category [Symbol, nil] Optional grouping category.
       # @param parameters [Hash] JSON Schema hash for the tool's input parameters.
-      # @yield [Hash] Block that implements the tool logic. Receives a hash of arguments.
-      # @raise [ArgumentError] If name or description is missing, or no block given.
+      # @yield [Hash] Block that implements the tool logic. Receives a hash of
+      #   symbol-keyed arguments, or keyword arguments if the block declares
+      #   keyword parameters (see #call).
+      # @raise [ArgumentError] If name is missing or violates NAME_FORMAT,
+      #   description is missing, or no block given.
       def initialize(name:, description:, category: nil, parameters: {}, &block)
         raise ArgumentError, "Tool name is required" if name.nil? || name.to_s.strip.empty?
+        unless name.to_s.match?(NAME_FORMAT)
+          raise ArgumentError,
+                "Tool name #{name.to_s.inspect} is invalid — provider APIs require " \
+                "names matching #{NAME_FORMAT.inspect} (letters, digits, underscore, " \
+                "hyphen; 1-64 characters)"
+        end
         raise ArgumentError, "Tool description is required" if description.nil? || description.strip.empty?
         raise ArgumentError, "Tool implementation block is required" unless block_given?
 
@@ -55,14 +71,33 @@ def initialize(name:, description:, category: nil, parameters: {}, &block)
         @category = category&.to_sym
         @parameters = parameters
         @implementation = block
+        # On Ruby 3.x a positional Hash is never auto-splatted to keywords, so
+        # a block written `{ |content:, platform:| ... }` — the natural style
+        # given named schema parameters — would fail every call with
+        # "missing keyword". Detect keyword parameters once here and splat in
+        # #call accordingly.
+        @expects_keywords = block.parameters.any? { |type, _| %i[key keyreq keyrest].include?(type) }
       end
 
       # Invokes the tool with the given arguments.
       #
+      # Blocks may be written either style:
+      #   { |args| args[:content] }            # single positional Hash
+      #   { |content:, platform: "x"| ... }    # keyword parameters
+      #
+      # When the block declares keyword parameters, the arguments hash is
+      # splatted to keywords. Note that a keyword-style block without **rest
+      # raises ArgumentError on unexpected keys — strict by design, since the
+      # keys come from the LLM.
+      #
       # @param args [Hash] The arguments to pass to the tool implementation.
       # @return [Object] Whatever the implementation block returns.
       def call(args = {})
-        @implementation.call(args)
+        if @expects_keywords
+          @implementation.call(**args)
+        else
+          @implementation.call(args)
+        end
       end
 
       # Converts this tool definition to Google Gemini function declaration format.
diff --git a/lib/ruby_pi/tools/executor.rb b/lib/ruby_pi/tools/executor.rb
index 1dc9c67..7080c1a 100644
--- a/lib/ruby_pi/tools/executor.rb
+++ b/lib/ruby_pi/tools/executor.rb
@@ -115,7 +115,12 @@ def execute_parallel(calls)
         end
 
         # Collect results, respecting the configured timeout for each future.
-        futures.map do |future|
+        # Zip each future with its originating call so failure Results carry
+        # the real tool name — with several tools timing out in parallel,
+        # "unknown" Results are indistinguishable in logs and extension events.
+        calls.zip(futures).map do |call, future|
+          tool_name = (call[:name] || call["name"]).to_s
+
           # Issue #10: Wait for the future to complete, then check its state
           # explicitly. Future#value returns nil both on timeout AND when the
           # block legitimately returned nil, so we cannot use || to distinguish.
@@ -128,13 +133,16 @@ def execute_parallel(calls)
             else
               # Future was rejected (raised an exception within the block).
               # This shouldn't normally happen since execute_single rescues
-              # internally, but handle it defensively.
+              # internally, but handle it defensively. The actual run time is
+              # unknown here (the future failed at some point before the wait
+              # elapsed), so report 0.0 rather than a misleading full-timeout
+              # duration for what may have been an instant failure.
               error = future.reason
               Result.new(
-                name: "unknown",
+                name: tool_name,
                 success: false,
                 error: "#{error.class}: #{error.message}",
-                duration_ms: @timeout * 1000.0
+                duration_ms: 0.0
               )
             end
           else
@@ -147,9 +155,9 @@ def execute_parallel(calls)
             future.cancel if future.respond_to?(:cancel)
 
             Result.new(
-              name: "unknown",
+              name: tool_name,
               success: false,
-              error: "Tool execution timed out after #{@timeout}s",
+              error: "Tool '#{tool_name}' timed out after #{@timeout}s",
               duration_ms: @timeout * 1000.0
             )
           end
diff --git a/lib/ruby_pi/tools/schema.rb b/lib/ruby_pi/tools/schema.rb
index cdb05bf..ec8d4dd 100644
--- a/lib/ruby_pi/tools/schema.rb
+++ b/lib/ruby_pi/tools/schema.rb
@@ -13,6 +13,16 @@
 # flag consumed by `.object` to populate the top-level "required" array.
 # It is stripped from the property's own schema hash before inclusion.
 #
+# IMPORTANT: Schemas are LLM-facing hints, NOT runtime input validation.
+# Nothing in the execution pipeline validates the model's arguments against
+# the schema before invoking the tool block: `required`, `enum`, `minimum`,
+# and type declarations constrain what the model is *asked* to produce, but a
+# misbehaving model can still omit required fields, send extra keys, or pass
+# a String where an Integer is declared — no coercion is performed. Tool
+# blocks should treat their arguments as untrusted input and validate or
+# coerce what they depend on. (This is deliberate, per the anti-framework
+# philosophy: validation policy belongs to the tool, not the harness.)
+#
 # Usage:
 #   schema = RubyPi::Schema.object(
 #     name: RubyPi::Schema.string("User's name", required: true),
diff --git a/lib/ruby_pi/version.rb b/lib/ruby_pi/version.rb
index 7deaa5f..d54577c 100644
--- a/lib/ruby_pi/version.rb
+++ b/lib/ruby_pi/version.rb
@@ -7,5 +7,5 @@
 
 module RubyPi
   # The current version of the RubyPi gem, following Semantic Versioning.
-  VERSION = "0.1.7"
+  VERSION = "0.1.8"
 end
diff --git a/ruby-pi.gemspec b/ruby-pi.gemspec
index 9fed46d..831cc55 100644
--- a/ruby-pi.gemspec
+++ b/ruby-pi.gemspec
@@ -51,6 +51,10 @@ Gem::Specification.new do |spec|
   # Issue #20: Removed arbitrary < 3.4 upper bound — no known incompatibility.
   spec.add_dependency "faraday-net_http", "~> 3.0"
   spec.add_dependency "concurrent-ruby",  "~> 1.2"
+  # JSON.parse/JSON.generate are used directly throughout the providers and
+  # agent loop. Declare it explicitly rather than relying on Faraday's
+  # transitive dependency (which could drop it) or the default-gem status.
+  spec.add_dependency "json",             ">= 2.0"
 
   # Development dependencies
   spec.add_development_dependency "rspec",   "~> 3.12"
diff --git a/spec/ruby_pi/fixes/review_round6_spec.rb b/spec/ruby_pi/fixes/review_round6_spec.rb
new file mode 100644
index 0000000..d587eef
--- /dev/null
+++ b/spec/ruby_pi/fixes/review_round6_spec.rb
@@ -0,0 +1,288 @@
+# frozen_string_literal: true
+
+# spec/ruby_pi/fixes/review_round6_spec.rb
+#
+# Regression specs for adversarial review round 6:
+#
+# 1. BaseProvider honors a server-provided Retry-After on 429 instead of
+#    always using local exponential backoff.
+# 2. Parallel executor timeout/rejection Results carry the real tool name
+#    (previously hardcoded "unknown").
+# 3. Tool blocks declared with keyword parameters work (previously every
+#    call raised "missing keyword").
+# 4. Tool names are validated against provider constraints at definition time.
+# 5. The :compaction event emitter is wired by the Loop, so subscribers
+#    actually receive the documented event.
+# 6. The :fallback_start payload reports how much primary text was already
+#    yielded (partial_output / partial_chars).
+# 7. Configuration rejects negative numeric settings at assignment time.
+# 8. Streaming text deltas are valid UTF-8 even though chunks arrive as
+#    binary, and the SSE buffer survives multi-byte characters.
+# 9. Every lib file referencing JSON requires "json" itself (composability).
+
+require "spec_helper"
+
+RSpec.describe "Review round 6 fixes" do
+  describe "1. Retry-After is honored on 429" do
+    let(:provider) { RubyPi::LLM::Anthropic.new(model: "claude-test") }
+
+    it "uses the server's Retry-After as the retry delay" do
+      error = RubyPi::RateLimitError.new("rate limited", retry_after: 7.5)
+      expect(provider.send(:retry_delay_for, error, 1)).to eq(7.5)
+    end
+
+    it "caps a pathological Retry-After at the ceiling" do
+      error = RubyPi::RateLimitError.new("rate limited", retry_after: 86_400.0)
+      expect(provider.send(:retry_delay_for, error, 1))
+        .to eq(RubyPi::LLM::BaseProvider::RETRY_AFTER_CEILING)
+    end
+
+    it "falls back to exponential backoff when Retry-After is absent" do
+      error = RubyPi::RateLimitError.new("rate limited", retry_after: nil)
+      delay = provider.send(:retry_delay_for, error, 1)
+      # backoff = base * 2^0 + jitter, capped at retry_max_delay
+      expect(delay).to be_between(0, RubyPi.configuration.retry_max_delay)
+    end
+
+    it "falls back to backoff when Retry-After was an HTTP-date (parses to 0.0)" do
+      error = RubyPi::RateLimitError.new("rate limited", retry_after: 0.0)
+      expect(provider).to receive(:calculate_backoff).with(2).and_return(0.01)
+      expect(provider.send(:retry_delay_for, error, 2)).to eq(0.01)
+    end
+
+    it "uses backoff (not Retry-After) for non-rate-limit errors" do
+      error = RubyPi::ApiError.new("boom", status_code: 500)
+      expect(provider).to receive(:calculate_backoff).with(1).and_return(0.01)
+      expect(provider.send(:retry_delay_for, error, 1)).to eq(0.01)
+    end
+
+    it "sleeps for the Retry-After duration when retrying a real 429" do
+      stub_request(:post, "https://api.anthropic.com/v1/messages")
+        .to_return(
+          { status: 429, headers: { "Retry-After" => "0.02" }, body: "rate limited" },
+          {
+            status: 200,
+            headers: { "Content-Type" => "application/json" },
+            body: JSON.generate(
+              id: "msg_1", type: "message", role: "assistant",
+              content: [{ type: "text", text: "ok" }],
+              stop_reason: "end_turn",
+              usage: { input_tokens: 1, output_tokens: 1 }
+            )
+          }
+        )
+
+      expect(provider).to receive(:sleep).with(0.02)
+      response = provider.complete(messages: [{ role: :user, content: "hi" }])
+      expect(response.content).to eq("ok")
+    end
+  end
+
+  describe "2. Parallel executor failure Results carry the tool name" do
+    let(:registry) { RubyPi::Tools::Registry.new }
+
+    it "reports the real tool name when a parallel tool times out" do
+      registry.register(
+        RubyPi::Tool.define(name: "slow_tool", description: "sleeps") { |_args| sleep(1) }
+      )
+      registry.register(
+        RubyPi::Tool.define(name: "fast_tool", description: "returns") { |_args| "done" }
+      )
+
+      executor = RubyPi::Tools::Executor.new(registry, mode: :parallel, timeout: 0.05)
+      results = executor.execute([
+                                   { name: "slow_tool", arguments: {} },
+                                   { name: "fast_tool", arguments: {} }
+                                 ])
+
+      timed_out = results[0]
+      expect(timed_out.success?).to be(false)
+      expect(timed_out.name).to eq("slow_tool")
+      expect(timed_out.error).to include("slow_tool")
+      expect(timed_out.error).to include("timed out")
+
+      expect(results[1].name).to eq("fast_tool")
+      expect(results[1].success?).to be(true)
+    end
+  end
+
+  describe "3. Keyword-parameter tool blocks" do
+    it "splats arguments to a block with required keywords" do
+      tool = RubyPi::Tool.define(name: "kw_tool", description: "kw") do |content:|
+        content.upcase
+      end
+      expect(tool.call(content: "hi")).to eq("HI")
+    end
+
+    it "splats arguments to a block with optional keywords and **rest" do
+      tool = RubyPi::Tool.define(name: "kw_rest", description: "kw") do |content:, platform: "x", **rest|
+        [content, platform, rest]
+      end
+      expect(tool.call(content: "a", extra: 1)).to eq(["a", "x", { extra: 1 }])
+    end
+
+    it "still passes a single positional Hash to positional-style blocks" do
+      tool = RubyPi::Tool.define(name: "pos_tool", description: "pos") do |args|
+        args[:content]
+      end
+      expect(tool.call(content: "hello")).to eq("hello")
+    end
+  end
+
+  describe "4. Tool name validation" do
+    it "rejects names with characters providers refuse" do
+      ["send.email", "create post!", "a b", "tool/x"].each do |bad|
+        expect do
+          RubyPi::Tool.define(name: bad, description: "d") { |a| a }
+        end.to raise_error(ArgumentError, /invalid/)
+      end
+    end
+
+    it "rejects names longer than 64 characters" do
+      expect do
+        RubyPi::Tool.define(name: "a" * 65, description: "d") { |a| a }
+      end.to raise_error(ArgumentError, /invalid/)
+    end
+
+    it "accepts provider-safe names" do
+      %w[send_email tool-1 Get_Analytics2].each do |good|
+        expect do
+          RubyPi::Tool.define(name: good, description: "d") { |a| a }
+        end.not_to raise_error
+      end
+    end
+  end
+
+  describe "5. :compaction event emitter wiring" do
+    let(:model) { double("model") }
+    let(:emitter) { Class.new { include RubyPi::Agent::EventEmitter }.new }
+    let(:state) do
+      RubyPi::Agent::State.new(system_prompt: "test", model: model)
+    end
+
+    it "wires the loop's emitter into the compaction strategy" do
+      compaction = RubyPi::Context::Compaction.new(summary_model: model)
+      expect(compaction.emitter).to be_nil
+
+      RubyPi::Agent::Loop.new(state: state, emitter: emitter, compaction: compaction)
+      expect(compaction.emitter).to be(emitter)
+    end
+
+    it "does not overwrite an explicitly assigned emitter" do
+      compaction = RubyPi::Context::Compaction.new(summary_model: model)
+      custom = double("custom emitter")
+      compaction.emitter = custom
+
+      RubyPi::Agent::Loop.new(state: state, emitter: emitter, compaction: compaction)
+      expect(compaction.emitter).to be(custom)
+    end
+  end
+
+  describe "6. :fallback_start reports partial output" do
+    let(:primary) { double("primary", provider_name: :primary, model_name: "p") }
+    let(:backup) { double("backup", provider_name: :backup, model_name: "b") }
+    let(:provider) { RubyPi::LLM::Fallback.new(primary: primary, fallback: backup) }
+    let(:fallback_response) do
+      RubyPi::LLM::Response.new(content: "ok", tool_calls: [], usage: {}, finish_reason: "stop")
+    end
+
+    before do
+      allow(backup).to receive(:complete) do |**_args, &block|
+        block.call(RubyPi::LLM::StreamEvent.new(type: :text_delta, data: "ok"))
+        fallback_response
+      end
+    end
+
+    it "counts the characters already yielded by the failed primary" do
+      allow(primary).to receive(:complete) do |**_args, &block|
+        block.call(RubyPi::LLM::StreamEvent.new(type: :text_delta, data: "The answer "))
+        block.call(RubyPi::LLM::StreamEvent.new(type: :text_delta, data: "is"))
+        raise RubyPi::ApiError.new("primary died mid-stream", status_code: 500)
+      end
+
+      events = []
+      provider.complete(messages: [], tools: [], stream: true) { |e| events << e }
+
+      start = events.find { |e| e.type == :fallback_start }
+      expect(start.data[:partial_output]).to be(true)
+      expect(start.data[:partial_chars]).to eq("The answer is".length)
+    end
+
+    it "reports no partial output when the primary failed before yielding text" do
+      allow(primary).to receive(:complete).and_raise(
+        RubyPi::ApiError.new("immediate failure", status_code: 500)
+      )
+
+      events = []
+      provider.complete(messages: [], tools: [], stream: true) { |e| events << e }
+
+      start = events.find { |e| e.type == :fallback_start }
+      expect(start.data[:partial_output]).to be(false)
+      expect(start.data[:partial_chars]).to eq(0)
+    end
+  end
+
+  describe "7. Configuration validates numeric settings" do
+    let(:config) { RubyPi::Configuration.new }
+
+    it "rejects negative values" do
+      expect { config.max_retries = -1 }.to raise_error(ArgumentError, /max_retries/)
+      expect { config.retry_base_delay = -0.5 }.to raise_error(ArgumentError, /retry_base_delay/)
+      expect { config.retry_max_delay = -1 }.to raise_error(ArgumentError, /retry_max_delay/)
+      expect { config.request_timeout = -10 }.to raise_error(ArgumentError, /request_timeout/)
+      expect { config.open_timeout = -1 }.to raise_error(ArgumentError, /open_timeout/)
+    end
+
+    it "rejects non-numeric values" do
+      expect { config.max_retries = "three" }.to raise_error(ArgumentError, /max_retries/)
+    end
+
+    it "accepts zero and positive values" do
+      expect { config.max_retries = 0 }.not_to raise_error
+      expect { config.retry_base_delay = 0.01 }.not_to raise_error
+      expect(config.retry_base_delay).to eq(0.01)
+    end
+  end
+
+  describe "8. Streaming UTF-8 safety" do
+    let(:provider) { RubyPi::LLM::Anthropic.new(model: "claude-test") }
+    let(:emoji_text) { "Hello 👋 wörld" }
+    let(:sse_body) do
+      [
+        %(data: {"type":"message_start","message":{"usage":{"input_tokens":5}}}\n\n),
+        %(data: {"type":"content_block_start","index":0,"content_block":{"type":"text"}}\n\n),
+        %(data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":#{JSON.generate(emoji_text)}}}\n\n),
+        %(data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":3}}\n\n),
+        %(data: {"type":"message_stop"}\n\n)
+      ].join.b # deliver as binary, like a real socket
+    end
+
+    it "yields multi-byte text deltas as valid UTF-8" do
+      stub_request(:post, "https://api.anthropic.com/v1/messages")
+        .to_return(status: 200, headers: { "Content-Type" => "text/event-stream" }, body: sse_body)
+
+      deltas = []
+      response = provider.complete(messages: [{ role: :user, content: "hi" }], stream: true) do |event|
+        deltas << event.data if event.text_delta?
+      end
+
+      text = deltas.join
+      expect(text).to eq(emoji_text)
+      expect(text.encoding).to eq(Encoding::UTF_8)
+      expect(text.valid_encoding?).to be(true)
+      expect(response.content).to eq(emoji_text)
+    end
+  end
+
+  describe "9. JSON is required where it is used" do
+    it "every lib file that references JSON requires json itself" do
+      lib_root = File.expand_path("../../../lib", __dir__)
+      offenders = Dir[File.join(lib_root, "**", "*.rb")].select do |file|
+        src = File.read(file)
+        src.match?(/\bJSON\.(parse|generate)/) && !src.match?(/^require "json"$/)
+      end
+      expect(offenders).to be_empty,
+                           "files use JSON without requiring it: #{offenders.inspect}"
+    end
+  end
+end