Recover from transient stream failures: provider auto-retry + error/retry UI #135
Open
oratis wants to merge 2 commits into
The Anthropic SDK throws "request ended without sending any chunks" when a streaming response opens (HTTP 200) but the SSE body closes before any event arrives. This is commonly an HTTP proxy (Clash, one-api relay) tearing down an idle CONNECT tunnel during a long time-to-first-byte on a large request. The SDK's own retries don't cover it: the error is thrown while iterating the stream, after the request already succeeded, so a momentary blip reached the user as a hard error and aborted the turn (idle reflection silently gave up too).

Add a shared withStreamRetry() helper and wire it into all three providers (anthropic, openai, gemini). It retries this class of error plus connection resets / premature closes, but only while no delta has been forwarded yet, so already-streamed output is never duplicated, and a user abort is never retried.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
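The retry rule in this commit message (retry only while no delta has been forwarded, never after a user abort) can be sketched as follows. This is a minimal illustration under assumptions, not the code in stream-retry.ts: only the names withStreamRetry and isRetryableStreamError come from the PR, while the option names, the error patterns, the attempt count, and the backoff are hypothetical.

```typescript
// Hypothetical sketch; option names, patterns, and backoff values are assumptions.
interface StreamRetryOptions {
  maxAttempts?: number;                    // total attempts, including the first
  baseDelayMs?: number;                    // base delay for exponential backoff
  signal?: { readonly aborted: boolean };  // an AbortSignal fits this shape
}

// Assumed message/code fragments for "empty stream", resets, and premature closes.
const RETRYABLE_PATTERNS: RegExp[] = [
  /request ended without sending any chunks/i,
  /ECONNRESET/,
  /socket hang up/i,
  /premature close/i,
];

function isRetryableStreamError(err: unknown): boolean {
  const e = err as { name?: string; message?: string; code?: string } | null;
  if (!e) return false;
  if (e.name === "AbortError") return false; // a user abort is never retried
  const text = `${e.code ?? ""} ${e.message ?? ""}`;
  return RETRYABLE_PATTERNS.some((re) => re.test(text));
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// `open` starts a fresh provider stream on every attempt.
async function* withStreamRetry<T>(
  open: () => AsyncIterable<T>,
  opts: StreamRetryOptions = {},
): AsyncGenerator<T, void, undefined> {
  const { maxAttempts = 3, baseDelayMs = 250, signal } = opts;
  for (let attempt = 1; ; attempt++) {
    let forwarded = false; // becomes true once any delta reaches the caller
    try {
      for await (const delta of open()) {
        forwarded = true;
        yield delta;
      }
      return; // stream completed normally
    } catch (err) {
      const canRetry =
        !forwarded &&
        !signal?.aborted &&
        attempt < maxAttempts &&
        isRetryableStreamError(err);
      if (!canRetry) throw err; // surface the error unchanged
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Tracking `forwarded` per attempt is what keeps already-streamed output from being duplicated: once one delta has been yielded, any later failure is rethrown instead of restarting the stream.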
Replace the bare "[error] …" line with an .err-block that shows the error detail and a ↻ retry button which re-runs the same message + files. Covers the cases provider-level auto-retry can't: a non-retryable error, or auto-retry exhausted.

send() is split into send() (input/bubble/attachments) and runChat() (fetch/stream) so retry never re-appends the user bubble or re-reads the already-cleared attachment tray.

Also fixes two issues this surfaced:

- The same error rendered twice: runAgent emits an error event and rethrows, and the server's turn catch then sent a second identical event. Dedupe both server-side (errorSent guard) and client-side (one block per turn).
- Retry could duplicate the user message in the session file: it was persisted before the provider call, so an immediate stream failure orphaned it on disk and the retry wrote it again. Persist lazily: only after the first provider response commits the turn (file order stays user→assistant).

Snapshot (lisa-html-snapshot.test.ts) recomputed for the new markup/CSS.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
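The server-side half of the duplicate-error fix can be illustrated with a small guard. This is a sketch under assumptions: only the errorSent flag and the "runAgent emits an error event and rethrows" behaviour come from the commit message, while the event shape, the runTurn wrapper, and the runAgent signature are hypothetical.

```typescript
// Hypothetical event shape and function signatures, for illustration only.
type TurnEvent =
  | { type: "delta"; text: string }
  | { type: "error"; message: string };

type Emit = (event: TurnEvent) => void;

async function runTurn(
  runAgent: (emit: Emit) => Promise<void>, // assumed: may emit an error, then rethrow
  send: Emit,                              // assumed: writes one event to the client
): Promise<void> {
  let errorSent = false;

  // Every event goes through this wrapper, so at most one error leaves the turn.
  const emit: Emit = (event) => {
    if (event.type === "error") {
      if (errorSent) return; // drop the duplicate
      errorSent = true;
    }
    send(event);
  };

  try {
    await runAgent(emit);
  } catch (err) {
    // Still reports errors that runAgent threw without emitting an event first.
    emit({ type: "error", message: err instanceof Error ? err.message : String(err) });
  }
}
```

Putting the guard in the emit wrapper rather than in the catch block means both paths are covered by one flag: an error that was already emitted is not repeated, and an error that was only thrown is still reported once.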
What & why
Diagnosed a local Lisa failure: a chat turn died with `[error] request ended without sending any chunks` (twice), and the same error hit `[idle]` reflection in `backend.log`. The Anthropic SDK throws this when a streaming response opens (HTTP 200) but the SSE body closes before any event arrives. Here, the cause was the Clash/mihomo proxy (`127.0.0.1:7897`) tearing down an idle CONNECT tunnel during a long time-to-first-byte. The SDK's own retries don't cover it (the error is thrown while iterating a stream that already "succeeded"), so a momentary blip became a hard, user-visible error with no recovery.

Two complementary layers of fix:

1. Provider auto-retry (silent, proxy-agnostic): `fix(providers)`
   - `withStreamRetry()` (`stream-retry.ts`) wired into all three providers (anthropic / openai / gemini).
2. Error detail + retry button (manual fallback): `feat(web)`
   - Replaces the bare `[error] …` line with an `.err-block`: error detail + a ↻ 重试 (retry) button that re-runs the same message + files. Covers what auto-retry can't (non-retryable errors, or retries exhausted).
   - The same error rendered twice: `runAgent` emits an error event and rethrows, and the server catch sent a second identical event (this is why the screenshot showed it twice). Deduped server-side (`errorSent`) and client-side.

Testing
- `withStreamRetry` / `isRetryableStreamError` unit tests, Anthropic provider retry integration tests, and a `runAgent` test asserting zero persistence when the first provider call throws.
- `html-syntax.test.ts` confirms the inline `<script>` still parses; snapshot recomputed.
- `src/cli/pair.test.ts`: a pre-existing `qrcode-terminal` module-not-found in this checkout, with zero overlap with these changes.

🤖 Generated with Claude Code
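The "zero persistence when the first provider call throws" property listed under Testing can be illustrated with a toy version of lazy persistence. This is a sketch, not the PR's `runAgent`: the function name `runAgentTurn`, the `callProvider` and `persist` parameters, and the message shape are all hypothetical stand-ins.

```typescript
// Hypothetical message shape and helpers, for illustration only.
type Message = { role: "user" | "assistant"; content: string };

async function runAgentTurn(
  userText: string,
  callProvider: (userText: string) => Promise<string>, // assumed provider call
  persist: (message: Message) => void,                 // assumed: appends to the session file
): Promise<string> {
  // Nothing is written yet. If the provider call throws, the session file is
  // untouched, so a retry cannot leave a duplicated user message behind.
  const reply = await callProvider(userText);

  // The first provider response commits the turn; order stays user, then assistant.
  persist({ role: "user", content: userText });
  persist({ role: "assistant", content: reply });
  return reply;
}
```

With this ordering, a failed first attempt followed by a successful retry leaves exactly one user message and one assistant message on disk.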