fix: recover & rotate endpoints when an RPC endpoint misbehaves #750
Open
sinzii wants to merge 2 commits into
With a multi-endpoint WsProvider, if the endpoint picked at connect time was unreachable/misbehaving, the client crashed with `Error: [object Object]` and stopped instead of retrying a different endpoint.
- Wrap socket errors in a WsConnectionError (with endpoint context) instead of emitting the raw WebSocket Event.
- connect() no longer fails fast on transient provider errors; it rejects only on init errors or MaxRetryAttemptedError, letting the provider rotate.
- Escalate init failures to an endpoint switch (disconnect(true)) when retry is enabled; JsonRpcV2NotSupportedError still rejects (preserves v2->legacy fallback).
- ChainHead stop: bounded re-follow retries, then switch endpoint.
- Arm the staling watchdog right after init, not only on the next block.
- Provider hardening: immediate retries count toward maxRetryAttempts; attempt counter resets only after a healthy connection; array failover remembers all recently-failed endpoints; new connectTimeoutMs (default 30s) force-closes a stalled handshake so reconnection rotates onward.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
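The error-wrapping step above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the fields of `WsConnectionError`, the `toConnectionError` helper, and the example endpoint URL are assumptions, and a plain object stands in for the WebSocket `Event`. It shows why stringifying a non-Error object yields `Error: [object Object]`, and how a wrapper that carries the endpoint avoids that.

```typescript
// Hypothetical sketch: the real WsConnectionError in this PR may have a different shape.
class WsConnectionError extends Error {
  readonly endpoint: string;
  readonly rawEvent: unknown;

  constructor(endpoint: string, rawEvent: unknown) {
    super(`WebSocket connection error (endpoint: ${endpoint})`);
    this.name = 'WsConnectionError';
    this.endpoint = endpoint;
    this.rawEvent = rawEvent;
  }
}

// Illustrative helper (not from the PR): normalize whatever the socket emitted into an Error.
function toConnectionError(endpoint: string, raw: unknown): WsConnectionError {
  return raw instanceof WsConnectionError ? raw : new WsConnectionError(endpoint, raw);
}

// A plain object stands in for a WebSocket Event here.
const fakeEvent = { type: 'error' };

// Stringifying a non-Error object is what produces the unhelpful message.
const unhelpful = new Error(String(fakeEvent));
const wrapped = toConnectionError('wss://rpc-1.example', fakeEvent);

console.log(String(unhelpful)); // Error: [object Object]
console.log(String(wrapped)); // WsConnectionError: WebSocket connection error (endpoint: wss://rpc-1.example)
```

Keeping the raw event on the wrapper (here as `rawEvent`) leaves the original information available for debugging while consumers get a proper `Error` with a readable message.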
On a stop event the #recovering deferred is rejected when re-follow fails, but it only has an awaiting consumer when an operation is in flight (#ensureFollowed). When recovery failed with no pending request, the rejection had no handler and surfaced as an unhandled rejection. Attach a no-op catch on creation; real awaiters still receive the rejection through their own await.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
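The no-op catch pattern described above might look like the sketch below. The `Deferred` type and the `deferred` and `createRecovering` helpers are illustrative names, not taken from the PR. The point is that `promise.catch(() => {})` returns a new, separate promise: it marks the original rejection as handled without swallowing it for anyone who later awaits the original promise.

```typescript
// Illustrative deferred helper; the PR's own deferred utility may differ.
interface Deferred<T> {
  promise: Promise<T>;
  resolve: (value: T) => void;
  reject: (reason: unknown) => void;
}

function deferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (reason: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Hypothetical stand-in for creating the recovery deferred on a stop event.
function createRecovering(): Deferred<void> {
  const recovering = deferred<void>();
  // No-op handler: if recovery fails while nothing is awaiting, the rejection
  // is still considered handled and does not surface as an unhandled rejection.
  recovering.promise.catch(() => {});
  return recovering;
}
```

A caller that does `await recovering.promise` still sees the rejection, because the no-op handler is attached to a derived promise and leaves the original one untouched.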
Problem

With a multi-endpoint `WsProvider`, if the endpoint picked at connect time was unreachable or misbehaving, the client crashed with `Error: [object Object]` and stopped instead of retrying a different endpoint. Reported via `examples/scripts/reconnection.ts`.

Root cause

Endpoint rotation itself worked. The failure was one layer up: `WsProvider` emitted the raw WebSocket `Event` (not an `Error`) on socket errors, and `BaseSubstrateClient` rejected the pending `connect()` on the first error, killing the client while the provider was about to rotate. A few related gaps also left a client stuck on a connected-but-broken endpoint.

Changes

- Socket errors are wrapped in a `WsConnectionError` (with endpoint context) instead of emitting a raw `Event`.
- `connect()` no longer fails fast on transient provider errors; it rejects only on init errors or `MaxRetryAttemptedError`, letting the provider rotate.
- Init failures escalate to an endpoint switch (`disconnect(true)`) when retry is enabled; `JsonRpcV2NotSupportedError` still rejects so the v2→legacy fallback is preserved.
- ChainHead `stop`: bounded re-follow retries, then switch endpoint instead of dead-ending.
- The staling watchdog is armed right after init, not only on the next block.
- Provider hardening: immediate retries count toward `maxRetryAttempts`; attempt counter resets only after a connection proves healthy (grace window / first message); array failover remembers all recently-failed endpoints; new `connectTimeoutMs` (default 30s) force-closes a stalled handshake so reconnection rotates onward.

Behavior changes to note

- `maxRetryAttempts` now also counts immediate retries (e.g. `disconnect(true)` switches).
- Errors during `connect()` no longer fail fast when retry is enabled; they rotate endpoints (bounded by `maxRetryAttempts` if set).

Testing

- … `WsConnectionError`, rotates to a working endpoint, and returns the correct genesis hash; no crash.

🤖 Generated with Claude Code
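As an illustration of the "array failover remembers all recently-failed endpoints" idea, here is a minimal sketch. The `EndpointPicker` class and its methods are hypothetical and not the provider's real API. It also picks deterministically (first endpoint not marked failed), whereas the actual provider may choose differently, and it simply forgets all failures once every endpoint has failed.

```typescript
// Hypothetical helper, not part of the provider's API.
class EndpointPicker {
  private readonly failed = new Set<string>();

  constructor(private readonly endpoints: string[]) {
    if (endpoints.length === 0) {
      throw new Error('At least one endpoint is required');
    }
  }

  // Remember that this endpoint failed recently.
  markFailed(endpoint: string): void {
    this.failed.add(endpoint);
  }

  // Pick the first endpoint that has not failed recently. Once every endpoint
  // has failed, forget the failures so rotation can start over.
  next(): string {
    let candidates = this.endpoints.filter((e) => !this.failed.has(e));
    if (candidates.length === 0) {
      this.failed.clear();
      candidates = this.endpoints;
    }
    const choice = candidates[0];
    if (choice === undefined) {
      throw new Error('No endpoint available');
    }
    return choice;
  }
}
```

Remembering every recently failed endpoint, rather than only the last one, keeps rotation from bouncing between two broken endpoints while a third, healthy one goes untried.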