Keep WebSocket reconnects alive #2513

Open
hogeheer499-commits wants to merge 1 commit into pingdotgg:main from hogeheer499-commits:fix/continuous-websocket-reconnect
hogeheer499-commits commented May 5, 2026

What changed

  • Change the WebSocket RPC retry schedule from a fixed 7-retry budget to continuous retries.
  • Keep the existing exponential backoff and 64s maximum delay.
  • Update reconnect state/label logic so an unbounded retry loop shows Attempt N instead of Attempt N/max.
  • Add tests that prove reconnect scheduling continues past the old retry budget.
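
The label change in the third bullet can be sketched as follows. This is a minimal illustration, not the PR's actual code: `formatAttemptLabel` and its parameters are hypothetical names, and `null` is assumed to stand for "no retry maximum".

```typescript
// Hypothetical helper illustrating the label rule described above.
// `maxAttempts` is null when the retry schedule is unbounded (assumption).
function formatAttemptLabel(attempt: number, maxAttempts: number | null): string {
  return maxAttempts === null
    ? `Attempt ${attempt}`
    : `Attempt ${attempt}/${maxAttempts}`;
}

console.log(formatAttemptLabel(3, 8)); // "Attempt 3/8" (bounded schedule)
console.log(formatAttemptLabel(12, null)); // "Attempt 12" (unbounded schedule)
```

Passing the maximum as a nullable value keeps both display modes in one function, so the caller does not need a separate code path for the continuous schedule.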

Why this should exist

This is a small reliability fix.

T3 Code is often used through LAN, tailnet, SSH-forwarded, or other remote endpoints. A temporary server restart or private-network drop should not leave an already connected client permanently stopped after a fixed retry budget. With the previous Schedule.recurs(7) policy, the client could exhaust its reconnect attempts and then stay disconnected until a window focus event, a browser online event, or a manual retry triggered a new attempt.

Continuous retry is still bounded by the existing backoff cap, so repeated failures do not spin aggressively.
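
To make the "bounded by the backoff cap" point concrete, here is a rough sketch of a capped exponential delay. The 64s cap comes from this PR; the 1s base delay and the function name `reconnectDelayMs` are assumptions for illustration only, and the real schedule is built with Effect's Schedule API rather than a hand-written function.

```typescript
// Cap taken from the PR description (64s maximum delay).
const WS_RECONNECT_MAX_DELAY_MS = 64_000;
// Assumption: the base delay is not stated in this PR; 1s is used for illustration.
const ASSUMED_BASE_DELAY_MS = 1_000;

// Hypothetical helper: delay before retry number `retryIndex` (0-based).
function reconnectDelayMs(
  retryIndex: number,
  baseMs: number = ASSUMED_BASE_DELAY_MS,
  capMs: number = WS_RECONNECT_MAX_DELAY_MS,
): number {
  // Clamp the exponent so very long outages cannot overflow the multiplication.
  const exponent = Math.min(Math.max(retryIndex, 0), 30);
  return Math.min(baseMs * 2 ** exponent, capMs);
}

// With the assumed 1s base: 1s, 2s, 4s, ... then a steady 64s.
const delays = Array.from({ length: 10 }, (_, i) => reconnectDelayMs(i));
console.log(delays);
```

Under these assumptions the delay stops growing after the seventh retry, so an unbounded schedule adds at most one attempt per 64 seconds for as long as the backend stays down.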

Important scope clarification from later incident debugging: this PR is not presented as the root-cause fix for one local :3777 reproducer where a custom hotpatch proxy healthcheck was restarting the proxy process. That local loop was fixed outside this repository. This PR only addresses the upstream web client's behavior after a real WebSocket/server/network drop has already happened.

Scope

Validation

  • bun run --filter @t3tools/web test src/rpc/wsTransport.test.ts - 25/25 passed
  • bun run --filter @t3tools/web test src/rpc/wsConnectionState.test.ts src/components/WebSocketConnectionSurface.logic.test.ts - 10/10 passed
  • bun run --filter @t3tools/web test - 96 files, 995 tests passed
  • bun run fmt - passed
  • bun run lint - 0 errors, existing warnings only
  • bun run typecheck - 12/12 packages passed, existing Effect language-service messages only

coderabbitai Bot commented May 5, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b5be5087-add9-4c9c-9c7e-7c92f9ec9b0f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


github-actions Bot added the labels vouch:unvouched (PR author is not yet trusted in the VOUCHED list) and size:S (10-29 changed lines, additions + deletions) on May 5, 2026
macroscopeapp Bot (Contributor) commented May 5, 2026

Approvability

Verdict: Needs human review

Changes WebSocket reconnection from limited retries (~8 attempts) to infinite retries via Schedule.forever, which is a significant runtime behavior change affecting resource usage and connection resilience patterns.


hogeheer499-commits (Author) commented

Adding context for the reconnect behavior change flagged by Macroscope.

The intent is not to create an aggressive retry loop. The retry cadence still uses the existing exponential backoff and remains capped at WS_RECONNECT_MAX_DELAY_MS (64_000 ms), so a persistently unavailable backend settles at roughly one reconnect attempt per minute for a single browser session. The transport also closes the previous session before replacing it on explicit reconnect, and the tests cover stale lifecycle events so old sockets do not keep mutating connection state.
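
One common way to keep old sockets from mutating connection state is a generation counter, sketched below. This is an illustration of the general technique, not the transport's actual implementation: `SessionTracker`, `FakeSocket`, and `makeFakeSocket` are hypothetical names, and the fake socket stands in for a real WebSocket so the example runs anywhere.

```typescript
type ConnectionState = "connecting" | "open" | "closed";

// Minimal stand-in for the parts of a WebSocket this sketch needs (hypothetical).
interface FakeSocket {
  onopen: (() => void) | null;
  onclose: (() => void) | null;
  close(): void;
}

function makeFakeSocket(): FakeSocket {
  const socket: FakeSocket = {
    onopen: null,
    onclose: null,
    close() {
      socket.onclose?.();
    },
  };
  return socket;
}

class SessionTracker {
  state: ConnectionState = "closed";
  private generation = 0;
  private current: FakeSocket | null = null;

  connect(createSocket: () => FakeSocket): FakeSocket {
    // Bump the generation first so the old socket's close event is already stale.
    const generation = ++this.generation;
    this.current?.close();
    const socket = createSocket();
    this.current = socket;
    this.state = "connecting";
    // Handlers only apply while their socket is still the current generation.
    socket.onopen = () => {
      if (generation === this.generation) this.state = "open";
    };
    socket.onclose = () => {
      if (generation === this.generation) this.state = "closed";
    };
    return socket;
  }
}

const tracker = new SessionTracker();
const first = tracker.connect(makeFakeSocket);
first.onopen?.();
const second = tracker.connect(makeFakeSocket); // closes `first`, its event is ignored
second.onopen?.();
first.onclose?.(); // late event from the replaced socket, also ignored
console.log(tracker.state);
```

Each handler captures the generation it was created under, so a late close event from a replaced socket cannot flip the state of the live session back to closed.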

Why I think this behavior change deserves human review rather than automatic approval: T3 Code is often used through LAN, tailnet, SSH-forwarded, or desktop-managed remote environments. In those setups, a server restart or short network interruption should recover without requiring the user to focus the tab or manually retry after the previous 8-attempt budget (7 retries after the first attempt) has been exhausted.

Verification relevant to this risk:

  • wsTransport.test.ts now proves a 9th WebSocket attempt is created after the old retry budget would have stopped.
  • bun run --filter @t3tools/web test passed: 96 files, 995 tests.
  • bun run typecheck passed: 12/12 packages.
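
The "9th attempt" check in the first bullet can be illustrated with a small simulation. This is not the code in wsTransport.test.ts: `countAttempts` is a hypothetical stand-in that models a backend which never accepts a connection, with `maxRetries = 7` approximating the old Schedule.recurs(7) budget and `null` approximating continuous retry.

```typescript
// Counts connection attempts against a backend that always fails.
// `maxRetries` is the retry budget (null = unbounded, an assumption of this sketch);
// `simulatedFailures` bounds the simulation so the unbounded case terminates.
function countAttempts(maxRetries: number | null, simulatedFailures: number): number {
  let attempts = 0;
  let retries = 0;
  while (attempts < simulatedFailures) {
    attempts += 1; // this attempt fails
    if (maxRetries !== null && retries >= maxRetries) break; // budget exhausted
    retries += 1; // schedule another retry
  }
  return attempts;
}

console.log(countAttempts(7, 20)); // 8: the first attempt plus 7 retries
console.log(countAttempts(null, 20)); // 20: still retrying when the simulation ends
```

The bounded policy stops at 8 attempts, while the unbounded one reaches a 9th and beyond, which is the property the new test asserts against the real transport.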

If maintainers prefer a bounded version, I can adjust this to a larger finite retry window instead of Schedule.forever; I chose the capped continuous loop because it matches the expected behavior of a desktop-style app connection.
