fix: fix devtunnel WebSocket disconnects and reconnect reliability #2547

Open
IcyHot09 wants to merge 6 commits into pingdotgg:main from IcyHot09:fix/websocket-reconnect-reliability

Conversation

@IcyHot09 IcyHot09 commented May 6, 2026

What Changed

  • Remove the 7-retry reconnect cap — client now retries indefinitely with exponential backoff capping at 64s between attempts, instead of entering a permanent "exhausted" state
  • Add visibility guard on ping timeout to suppress false positives when the tab is backgrounded
  • Add Stream.throttle to terminal, git progress, and shell snapshot streams to reduce burst rate
  • Add congestion-aware 8s delay and visibility-wake on reconnect
  • Fix visibility-wake to use a Set so all concurrent subscriptions are woken, not just the last one
  • Enable perMessageDeflate WebSocket compression via @effect/platform-bun patch (60-80% payload reduction)
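As a rough illustration of the visibility guard in the second bullet, the sketch below ignores a ping timeout while the tab is hidden. The names `VisibilitySource`, `shouldHandlePingTimeout`, and the `disconnect` callback are hypothetical and are not taken from the PR; in a browser the `doc` argument would be the global `document`.

```typescript
// Minimal sketch, not the PR's actual handler. All names here are illustrative.
type VisibilityState = "visible" | "hidden";

interface VisibilitySource {
  readonly visibilityState: VisibilityState;
}

// Returns true when a ping timeout should be treated as a real disconnect.
function shouldHandlePingTimeout(doc: VisibilitySource | undefined): boolean {
  // Without a document (non-browser runtime) there is no tab to background,
  // so the timeout is treated as genuine.
  if (doc === undefined) return true;
  // Backgrounded tabs may have their timers delayed by the browser, so a
  // missed pong while hidden is assumed to be a false positive.
  return doc.visibilityState !== "hidden";
}

function onPingTimeout(
  doc: VisibilitySource | undefined,
  disconnect: () => void,
): void {
  if (!shouldHandlePingTimeout(doc)) return;
  disconnect();
}
```

One consequence of this design is that a connection that really did die while the tab was hidden is only detected once the tab is visible again, which is presumably why the PR pairs the guard with a visibility-based wake-up.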

Why

When accessing T3 remotely via Microsoft devtunnel, the WebSocket connection
disconnects and reconnects every 2-3 seconds during normal use. Opening sessions
with large chat histories makes it worse — the initial snapshot load can sustain
the disconnect for 20+ seconds.

Root cause: devtunnel relays traffic over an SSH channel with a hardcoded 1 MB
window. Any burst of outbound data (terminal output, session snapshots, git
progress events) fills the window faster than the client can drain it. This
stalls all WebSocket frames including heartbeat ping/pong, triggering a timeout
disconnect. After 7 retries (~127s) the client entered a permanent "exhausted"
state and stopped trying — requiring a manual page refresh.
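The ~127s figure is consistent with a doubling backoff that starts at 1s: 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127. The sketch below reproduces that arithmetic and the 64s cap. The 1s base delay and the doubling factor are assumptions inferred from the numbers in this description, and the function names are hypothetical.

```typescript
// Assumed base delay; the description only states the 64s cap and the ~127s total.
const BASE_DELAY_MS = 1000;
// Cap between attempts, as stated in "What Changed".
const MAX_DELAY_MS = 64000;

// Delay before retry number `attempt` (0-based), doubling up to the cap.
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Total time spent waiting across the first `retries` attempts.
function totalWaitMs(retries: number): number {
  let total = 0;
  for (let attempt = 0; attempt < retries; attempt++) {
    total += reconnectDelayMs(attempt);
  }
  return total;
}
```

Under these assumptions `totalWaitMs(7)` is 127000 ms, and every attempt after the seventh waits a flat 64s instead of giving up.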

The throttling and compression reduce sustained burst rates below the saturation
threshold. The unlimited-retry fix ensures recovery when a burst does cause a
disconnect.
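To make the throttling idea concrete, here is a small token-bucket sketch in plain TypeScript. It is not the PR's code and does not use the Effect `Stream.throttle` API; the class name, the injected clock, and the 50-units-per-100ms figures used in the usage note are illustrative assumptions.

```typescript
// Minimal token-bucket sketch with an injected clock so it stays deterministic.
class TokenBucket {
  private readonly capacity: number; // max units allowed per window
  private readonly windowMs: number; // window length in milliseconds
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, windowMs: number, nowMs: number) {
    this.capacity = capacity;
    this.windowMs = windowMs;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns 0 if `cost` units may be sent now (and consumes them),
  // otherwise the number of milliseconds the caller should wait.
  delayFor(cost: number, nowMs: number): number {
    this.refill(nowMs);
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return 0;
    }
    const missing = cost - this.tokens;
    return Math.ceil((missing * this.windowMs) / this.capacity);
  }

  // Adds tokens in proportion to elapsed time, never exceeding capacity.
  private refill(nowMs: number): void {
    const elapsedMs = nowMs - this.lastRefillMs;
    const gained = (elapsedMs * this.capacity) / this.windowMs;
    this.tokens = Math.min(this.capacity, this.tokens + gained);
    this.lastRefillMs = nowMs;
  }
}
```

With `new TokenBucket(50, 100, 0)`, a burst of 50 units at time 0 goes out immediately, while a further 10 units are asked to wait 20 ms. Spreading a burst over time like this is what keeps the outbound rate under the tunnel's window instead of filling it in one go.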

Checklist

  • This PR is small and focused
  • I explained what changed and why

Note

Medium Risk
Changes core WebSocket reconnect/backoff and subscription retry behavior, which can affect connectivity and resource usage under failure conditions. Server-side throttling and compression also alter runtime performance characteristics and could surface regressions in stream delivery timing.

Overview
Improves WebSocket stability under high-throughput conditions by throttling several high-volume server streams (orchestration shell snapshot/events, git action progress, terminal events) to smooth bursts.

On the client, removes the capped retry/exhausted reconnect state and switches protocol retry policy to retry forever, with updated state/tests and UI toasts (no max-attempt display, no exhausted toast).

Adds guards to reduce false disconnects and speed recovery: suppresses heartbeat timeout handling when the tab is hidden, and updates WsTransport subscription retries with a congestion-aware minimum delay plus a visibilitychange wake-up. Enables WebSocket perMessageDeflate via an @effect/platform-bun patch (and bumps package versions/lockfile accordingly).

Reviewed by Cursor Bugbot for commit b318893. Bugbot is set up for automated code reviews on this repo.

Note

Fix WebSocket disconnect/no-reconnect loop when loading large sessions

  • Removes the maximum reconnect attempt cap, making WsTransport retry indefinitely using exponential backoff instead of entering an exhausted/dead state
  • Throttles high-volume server-side streams (orchestration at 50/100ms, thread events at 10/100ms, terminal events at 20/50ms) to prevent flooding the WebSocket during large session loads
  • Adds a visibilitychange listener so reconnect sleeps wake immediately when the tab becomes visible again
  • Ignores heartbeat ping timeouts while the document is hidden, preventing spurious disconnects in background tabs
  • Enables perMessageDeflate on the Bun WebSocket server via a patch to reduce wire payload size
  • Removes the 'exhausted' reconnect phase from UI state; the connection now stays in 'waiting' and retries indefinitely with a simplified toast UI

Macroscope summarized b318893.

IcyHot09 and others added 5 commits May 5, 2026 17:06
Remove the 7-retry cap (Schedule.recurs → Schedule.forever) so the
WebSocket protocol never enters the "exhausted" dead state. Add a
visibility guard in onPingTimeout to suppress false-positive heartbeat
timeouts when the browser tab is backgrounded. Remove the "exhausted"
phase from WsReconnectPhase and all related UI logic.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…orm-bun patch

Reduces JSON payload size 60-80% to prevent SSH window saturation through
devtunnel. Adds @effect/platform-bun patch that sets perMessageDeflate: true
in the Bun.serve() websocket handler.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
github-actions Bot added the labels vouch:unvouched (PR author is not yet trusted in the VOUCHED list) and size:L (100-499 changed lines, additions + deletions) on May 6, 2026

coderabbitai Bot commented May 6, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0a5e994b-b82b-4907-8a7f-cc210d59bf4d

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 6f364eb.

Comment thread apps/web/src/rpc/wsTransport.ts Outdated

macroscopeapp Bot commented May 6, 2026

Approvability

Verdict: Needs human review

This PR fundamentally changes WebSocket reconnection behavior from finite retries to infinite, adds stream throttling, visibility-based reconnect logic, and enables WebSocket compression via a vendor patch. These are significant runtime behavior changes that warrant careful human review despite the fix-framed title.

…ibility

Replaces the single _wakeReconnect slot with a Set so every subscription
sleeping through a congestion delay gets woken when the tab becomes visible,
not just the last one to register.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
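The single-slot versus Set distinction in the commit above can be sketched as follows. `WakeRegistry` and its methods are hypothetical names chosen for illustration; only the idea of replacing one `_wakeReconnect` slot with a Set comes from the commit message.

```typescript
// Illustrative sketch: every sleeping subscription registers its own wake
// callback, so none is overwritten by a later registration.
type Wake = () => void;

class WakeRegistry {
  private readonly wakers = new Set<Wake>();

  // Registers a wake callback and returns a function that unregisters it,
  // for example when the sleep finishes on its own timer.
  register(wake: Wake): () => void {
    this.wakers.add(wake);
    return () => {
      this.wakers.delete(wake);
    };
  }

  // Wakes every registered sleeper exactly once.
  wakeAll(): void {
    // Copy and clear first so callbacks can safely register again while we iterate.
    const pending = Array.from(this.wakers);
    this.wakers.clear();
    for (const wake of pending) {
      wake();
    }
  }
}
```

In a browser, `wakeAll` would typically be called from a `visibilitychange` listener once `document.visibilityState` is `"visible"`. With a single slot, three concurrent subscriptions would leave only the last callback stored; with the Set, all three are woken.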
IcyHot09 changed the title from "fix: prevent WebSocket disconnect/no-reconnect loop when loading large sessions" to "fix: fix devtunnel WebSocket disconnects and reconnect reliability" on May 6, 2026