Give each subscription connection its own bootstrap waiter #156

Merged
ghostdogpr merged 1 commit into main from fix/subscription-bootstrap-waiter-race
Jul 3, 2026


@ghostdogpr

Owner

Two establish() calls can run concurrently in SubscriptionConnection: a pending attemptReconnect (state Reconnecting) and a fresh attach that starts after closeOwned has reset the state to Idle. Both bootstraps wrote the single shared bootstrapWaiter field, and the shared onFrame routed each connection's HELLO reply to whichever waiter had been set last. One connection's reply could therefore complete the other connection's waiter, producing a spurious NotConnected from a valid subscribe and a full connect-timeout stall on the cross-completed one.
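
The misrouting can be reproduced in a small single-threaded model, with the interleaving driven by hand so it is deterministic. This is a sketch under assumptions, not the client's code: Frame, SharedWaiterModel, beginBootstrap and SharedWaiterDemo are hypothetical names, and Promise stands in for whatever waiter type SubscriptionConnection really uses. It shows only the routing bug; how that surfaces in the real client (the NotConnected and the timeout stall) depends on code not shown here.

```scala
import scala.concurrent.Promise

// Hypothetical model of the pre-fix shape: ONE waiter field shared by all connections.
final case class Frame(connId: Int, payload: String)

final class SharedWaiterModel {
  // Shared field: every bootstrap overwrites it, so the last writer wins.
  private var bootstrapWaiter: Option[Promise[Frame]] = None

  // Stands in for the bootstrap step of each establish() call.
  def beginBootstrap(): Promise[Frame] = {
    val p = Promise[Frame]()
    bootstrapWaiter = Some(p)
    p
  }

  // Shared router: it ignores which connection the frame came from and
  // completes whichever waiter is currently stored.
  def onFrame(frame: Frame): Unit =
    bootstrapWaiter.foreach { p =>
      p.trySuccess(frame)
      bootstrapWaiter = None
    }
}

object SharedWaiterDemo {
  // Returns (what the second waiter received, whether the first waiter completed).
  def run(): (Option[Frame], Boolean) = {
    val model = new SharedWaiterModel
    val reconnectWaiter = model.beginBootstrap() // conn 1, e.g. the pending reconnect
    val attachWaiter = model.beginBootstrap()    // conn 2, e.g. the fresh attach
    model.onFrame(Frame(connId = 1, payload = "HELLO")) // conn 1's reply arrives
    // conn 1's reply has completed conn 2's waiter; conn 1's waiter is still pending.
    (attachWaiter.future.value.flatMap(_.toOption), reconnectWaiter.isCompleted)
  }
}
```

Running SharedWaiterDemo.run() yields conn 1's HELLO in conn 2's waiter while conn 1's own waiter never completes, because by the time conn 2's reply arrives the shared field has already been cleared.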

Fix: move bootstrapWaiter onto Conn and give each Conn its own onFrame closure (factory(frame => onFrame(this, frame), …)), so a reply routes to the waiter of the connection it actually arrived on. This removes the shared routing state by construction. The unexpected-error drop (added in #155) now closes the specific connection that produced the error rather than whichever connection current points to.
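
A minimal sketch of the per-connection shape, assuming a transport factory that takes the frame callback as described above. Conn, Client, FakeTransport and the string payloads "HELLO"/"ERROR" are hypothetical stand-ins, and the "…" arguments of the real factory call are left out. The key point is that the closure captures `this`, so the router always receives the Conn a frame arrived on and can neither complete nor close a different connection.

```scala
import scala.concurrent.Promise

final case class Frame(payload: String)

// Hypothetical transport: just holds the callback it was given for incoming frames.
final class FakeTransport(val deliver: Frame => Unit)

final class Conn(factory: (Frame => Unit) => FakeTransport, onFrame: (Conn, Frame) => Unit) {
  // Per-instance waiter: no other Conn can read or overwrite it.
  var bootstrapWaiter: Option[Promise[Frame]] = None
  var closed: Boolean = false
  // Per-Conn closure: every frame is tagged with the Conn it arrived on.
  val transport: FakeTransport = factory(frame => onFrame(this, frame))
}

final class Client {
  def onFrame(conn: Conn, frame: Frame): Unit =
    frame.payload match {
      case "HELLO" =>
        // Completes only the waiter of the originating connection.
        conn.bootstrapWaiter.foreach(_.trySuccess(frame))
        conn.bootstrapWaiter = None
      case "ERROR" =>
        // Unexpected error: close the connection that produced it, not a shared "current".
        conn.closed = true
      case _ => ()
    }

  def establish(): (Conn, Promise[Frame]) = {
    val conn = new Conn(cb => new FakeTransport(cb), (c, f) => onFrame(c, f))
    val waiter = Promise[Frame]()
    conn.bootstrapWaiter = Some(waiter)
    (conn, waiter)
  }
}

object PerConnDemo {
  // Returns (w1 completed, w2 completed, c1 closed, c2 closed).
  def run(): (Boolean, Boolean, Boolean, Boolean) = {
    val client = new Client
    val (c1, w1) = client.establish() // e.g. the pending reconnect
    val (c2, w2) = client.establish() // e.g. the fresh attach, overlapping the first
    c1.transport.deliver(Frame("HELLO")) // reaches only c1's waiter
    val w1Done = w1.isCompleted
    val w2Done = w2.isCompleted
    c2.transport.deliver(Frame("ERROR")) // closes only c2
    (w1Done, w2Done, c1.closed, c2.closed)
  }
}
```

With the same overlapping establishes as before, c1's HELLO completes only c1's waiter, and an error on c2 closes only c2. No shared field is left to race on, which is why the fix is structural rather than a matter of ordering.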

No dedicated test: a faithful reproduction needs two concurrent establishes on real threads with deferred bootstrap replies, and which of the two racing establishes gets cross-completed is nondeterministic, so the symptom can't be asserted without a flaky, timing-dependent test. The fix is structural (a shared field becomes per-instance). The full sage.client.internal suite (141 tests) passes across all six backends, and scalafmt is clean.

ghostdogpr merged commit db9d638 into main on Jul 3, 2026
9 checks passed
ghostdogpr deleted the fix/subscription-bootstrap-waiter-race branch on July 3, 2026 at 00:10