Skip to content

fix(client): await reconnect POST before flushing pending sends#392

Open
dimakis wants to merge 10 commits into
mainfrom
fix/sse-reconnect-flush-race
Open

fix(client): await reconnect POST before flushing pending sends#392
dimakis wants to merge 10 commits into
mainfrom
fix/sse-reconnect-flush-race

Conversation

@dimakis

@dimakis dimakis commented Jun 22, 2026

Copy link
Copy Markdown
Owner

Summary

  • Pending sends and _open were firing immediately after doPost('reconnect', ...) without awaiting, causing user messages queued during tool execution to race the server-side reconnect setup
  • Messages sent while a tool was executing (e.g. "?" to interrupt) were silently dropped
  • Uses .finally() to ensure flush only happens after reconnect POST completes

Test plan

  • Send a message while agent is executing a long-running Bash command
  • Verify the message interrupts/gets processed instead of hanging
  • Verify normal (non-reconnect) message flow still works

🤖 Generated with Claude Code

dimakis and others added 2 commits June 12, 2026 22:32
…reattach

EventSource auto-reconnect reuses the original URL, which never included
the ?sessions= query param. This meant the server never ran handleReconnect
on auto-reconnect — no watch, no reattach, no event replay. Users had to
send multiple messages before the session would respond.

Move reconnect from URL query param to explicit POST in the welcome handler.
This fires on every reconnect (auto or explicit), ensuring the server always
runs the full reconnect flow: watch + reattach + event replay.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The welcome handler was firing flushPendingSends() and _open immediately
after doPost('reconnect', ...) without awaiting. This caused user messages
queued during tool execution to race the server-side reconnect setup
(watch + reattach + replay), resulting in messages being silently dropped.

Use .finally() to ensure pending sends only flush after the reconnect
POST completes, matching the server's session reattach lifecycle.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

@dimakis dimakis left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 3 issue(s) (1 critical) (1 warning).

packages/client/src/sse-connection.ts

The fix has the right idea (POST-based reconnect, deferred flush), but _connected = true is set before the POST completes, opening a race window that defeats the stated goal — the store can send() during the gap because it gates on isConnected(), not _open.

  • 🔴 bugs (L216): _connected = true is set before the reconnect POST completes. The store layer calls connection.send() gated only on isConnected() (i.e. _connected), not on _open. This opens a race window where external send() calls bypass the pending queue and fire before the reconnect POST reaches the server — defeating the fix this PR intends. doPost() only requires _connectionId, not _connected, so the reconnect POST itself would still work. Move this._connected = true into both branches: inside the .finally() callback for the reconnect path, and before flushPendingSends() in the else path. [fixable]

packages/client/src/__tests__/sse-connection.test.ts

The fix has the right idea (POST-based reconnect, deferred flush), but _connected = true is set before the POST completes, opening a race window that defeats the stated goal — the store can send() during the gap because it gates on isConnected(), not _open.

  • 🟡 missing_tests: The PR title is "await reconnect POST before flushing pending sends" but no test verifies the ordering — i.e., that pending sends queued before reconnect are flushed only after the reconnect POST resolves. A test should queue a send, trigger reconnect + welcome, and assert that the queued send's fetch call happens after the reconnect POST's fetch call (e.g., by checking mockFetch call order or using a deferred promise for the reconnect POST). [fixable]
  • 🔵 missing_tests: No test covers the .finally() error path: if the reconnect POST fails (fetch rejects), flushPendingSends() and _open should still fire. This is the whole reason .finally() was chosen over .then(), but there's no test proving it works. [fixable]

Comment thread packages/client/src/sse-connection.ts Outdated
@@ -217,10 +214,29 @@ export class SseConnection implements ChatConnection {
}
this._connectionId = msg.connectionId as string;
this._connected = true;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 bugs: _connected = true is set before the reconnect POST completes. The store layer calls connection.send() gated only on isConnected() (i.e. _connected), not on _open. This opens a race window where external send() calls bypass the pending queue and fire before the reconnect POST reaches the server — defeating the fix this PR intends. doPost() only requires _connectionId, not _connected, so the reconnect POST itself would still work. Move this._connected = true into both branches: inside the .finally() callback for the reconnect path, and before flushPendingSends() in the else path. [fixable]

Addresses Centaur review: _connected = true was set before the reconnect
POST, allowing external send() calls to bypass the pending queue and fire
before the server reattached the session — defeating the flush ordering fix.

Move _connected into both branches so it's only set when it's safe to
send directly.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

@dimakis dimakis left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 5 issue(s) (2 warning).

packages/client/src/__tests__/sse-connection.test.ts

Sound fix for the SSE reconnect race condition. The .finally() approach correctly ensures flush waits for the POST, but a disconnect-during-POST race can leave the connection in an incorrect _connected = true state — that's the one actionable bug. Test coverage should be strengthened to pin the deferred-state behavior that is the PR's core change.

  • 🟡 missing_tests: The core behavior change — deferring _connected and flushPendingSends until the reconnect POST resolves — is not directly tested. There should be a test that verifies: (1) isConnected() returns false between welcome and reconnect POST completion, (2) send() queues messages during that window instead of posting directly, and (3) _open listener fires only after the POST resolves. The existing 'sends reconnect POST on welcome' test verifies the POST is issued but not the deferred state transition that is the PR's stated fix. [fixable]
  • 🔵 missing_tests: No test covers the reconnect POST failure path. Since doPost swallows errors, .finally() always runs, but a test should verify that _connected still becomes true and flushPendingSends still fires even when the reconnect POST returns a non-ok response or the fetch rejects. This is the happy consequence of using .finally() over .then(), and it's worth pinning. [fixable]

packages/client/src/sse-connection.ts

Sound fix for the SSE reconnect race condition. The .finally() approach correctly ensures flush waits for the POST, but a disconnect-during-POST race can leave the connection in an incorrect _connected = true state — that's the one actionable bug. Test coverage should be strengthened to pin the deferred-state behavior that is the PR's core change.

  • 🟡 bugs (L234): If disconnect() is called while the reconnect POST is in-flight, .finally() will still fire and set _connected = true, call flushPendingSends(), and emit _open — even though the connection has been intentionally torn down. disconnect() sets _connected = false and closes the EventSource, but the dangling .finally() callback will overwrite that state. Consider adding a guard (e.g., check this.es !== null or use an abort signal) inside the .finally() to bail out if the connection was closed during the POST. [fixable]
  • 🔵 unsafe_assumptions (L227): If two welcome events arrive in rapid succession (e.g., EventSource auto-reconnect races with an explicit reconnect), two overlapping doPost('reconnect', ...) calls run concurrently, each with its own .finally(). Both will set _connected = true and call flushPendingSends() — the second flush is harmless (empty queue) but the double _open emission could confuse listeners. This is an edge case but worth noting since the old code (synchronous set) was immune to it. [fixable]
  • 🔵 regressions (L227): Contrast with the WS-based connection.ts (line 221): the WS path sets _connected = true synchronously on welcome, sends reconnect, then flushes. This PR intentionally diverges — SSE defers _connected until POST completes. This is correct for SSE (where reconnect is a separate HTTP round-trip), but the behavioral difference between the two transport implementations should be noted if they're expected to be interchangeable drop-in replacements. Not a bug, but a contract divergence.

Comment thread packages/client/src/sse-connection.ts Outdated
sessionId,
lastSeq,
})),
}).finally(() => {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: If disconnect() is called while the reconnect POST is in-flight, .finally() will still fire and set _connected = true, call flushPendingSends(), and emit _open — even though the connection has been intentionally torn down. disconnect() sets _connected = false and closes the EventSource, but the dangling .finally() callback will overwrite that state. Consider adding a guard (e.g., check this.es !== null or use an abort signal) inside the .finally() to bail out if the connection was closed during the POST. [fixable]

// _connected is deferred until reconnect completes — setting it earlier
// lets external send() bypass the pending queue before the server has
// reattached the session.
if (this._isReconnect && this.seqBySession.size > 0) {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 unsafe_assumptions: If two welcome events arrive in rapid succession (e.g., EventSource auto-reconnect races with an explicit reconnect), two overlapping doPost('reconnect', ...) calls run concurrently, each with its own .finally(). Both will set _connected = true and call flushPendingSends() — the second flush is harmless (empty queue) but the double _open emission could confuse listeners. This is an edge case but worth noting since the old code (synchronous set) was immune to it. [fixable]

// _connected is deferred until reconnect completes — setting it earlier
// lets external send() bypass the pending queue before the server has
// reattached the session.
if (this._isReconnect && this.seqBySession.size > 0) {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 regressions: Contrast with the WS-based connection.ts (line 221): the WS path sets _connected = true synchronously on welcome, sends reconnect, then flushes. This PR intentionally diverges — SSE defers _connected until POST completes. This is correct for SSE (where reconnect is a separate HTTP round-trip), but the behavioral difference between the two transport implementations should be noted if they're expected to be interchangeable drop-in replacements. Not a bug, but a contract divergence.

…aces

Addresses Centaur review findings:
- Defer _connected into both branches so external send() can't bypass
  the pending queue while reconnect POST is in-flight
- Guard .finally() with es-null check (disconnect during POST) and
  connectionId staleness check (newer welcome arrived)
- Add 4 tests: deferred _connected, dangling finally after disconnect,
  stale finally on double welcome, flush ordering after reconnect

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@dimakis

dimakis commented Jun 22, 2026

Copy link
Copy Markdown
Owner Author

/review

@dimakis dimakis left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 3 issue(s).

packages/client/src/__tests__/sse-connection.test.ts

_Well-designed fix that correctly moves reconnect data from URL params to POST and defers connected with solid race-condition guards — the main gap is handling/documenting behavior when the reconnect POST itself fails.

  • 🔵 missing_tests: No test for reconnect POST failure. Since doPost() swallows errors and .finally() always runs, a failed reconnect POST still sets _connected=true and flushes pending sends — even though the server never ran handleReconnect (no watch, no reattach, no replay). A test should verify the behavior when the fetch rejects or returns { ok: false }, so the chosen semantics are documented and intentional. [fixable]
  • 🔵 missing_tests: No test for the EventSource auto-reconnect path (onerror fires while reconnect POST is in-flight). When onerror fires, _connected is already false so the handler is a no-op, but after auto-reconnect a new welcome arrives and triggers a second concurrent reconnect POST. The stale-welcome guard (connectionId check) should handle it, but this scenario is untested. [fixable]

packages/client/src/sse-connection.ts

_Well-designed fix that correctly moves reconnect data from URL params to POST and defers connected with solid race-condition guards — the main gap is handling/documenting behavior when the reconnect POST itself fails.

  • 🔵 unsafe_assumptions (L235): .finally() marks the connection as live regardless of whether the reconnect POST succeeded or failed (doPost swallows all errors). If the POST fails (network blip, server 500), the server hasn't re-watched or re-attached sessions, but the client will immediately flush pending sends into the void. Consider using .then() and only setting _connected on success, or at minimum adding a test that documents this as intentional fire-and-forget behavior. [fixable]

Comment thread packages/client/src/sse-connection.ts Outdated
sessionId,
lastSeq,
})),
}).finally(() => {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 unsafe_assumptions: .finally() marks the connection as live regardless of whether the reconnect POST succeeded or failed (doPost swallows all errors). If the POST fails (network blip, server 500), the server hasn't re-watched or re-attached sessions, but the client will immediately flush pending sends into the void. Consider using .then() and only setting _connected on success, or at minimum adding a test that documents this as intentional fire-and-forget behavior. [fixable]

Replace .finally() with dedicated doReconnectPost() that checks res.ok —
on failure, the client stays disconnected and lets EventSource auto-
reconnect retry. Prevents flushing pending sends when the server never
ran handleReconnect.

Add 3 tests: POST returns 500, POST throws network error, recovery on
next auto-reconnect welcome after a failed attempt.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

@dimakis dimakis left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 4 issue(s).

packages/client/src/sse-connection.ts

Well-designed fix that correctly moves reconnect from URL query params to POST, with thorough race-condition guards (disconnect, stale welcome, failure) and strong test coverage. No bugs found — suggestions are minor style/test improvements.

  • 🔵 style (L227): Redundant cast: welcomeConnectionId is assigned msg.connectionId as string, but this._connectionId was already set to the identical value on line 215. Consider const welcomeConnectionId = this._connectionId; to make it explicit this is a snapshot of the instance field for the staleness guard, not a re-parse of the message. [fixable]
  • 🔵 regressions: Server-side dead code: server/index.ts:391-405 still parses ?sessions= from the EventSource URL query params and calls handleReconnect. Since the client no longer sends this param, that code path is unreachable from SSE clients. Not a bug (harmless), but worth a follow-up cleanup to remove the dead path and avoid confusion about which reconnect path is canonical. [fixable]

packages/client/src/__tests__/sse-connection.test.ts

Well-designed fix that correctly moves reconnect from URL query params to POST, with thorough race-condition guards (disconnect, stale welcome, failure) and strong test coverage. No bugs found — suggestions are minor style/test improvements.

  • 🔵 missing_tests: No test verifies that SSE onmessage events are still dispatched to the listener while the reconnect POST is in-flight (_connected = false). The onmessage handler is independent of _connected, so this works, but an explicit test would document this important invariant — replay events from handleReconnect arrive via SSE before the POST response. [fixable]
  • 🔵 missing_tests: The 'stays disconnected when reconnect POST fails' test queues a send ('should stay queued') but never verifies the queued message survives the failure and is flushed on a successful retry. The 'recovers after failed reconnect' test proves reconnection works but doesn't check pending sends. Adding an assertion that the queued send fires after recovery would close the loop. [fixable]

Comment thread packages/client/src/sse-connection.ts Outdated
// _connected is deferred until reconnect completes — setting it earlier
// lets external send() bypass the pending queue before the server has
// reattached the session.
const welcomeConnectionId = msg.connectionId as string;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 style: Redundant cast: welcomeConnectionId is assigned msg.connectionId as string, but this._connectionId was already set to the identical value on line 215. Consider const welcomeConnectionId = this._connectionId; to make it explicit this is a snapshot of the instance field for the staleness guard, not a re-parse of the message. [fixable]

Address Centaur review round 3:
- Remove redundant cast; use this._connectionId snapshot directly
- Remove dead server-side ?sessions= query param handler (unreachable
  since client now uses POST for all reconnects) and unused import
- Add test: SSE events dispatch to listener while reconnect POST is
  in-flight (_connected=false doesn't gate onmessage)
- Add test: queued sends survive POST failure and flush on successful
  retry via next auto-reconnect welcome

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@dimakis

dimakis commented Jun 22, 2026

Copy link
Copy Markdown
Owner Author

Centaur Review

Found 3 issue(s).

packages/client/src/sse-connection.ts

Well-implemented migration from query-param to POST-based reconnect with thorough race-condition guards and comprehensive test coverage; only minor suggestions around dead code and passive-only retry on failure.

  • 🔵 unsafe_assumptions (L300): When the reconnect POST returns !res.ok, the client stays disconnected but the EventSource connection remains open and healthy. EventSource only auto-reconnects on transport errors — if the SSE stream stays alive (e.g. stable localhost), there's no automatic trigger for a retry. Recovery depends on a visibility change (checkAndReconnect) or an eventual ES transport error. Consider scheduling a delayed retry or forced reconnect on !res.ok so the client doesn't sit in limbo on stable desktop connections. [fixable]
  • 🔵 style (L365): The 'reconnect''reconnect' mapping in messageTypeToEndpoint is now dead code — reconnect is handled internally by doReconnectPost, never via send(). Keeping it means an external send({ type: 'reconnect', ... }) call would bypass the staleness guards in doReconnectPost. Consider removing it to prevent misuse. [fixable]
  • 🔵 style (L307): The catch {} block silently swallows all errors. While this matches doPost's pattern, reconnect failures are higher stakes (entire session recovery depends on it). A console.warn would make debugging production reconnect issues significantly easier without changing behavior. [fixable]

@dimakis dimakis left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 3 issue(s).

packages/client/src/__tests__/sse-connection.test.ts

Solid fix for a real race condition — the staleness guards are well-designed and test coverage of the new async reconnect flow is thorough across normal, failure, and concurrent scenarios.

  • 🔵 missing_tests: No test covers the path where _isReconnect=true but seqBySession is empty (e.g., all sessions cleared before reconnect). The code correctly falls through to the immediate-connect branch, but this behavior is untested and could regress. [fixable]
  • 🔵 style: Two test names reference .finally() ('discards dangling .finally() if disconnect()…', 'discards stale .finally() when a newer welcome arrives') but the implementation uses an if guard after await, not .finally(). The names appear vestigial from an earlier design iteration and are mildly misleading. [fixable]

packages/client/src/sse-connection.ts

Solid fix for a real race condition — the staleness guards are well-designed and test coverage of the new async reconnect flow is thorough across normal, failure, and concurrent scenarios.

  • 🔵 unsafe_assumptions (L307): The catch {} in doReconnectPost silently swallows all errors with no logging or diagnostic output. While the recovery path (stay disconnected, let EventSource auto-reconnect) is correct, completely silent failure makes debugging production reconnect issues harder. Consider emitting a diagnostic event or at minimum logging to console.warn. [fixable]

Comment thread packages/client/src/sse-connection.ts Outdated
}
// On !res.ok: stay disconnected. EventSource auto-reconnect will
// trigger a new welcome and retry the reconnect POST.
} catch {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 unsafe_assumptions: The catch {} in doReconnectPost silently swallows all errors with no logging or diagnostic output. While the recovery path (stay disconnected, let EventSource auto-reconnect) is correct, completely silent failure makes debugging production reconnect issues harder. Consider emitting a diagnostic event or at minimum logging to console.warn. [fixable]

- Add console.warn to doReconnectPost catch block for diagnostics
- Rename test names from stale .finally() references to match
  current await-based implementation
- Add test: reconnect with empty seqBySession skips POST and
  connects immediately

Co-Authored-By: Claude Opus 4.6 <[email protected]>

@dimakis dimakis left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 3 issue(s) (1 warning).

packages/client/src/sse-connection.ts

Solid fix for a real reliability bug — the POST-on-welcome approach with staleness guards is well-designed and thoroughly tested. One edge case: if the reconnect POST fails but the SSE stream stays healthy, the client stays disconnected with no automatic retry (recovery requires a browser lifecycle event).

  • 🟡 bugs (L298): On !res.ok (or network error), the client stays _connected=false and the comment says "EventSource auto-reconnect will trigger a new welcome and retry." But EventSource only auto-reconnects on connection errors — if the SSE stream stays healthy while the reconnect POST returns 500, no auto-reconnect fires and the client is stuck in a disconnected state. Recovery still happens via browser lifecycle (visibilitychange triggers checkAndReconnect(false), which proceeds when _connected=false), but it's user-action-dependent, not automatic. Consider forcing a reconnect on POST failure (this.checkAndReconnect(true)) or adding a short retry timer. [fixable]
  • 🔵 style (L218): The 9-line block comment in the welcome handler is verbose for a codebase that defaults to no comments (per CLAUDE.md). The method name doReconnectPost and the staleness guard are self-documenting. Consider trimming to 2-3 lines covering only the non-obvious insight: why _connected is deferred and why welcomeConnectionId is snapshotted. [fixable]

packages/client/src/__tests__/sse-connection.test.ts

Solid fix for a real reliability bug — the POST-on-welcome approach with staleness guards is well-designed and thoroughly tested. One edge case: if the reconnect POST fails but the SSE stream stays healthy, the client stays disconnected with no automatic retry (recovery requires a browser lifecycle event).

  • 🔵 missing_tests (L530): The 'recovers after failed reconnect when EventSource auto-reconnects' test simulates recovery by manually emitting a new welcome on the same EventSource — which models the EventSource-error-and-reconnect path. There's no test for the more likely recovery path: POST fails, SSE stays up, then checkAndReconnect(false) is called (e.g. via visibilitychange) while _connected=false. This is the actual recovery mechanism when the SSE stream doesn't drop. [fixable]

Comment thread packages/client/src/sse-connection.ts Outdated

// Guard: bail if disconnect() was called or a newer welcome
// arrived while the POST was in-flight.
if (!this.es || this._connectionId !== welcomeConnectionId) return;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: On !res.ok (or network error), the client stays _connected=false and the comment says "EventSource auto-reconnect will trigger a new welcome and retry." But EventSource only auto-reconnects on connection errors — if the SSE stream stays healthy while the reconnect POST returns 500, no auto-reconnect fires and the client is stuck in a disconnected state. Recovery still happens via browser lifecycle (visibilitychange triggers checkAndReconnect(false), which proceeds when _connected=false), but it's user-action-dependent, not automatic. Consider forcing a reconnect on POST failure (this.checkAndReconnect(true)) or adding a short retry timer. [fixable]

Comment thread packages/client/src/sse-connection.ts Outdated
this.flushPendingSends();
this.listener?.({ type: '_open' });
// Always send reconnect on welcome if we have tracked sessions.
// Native EventSource auto-reconnect reuses the original URL (without

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 style: The 9-line block comment in the welcome handler is verbose for a codebase that defaults to no comments (per CLAUDE.md). The method name doReconnectPost and the staleness guard are self-documenting. Consider trimming to 2-3 lines covering only the non-obvious insight: why _connected is deferred and why welcomeConnectionId is snapshotted. [fixable]

expect(listener).not.toHaveBeenCalledWith({ type: '_open' });
});

it('recovers after failed reconnect when EventSource auto-reconnects', async () => {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 missing_tests: The 'recovers after failed reconnect when EventSource auto-reconnects' test simulates recovery by manually emitting a new welcome on the same EventSource — which models the EventSource-error-and-reconnect path. There's no test for the more likely recovery path: POST fails, SSE stays up, then checkAndReconnect(false) is called (e.g. via visibilitychange) while _connected=false. This is the actual recovery mechanism when the SSE stream doesn't drop. [fixable]

Address Centaur round 5:
- Force checkAndReconnect(true) on POST failure/network error so the
  client doesn't stay stuck when the SSE stream remains healthy
- Trim verbose welcome handler comment to 2 lines
- Add tests: forced reconnect on POST failure, multi-retry recovery

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@dimakis

dimakis commented Jun 22, 2026

Copy link
Copy Markdown
Owner Author

Centaur Review

Found 2 issue(s) (1 warning).

packages/client/src/sse-connection.ts

_Well-designed fix with excellent test coverage. The migration from query-param reconnect to POST-based reconnect correctly solves the EventSource URL-reuse bug. Two minor concerns: no backoff on persistent POST failures (could create a tight retry loop), and spurious close events when the connection was never opened.

  • 🟡 unsafe_assumptions (L296): No backoff on reconnect POST failure retry. When doReconnectPost fails (HTTP error or network error), it immediately calls checkAndReconnect(true), which tears down the EventSource and creates a new one. If /api/chat/reconnect is persistently returning 500, this produces a tight loop: connect → welcome → POST → 500 → close → reconnect → welcome → POST → 500 → ... The reconnectDelayMs config exists but is never applied here. Natural EventSource connection latency provides some rate limiting, but there is no exponential backoff or max-retry cap. [fixable]
  • 🔵 style (L184): checkAndReconnect(true) unconditionally emits { type: '_close' } to the listener even when _connected was never true. On the doReconnectPost failure path (lines 296, 301), this produces a spurious _close for a connection that was never opened. The listener likely tolerates this, but gating with if (this._connected) (mirroring the guard in es.onerror at line 251) would be more precise. [fixable]

Address Centaur round 6:
- Replace immediate checkAndReconnect(true) with scheduleReconnect()
  that uses reconnectDelayMs to avoid tight retry loops on persistent
  server failures
- Gate _close emission on wasConnected to prevent spurious events when
  the connection was never opened

Co-Authored-By: Claude Opus 4.6 <[email protected]>

@dimakis dimakis left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 4 issue(s) (1 warning).

packages/client/src/sse-connection.ts

Sound architectural shift from query-param to POST-based reconnect with excellent test coverage. One race condition remains: a stale doReconnectPost can set _connected = true if checkAndReconnect fires mid-flight, fixable with a defensive _connected = false in the welcome handler's reconnect branch.

  • 🟡 bugs (L224): Race condition: if checkAndReconnect() fires while doReconnectPost(A) is in-flight, it closes ES1 and creates ES2 via doConnect(). When the stale POST resolves, the guard this._connectionId !== welcomeConnectionId passes because _connectionId is still 'A' (ES2's welcome hasn't arrived yet), and !this.es is false (ES2 exists). This sets _connected = true prematurely. When ES2's welcome arrives, the reconnect branch does NOT reset _connected to false, so send() bypasses the queue during doReconnectPost(B) — the exact race this PR aims to prevent. Fix: add this._connected = false; before this.doReconnectPost(welcomeConnectionId) as a defensive reset. [fixable]

packages/client/src/__tests__/sse-connection.test.ts

Sound architectural shift from query-param to POST-based reconnect with excellent test coverage. One race condition remains: a stale doReconnectPost can set _connected = true if checkAndReconnect fires mid-flight, fixable with a defensive _connected = false in the welcome handler's reconnect branch.

  • 🔵 missing_tests: No test for checkAndReconnect called while doReconnectPost is in-flight (the race condition above). A test scenario: start doReconnectPost with a pending promise, call checkAndReconnect(true) before resolving it, then verify the stale POST's resolution does NOT set _connected = true. [fixable]
  • 🔵 missing_tests: No test for the wasConnected guard in checkAndReconnect — i.e. verifying that _close is NOT emitted when checkAndReconnect is called while already disconnected (e.g. double-reconnect without intermediate connection). [fixable]
  • 🔵 style (L498): Test name says 'recovers via checkAndReconnect(false)' but the test body calls conn.checkAndReconnect(true). The recovery actually happens via scheduleReconnectdoConnect(), not checkAndReconnect(false). Consider renaming to 'recovers via scheduleReconnect after repeated POST failures'. [fixable]

// external send() from bypassing the pending queue mid-reconnect.
// welcomeConnectionId is a staleness guard for concurrent welcomes.
const welcomeConnectionId = this._connectionId;
if (this._isReconnect && this.seqBySession.size > 0) {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: Race condition: if checkAndReconnect() fires while doReconnectPost(A) is in-flight, it closes ES1 and creates ES2 via doConnect(). When the stale POST resolves, the guard this._connectionId !== welcomeConnectionId passes because _connectionId is still 'A' (ES2's welcome hasn't arrived yet), and !this.es is false (ES2 exists). This sets _connected = true prematurely. When ES2's welcome arrives, the reconnect branch does NOT reset _connected to false, so send() bypasses the queue during doReconnectPost(B) — the exact race this PR aims to prevent. Fix: add this._connected = false; before this.doReconnectPost(welcomeConnectionId) as a defensive reset. [fixable]

listener.mockClear();

// New welcome — reconnect POST will fail
lastES()._emit('welcome', { type: 'welcome', protocolVersion: 2, connectionId: 'conn-def' });

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 style: Test name says 'recovers via checkAndReconnect(false)' but the test body calls conn.checkAndReconnect(true). The recovery actually happens via scheduleReconnectdoConnect(), not checkAndReconnect(false). Consider renaming to 'recovers via scheduleReconnect after repeated POST failures'. [fixable]

…light

Address Centaur round 7:
- Capture ES instance in doReconnectPost closure and check this.es
  identity — prevents stale POST from setting _connected when
  checkAndReconnect replaced the EventSource during the await
- Add test: stale POST resolves after checkAndReconnect, must not
  set _connected
- Add test: _close not emitted when already disconnected
- Rename misleading test name to match scheduleReconnect behavior

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant