fix(chat): bound CLI runner with idle watchdog + clear stuck chat spinner (#271 a+b)#276
Open
ProfSynapse wants to merge 2 commits into
Open
fix(chat): bound CLI runner with idle watchdog + clear stuck chat spinner (#271 a+b)#276ProfSynapse wants to merge 2 commits into
ProfSynapse wants to merge 2 commits into
Conversation
…nner Two provider-agnostic fixes for the "chat spins forever" report (issue #271). Both root causes were verified in current code, not just gemini/AGY. (a) cliProcessRunner had no timeout — a CLI that emits progress but never exits hung the awaiter forever. Add an INACTIVITY (idle) watchdog to the shared runner: the timer resets on every stdout/stderr chunk, so a streaming process is never cut off, but a wedged/silent one is killed and the promise RESOLVES with errorCode PROVIDER_TIMEOUT (resolve-only contract preserved, so auth-status callers don't break). Default 120s of silence, configurable; 0 disables. GoogleGeminiCliAdapter.mapCliProcessFailure maps the code to a user-visible LLMProviderError. (b) The assistant placeholder inits isLoading:true and it was only cleared on the first streamed token, so any terminal path that produced no token left the spinner stuck. Clear isLoading on all three stuck paths: the empty-complete final branch and the post-loop safety net in MessageStreamHandler, and the non-abort error path via a new AbortHandler.finalizeErroredPlaceholder() called from both MessageManager catch blocks. The genuine-abort path is untouched. Scope note: AnthropicClaudeCodeAdapter's streaming generation spawns directly and bypasses the shared runner, so its never-closing case is caught by neither fix — tracked as a follow-up. The shared-runner timeout still guards gemini generation and both CLI auth-status checks; the chat-state clear cures the common terminal-but-no-first-token case for all providers. Tests: cliProcessRunner idle-watchdog (fire/reset/disable/default), AbortHandler finalizeErroredPlaceholder, MessageStreamHandler empty-complete + safety-net isLoading clear, gemini PROVIDER_TIMEOUT mapping. Full suite green except the pre-existing TaskBoardEditCoordinator jsdom-Modal failure (unrelated). Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]> Claude-Session: https://claude.ai/code/session_01NxKeRz1gihguL9wcidm78m
Peer-review of 292bc51 flagged that finalizeErroredPlaceholder was only exercised in isolation (AbortHandler.test.ts) — the WIRING at the two new MessageManager catch sites was untested, so the mock hid the seam. Adds three MessageManager-level tests driving a real non-abort error through the generation path and asserting the placeholder END STATE via the actual catch-branch wiring (not by calling finalizeErroredPlaceholder directly): - send path, error before first token: placeholder ends isLoading:false, state:'invalid'. - send path, error mid-stream: partial content preserved, isLoading:false (first-token path already cleared the spinner, so finalize is a no-op). - regenerate path (handleRetryMessage -> regenerateAIResponse -> generateFreshAIResponse): covers the second new call site. Test-only; no production-code change. Full suite 3713 pass / 21 skip; the lone TaskBoardEditCoordinator jsdom-Modal failure is the known pre-existing one. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]> Claude-Session: https://claude.ai/code/session_01NxKeRz1gihguL9wcidm78m
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two provider-agnostic chat-state bugs surfaced via #271 (the AGY work), but verified to affect every provider — not just gemini/AGY. This PR is the chat-state half of #271 only; the
gemini→Antigravity (agy) runtime re-route is a separate follow-up.Re: #271 (partial — chat-loading-state fixes a + b). Issue intentionally left open for the AGY re-route + manual verification.
Fixes
(a) Unbounded CLI process → idle watchdog
runCliProcess(src/utils/cliProcessRunner.ts) resolved only on childclose/error— a CLI that streamed progress but never exited hung the awaiter forever.0disables).errorCode: 'PROVIDER_TIMEOUT'— the resolve-only contract is preserved (no caller is converted to handle a rejection). TheLLMProviderError(PROVIDER_TIMEOUT)is constructed in the gemini adapter'smapCliProcessFailure, not the runner. No--print-timeoutarg added.(b) Stuck chat spinner on empty/error-before-first-token
The assistant placeholder inits
isLoading: trueand was only cleared inside the first-token branch — so empty-complete, the post-loop safety-net, and non-abort errors left the spinner frozen forever.isLoading: falseis now set on the empty-complete branch + post-loop safety-net (MessageStreamHandler).AbortHandler.finalizeErroredPlaceholder(setsstate: 'invalid', clearsisLoading, preserves partial content) is called from bothMessageManagernon-abort catch branches.setLoadinginput spinner.Scope note — please read
The auth-status checks now also inherit the 120s idle watchdog (previously unbounded); both callers degrade gracefully on timeout.
Known residual (tracked as a follow-up issue)
AnthropicClaudeCodeAdapter's streaming generation spawns directly (bypassesrunCliProcess), so a wedged-never-closing Claude Code process is guarded by neither fix. The chat-state clear still cures the common terminal-but-no-first-token case for all providers. Filing a separate issue to bring CC's own spawn under an idle timeout.Tests
New: idle-watchdog fires →
PROVIDER_TIMEOUT, watchdog resets on output (no mid-stream cut),isLoadingcleared on each terminal-without-first-token path, and 3 added in review covering the MessageManager → finalizeErroredPlaceholder wiring through the real catch branches (both call sites).Verification
npm run buildclean ·npm run lintcleanTaskBoardEditCoordinatorjsdom-Modal issue (untouched files; not a regression).Review
Peer-reviewed (APPROVE, 0 blocking):
docs/review/pr271-chat-loading-state-2026-06-22.md,docs/review/pr271-fe-chat-state-lifecycle-2026-06-22.md,docs/review/test-coverage-271-272-2026-06-22.md.🤖 Generated with Claude Code