An agent child could die and be transparently respawned without leaving any trace. That is exactly the failure mode behind the v0.6.0 "push events stop while RPC keeps working" report, which took two days to localize because nothing recorded when or why a channel replaced its process.

Every transition in reconnecting-process-channel now leaves a main.log line under source "agent-channel": spawn (phase, pid), ready (epoch), child close (exit code, signal, wasReady, stderr tail), handshake/pipe/spawn failures, respawn scheduling with backoff delay, epoch-mismatch daemon replacement, fatal-failure escalation, and dispose. Because every workspace owns its own channel, channels are labeled local:<root> / ssh:<host> via a new logLabel option. Both the local and SSH channels route through this file, so one logging site covers both transports. The logger is created lazily (the same pattern as pipe.ts) so that test imports do not initialize electron-log.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
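A minimal sketch of the lazy-logger pattern and the per-channel label described above. It assumes a logger with `info`/`warn` methods and injects a hypothetical `LoggerFactory` in place of electron-log, so it runs without Electron; `getLogger`, `makeLogLabel`, and `logChildClose` are illustrative names, not the channel's actual API.

```typescript
// Minimal logger shape. The real channel logs through electron-log; this
// sketch does not import it, so the factory below is a stand-in.
interface ChannelLogger {
  info(msg: string): void;
  warn(msg: string): void;
}

// Hypothetical factory standing in for electron-log initialization.
type LoggerFactory = (source: string) => ChannelLogger;

let cachedLogger: ChannelLogger | undefined;
let factoryCalls = 0;

// Lazy creation: nothing is initialized at module load, only on the first
// log call, so a test that merely imports this module never runs the factory.
function getLogger(factory: LoggerFactory): ChannelLogger {
  if (cachedLogger === undefined) {
    factoryCalls++;
    cachedLogger = factory("agent-channel");
  }
  return cachedLogger;
}

// Hypothetical target type; produces the local:<root> / ssh:<host> labels.
type ChannelTarget =
  | { kind: "local"; root: string }
  | { kind: "ssh"; host: string };

function makeLogLabel(target: ChannelTarget): string {
  return target.kind === "local" ? `local:${target.root}` : `ssh:${target.host}`;
}

// Hypothetical log call for one transition: the child process closed.
function logChildClose(
  factory: LoggerFactory,
  label: string,
  code: number | null,
  signal: string | null,
  wasReady: boolean,
  stderrTail: string,
): void {
  getLogger(factory).warn(
    `[${label}] child close code=${code} signal=${signal} wasReady=${wasReady} ` +
      `stderrTail=${JSON.stringify(stderrTail)}`,
  );
}
```

Because `getLogger` only calls the factory on the first log line, importing the module costs nothing, and the label prefixed to every line lets local and SSH channels be told apart in a single main.log.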
…0 watch/PTY/LSP regression

56e411a added the degraded / degraded-recovered / ready lifecycle events, but two consumers written before that commit still assumed "anything except reconnecting/disposed = the channel died":

- handleLocalChannelLifecycle tore down the local workspace provider on a single late heartbeat. It dropped the channel reference WITHOUT disposing it (leaving an orphan agent process that kept every fs/git watch), and the next fs access lazily booted a fresh agent with no watches. Result: fs.changed / git.changed went permanently silent while RPC kept working (git status refreshed only via the 1-minute autofetch tick), and existing PTY sessions were stranded on the abandoned channel.
- The LSP host disposed every language server on the channel for the same spurious event.

The trigger was chronic: heartbeats are sent every 5s and judged at 5s, so arrivals land at interval+epsilon and the 1-miss degraded check rides the boundary. This was confirmed live: channel B became ready at 22:15:08.7, and the degraded teardown plus a duplicate agent spawn followed at +5.1s, matching the two orphan agent processes observed on the same workspace.

Both handlers now act only on genuine terminal events. The SSH manager handler and the PTY agent-host already handled all eight event types explicitly (audited) and are unchanged.

Regression tests: degraded/degraded-recovered/ready must not tear down the local provider or the LSP server records; exit/failure (and held-then-expired for LSP) still must. All eight fail on the pre-fix code.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
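The fix amounts to switching from a denylist to an allowlist of lifecycle events. The sketch below contrasts the two shapes. The event names are the ones mentioned in this PR, but the type is an illustrative subset (the real channel has eight types), and `diedPreFix` / `diedPostFix` are hypothetical helpers rather than the actual handler code.

```typescript
// Illustrative subset of the channel's lifecycle events. The commit says
// there are eight; only the seven named in the text appear here.
type LifecycleEvent =
  | "ready"
  | "degraded"
  | "degraded-recovered"
  | "reconnecting"
  | "disposed"
  | "exit"
  | "failure";

// Pre-fix shape (denylist): anything that is not reconnecting/disposed counts
// as death, so newer informational events such as "degraded" fall through.
function diedPreFix(event: LifecycleEvent): boolean {
  return event !== "reconnecting" && event !== "disposed";
}

// Post-fix shape (allowlist): only genuine terminal events report death.
// The `never` check turns a newly added event type into a compile error
// instead of a silent teardown at runtime.
function diedPostFix(event: LifecycleEvent): boolean {
  switch (event) {
    case "exit":
    case "failure":
      return true;
    case "ready":
    case "degraded":
    case "degraded-recovered":
    case "reconnecting":
    case "disposed":
      return false;
    default: {
      const unhandled: never = event;
      throw new Error(`unhandled lifecycle event: ${String(unhandled)}`);
    }
  }
}
```

With the allowlist, an event type added later (as `degraded` was in 56e411a) surfaces as a type error at the switch rather than as a provider teardown in production.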
Watch registrations (fsnotify) live in the agent process, so a respawned agent starts with zero watches, and nothing re-issued them. Root and .git watches were registered exactly once per workspace open (ensureRoot / repo-info detection), so any agent replacement silently killed push events until the workspace was reopened. Observed live: an agent with 62 expanded-dir watches (re-added by user navigation) but no root and no .git watch, which are exactly the two registrations with no natural re-issue point.

The fix has three pieces:

- The channel now emits the `ready` lifecycle event on a successful no-epoch reconnect too (local agents / legacy remotes). Previously only the epoch-match reattach path emitted it, so a local reconnect completed silently. Existing consumers are safe by audit: the SSH manager broadcasts "connected", PTY restore is a no-op with no held sessions, and the local manager / LSP host ignore `ready` since the previous commit.
- AgentBackedProvider gains onAgentLifecycle (a channel.onLifecycle passthrough on AgentFsProvider); the type guard now requires it.
- AgentFsWatcher replays every tracked relPath, and AgentGitWatcher replays its gitDir, on `ready`. Re-registering an existing watch is a no-op agent-side, so the replay is safe when the agent actually survived.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
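A sketch of the replay mechanism, under the assumption that the provider exposes a promise-returning `watch(relPath)` (a hypothetical name) alongside the `onAgentLifecycle` passthrough. `ReplayingFsWatcher` is illustrative and folds AgentFsWatcher's relPath tracking into one class; AgentGitWatcher would do the same with its single gitDir.

```typescript
// Minimal stand-in for the provider surface the watcher uses. `watch` is a
// hypothetical name; onAgentLifecycle is the passthrough named in the text.
interface AgentWatchApi {
  watch(relPath: string): Promise<void>;
  onAgentLifecycle(listener: (event: string) => void): () => void;
}

// Sketch of an fs watcher that remembers what it registered and re-issues
// every registration when the channel reports `ready`.
class ReplayingFsWatcher {
  private readonly agent: AgentWatchApi;
  private readonly tracked = new Set<string>();
  private readonly unsubscribe: () => void;

  constructor(agent: AgentWatchApi) {
    this.agent = agent;
    this.unsubscribe = agent.onAgentLifecycle((event) => {
      // A respawned agent starts with zero watches, so replay all of them.
      // If the agent actually survived, re-registering is a no-op agent-side.
      if (event === "ready") {
        this.replay().catch(() => {
          // Error handling is out of scope for this sketch.
        });
      }
    });
  }

  async add(relPath: string): Promise<void> {
    this.tracked.add(relPath);
    await this.agent.watch(relPath);
  }

  private async replay(): Promise<void> {
    await Promise.all(Array.from(this.tracked, (p) => this.agent.watch(p)));
  }

  dispose(): void {
    this.unsubscribe();
    this.tracked.clear();
  }
}
```

The watcher keeps its own record of registrations because the agent's state is exactly what is lost on respawn; replaying from the client side works whether or not the agent really restarted.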
Heartbeats were sent every 5s and judged at 5s (degraded = 1 missed interval), so arrivals landed at interval+epsilon and the client's degraded check chronically rode the boundary. The heartbeat histogram showed nearly all arrivals in the [1-2x] bucket, and a phase-aligned check fired spurious degraded lifecycle events during normal operation (the trigger for the v0.6.0 local-channel teardown regression).

The send cadence and the advertised judgment basis are now two constants in proto.go, shared by stdio and daemon modes: HeartbeatSendMs (4s) < HeartbeatAdvertiseMs (5s). The ~1s margin absorbs wire jitter and removes the boundary condition, while real-outage detection latency stays at 5s. The TS client is unchanged; it already derives both thresholds from the advertised value.

Verified against the built binary: the ready frame advertises 5000ms, and the measured inter-arrival time is 3999-4000ms.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
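A worked version of the boundary arithmetic. It assumes the client marks a channel degraded when the gap since the last heartbeat strictly exceeds one advertised interval; the constant names mirror the Go constants only for illustration and are not the client's actual identifiers.

```typescript
// Hypothetical mirrors of the two Go constants in proto.go.
const HEARTBEAT_SEND_MS = 4000;
const HEARTBEAT_ADVERTISE_MS = 5000;

// Assumed client rule: degraded once the gap since the last heartbeat exceeds
// `missed` advertised intervals. The commit only says degraded = 1 missed
// interval; the strict ">" comparison is an assumption of this sketch.
function isDegraded(gapMs: number, advertisedMs: number, missed = 1): boolean {
  return gapMs > advertisedMs * missed;
}

// Observed gap between heartbeats: the send cadence plus wire/scheduling jitter.
function observedGapMs(sendMs: number, jitterMs: number): number {
  return sendMs + jitterMs;
}
```

With send = advertise = 5000 ms, any positive jitter pushes the observed gap past the threshold. With a 4000 ms send cadence, the same check tolerates up to 1000 ms of jitter, and because the threshold is still derived from the advertised 5000 ms, a real outage is still flagged once the gap passes 5 s.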
The AgentBackedProvider guard now requires onAgentLifecycle (watch replay, 0ff4f7c); the integration fixture predated it and failed the guard. Unit fixtures were updated in that commit; this integration fixture was missed because only tests/unit was run locally.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
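A minimal sketch of why a stale fixture fails a structural type guard once a member becomes required. The guard shape and both fixtures are hypothetical; only the `onAgentLifecycle` name comes from the text, and the real guard may check other members as well.

```typescript
type Unsubscribe = () => void;

// Only the member relevant here; the real AgentBackedProvider has more.
interface AgentBackedProvider {
  onAgentLifecycle(listener: (event: string) => void): Unsubscribe;
}

// Hypothetical guard shape: structural check for the lifecycle passthrough
// that watch replay depends on.
function isAgentBackedProvider(value: unknown): value is AgentBackedProvider {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { onAgentLifecycle?: unknown }).onAgentLifecycle === "function"
  );
}

// A fixture written before the guard change, and the same fixture updated
// with a no-op passthrough. Both fixtures are hypothetical.
const staleFixture = { readFile: async (_path: string) => "" };
const updatedFixture = {
  ...staleFixture,
  onAgentLifecycle: (_listener: (event: string) => void): Unsubscribe => () => {},
};
```

Adding a no-op `onAgentLifecycle` that returns an unsubscribe function is enough to satisfy a guard of this shape.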
Release v0.6.1
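Lifecycle events introduced in 0.6.0 (`degraded`, etc.) were misread as "channel died" by handlers written for older versions; this release fixes all of the resulting regressions.

Fixed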
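- On a spurious `degraded` event, the local channel provider was torn down, leaving `fs.changed`/`git.changed` push events permanently silent (git status and the file tree stopped auto-refreshing; only the 1-minute autofetch tick still ran), existing terminals unresponsive, every LSP server gone, and orphan agent processes still running. Teardown is now limited to genuine terminal events (exit/failure).
- Heartbeats are now sent every 4s while the advertised judgment basis stays at 5s (single constants in `proto.go`). This removes the cause of the chronic false `degraded` firings.

Added (internal)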
- Agent channel lifecycle logging (`agent-channel` source, main.log): records spawn/ready/close/respawn along with PID, exit code, and stderr tail.

Protocol & Remote impact
Tests: 22 new regression tests (each confirmed to fail on the pre-fix code); the full Go suite and 2,961 TS unit tests pass.
🤖 Generated with Claude Code