fix(hooks): stop hook field rename + debug tracing system#228
Conversation
…nges: 1. Fix stop hook field rename (response_text -> last_assistant_message) that broke working memory capture across all projects. Also removes the stop_reason filter (no longer needed -- last_assistant_message is only populated on real assistant turns). Updates all tests to use the new field name. 2. Add consistent DEVFLOW_HOOK_DEBUG=1 debug tracing to all 7 hooks via a new shared scripts/hooks/debug-trace helper. Each hook now has a safe dbg() no-op fallback before set -e so missing debug-trace never causes hook failure. Phase 1 traces to ~/.devflow/logs/.hook-debug.log (pre-CWD global fallback), Phase 2 switches to the per-project log. Adds normal file logging to sidecar-dispatch, session-start-memory, session-start-context, and pre-compact-memory. New devflow debug --enable/--disable/--status CLI command toggles DEVFLOW_HOOK_DEBUG in ~/.claude/settings.json env block. Co-Authored-By: Claude <[email protected]>
…LOW_HOOK_DEBUG!=1 to avoid unnecessary mkdir syscalls on every hook invocation - sidecar-capture: guard jq INPUT keys subshell behind debug check so it only runs when tracing is active; remove duplicate dbg log - sidecar-dispatch: remove duplicate CWD debug log (already logged with PROMPT_LENGTH on earlier line) - shell-hooks.test.ts: add debug-trace to HOOK_SCRIPTS syntax check list
scripts/hooks/sidecar-capture (Line 61)The Suggested fix: log() { echo "[\2026-05-27T14:37:14Z] [sidecar-capture] \$1" >> "\$LOG_FILE" 2>/dev/null || true; }Review finding with 82% confidence (MEDIUM, reliability + security) |
scripts/hooks/sidecar-capture (Line 8)The debug initialization preamble is duplicated across all 7 hooks. Each includes nearly identical:
This pattern appears in sidecar-capture:8-18, sidecar-dispatch:8-22, sidecar-evaluate:7-21, session-start-memory:7-17, session-start-context:11-22, pre-compact-memory:10-19, and preamble:7-17. Consider extracting into a shared helper (e.g., Review finding with 85% confidence (HIGH, complexity) |
scripts/hooks/debug-trace (Line 29)Debug log directories are created with Suggested fix: mkdir -p "\$global_log_dir" 2>/dev/null || true
chmod 700 "\$global_log_dir" 2>/dev/null || trueAlso add the same fix at line 42-43 for the Phase 2 per-project log directory. Review finding with 85% confidence (MEDIUM, security) |
scripts/hooks/sidecar-dispatch (Line 149)Unguarded command substitution
These should match the pattern in sidecar-capture:33, which guards with Fix example: if [ "\${DEVFLOW_HOOK_DEBUG:-}" = "1" ]; then
dbg "Learned artifacts: commands=$(echo "\$LEARNED_COMMANDS" | grep -c , 2>/dev/null || echo 0)"
fiReview finding with 92% confidence (MEDIUM, performance) |
scripts/hooks/preamble (Line 44)Missing if [[ "\$PROMPT" == *"## Goal"* ]] && [[ "\$PROMPT" == *"## Steps"* ]] && [[ "\$PROMPT" == *"## Files"* ]]; then
dbg "EXECUTION_PLAN detected — injecting directive"
json_prompt_output "EXECUTION_PLAN detected. Invoke \`devflow:implement\` via the Skill tool to execute this plan."
else
dbg "No plan markers detected — no output"
fi
dbg "=== HOOK COMPLETE ==="Review finding with 90% confidence (MEDIUM, consistency) |
scripts/hooks/sidecar-capture (Line 59)Inconsistent error handling for Pre-existing hooks (sidecar-capture:59, sidecar-evaluate:56): hard-fail source "\$SCRIPT_DIR/log-paths" || { echo "..."; exit 1; }Newly-added logging (pre-compact-memory:41, session-start-memory:38, session-start-context:47, sidecar-dispatch:45): soft-fail source "\$SCRIPT_DIR/log-paths" || trueThe soft-fail Review finding with 85% confidence (HIGH, consistency) |
scripts/hooks/sidecar-evaluate (Line 496)File size now 496 lines (was 465 on main), approaching the 500-line critical threshold. The reinforcement section (lines 167-241) has nesting up to 7 levels with 15+ lines of inline jq and equivalent node fallback length. This is a pre-existing structural issue, but the new debug instrumentation (31 new Consider a follow-up refactor: Extract the 4 major evaluation sections into sourced helpers:
This reduces the main script to a dispatcher. Applies PF-004 thinking — avoids migration complexity by preserving behavior while improving file organization. Review finding with 90% confidence (MEDIUM, complexity) |
src/cli/commands/debug.ts (Line 27)
Suggested fix — differentiate error types: try {
const raw = await fs.readFile(settingsPath, 'utf-8');
settings = JSON.parse(raw) as Record<string, unknown>;
} catch (err: unknown) {
if (err instanceof SyntaxError) {
p.log.error('settings.json is malformed — fix it before modifying env vars');
return;
}
// ENOENT or other fs error — safe to start fresh
settings = {};
}Review finding with 85% confidence (HIGH, TypeScript) |
src/cli/commands/debug.ts (Line 32)Unsafe type assertion: Add runtime type guard: const rawEnv = settings.env;
const env: Record<string, string> =
(typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
? (rawEnv as Record<string, string>)
: {};Note: Applies ADR-001 (clean break philosophy) — no backward compatibility needed, just do it right from the start. Review finding with 82% confidence (MEDIUM, TypeScript) |
Review SummaryOverall: APPROVED_WITH_CONDITIONS (field rename is solid, debug system well-designed, actionable improvements needed) High-Confidence Findings (≥80%) — 9 Comments PostedAll critical issues are addressed in individual comments above. Priority fixes:
Lower-Confidence Suggestions (60-79%)These go in summary (not as inline comments):
Test Coverage Gaps (HIGH priority for follow-up)Per the Testing reviewer:
Field Rename Validation✅ Complete and correct:
Design Notes
Next Steps
Generated by devflow:git comment-pr operation |
- Catch SyntaxError from JSON.parse separately to prevent silently
overwriting a malformed settings.json with {}
- Replace unsafe cast of settings.env with a runtime type guard
that falls back to {} for non-object values
Co-Authored-By: Claude <[email protected]>
…- debug-trace: add chmod 700 after mkdir in both Phase 1 and Phase 2 to match log-paths pattern and restrict debug log directory permissions - debug-trace: add 5MB size guard to prevent unbounded log growth; truncates to tail 2.5MB using PID-unique temp file for atomicity - sidecar-capture: add 2>/dev/null || true to log() to prevent set -e abort on log write failures - sidecar-capture/sidecar-evaluate: change log-paths sourcing from hard-fail to soft-fail matching the 4 new hooks; use two-step LOG_DIR/LOG_FILE pattern with /tmp fallback for consistency and defensiveness - sidecar-evaluate: remove now-redundant mkdir -p on LOG_FILE dirname - session-start-context, sidecar-dispatch, sidecar-evaluate: guard unguarded subshell calls in dbg arguments behind DEVFLOW_HOOK_DEBUG=1 check to eliminate unnecessary subprocess forks when debug is OFF - preamble: add missing dbg HOOK COMPLETE bookend for consistency
…n regression - tests/shell-hooks.test.ts: add 4 debug-trace behavioral tests verifying that dbg() is a no-op when DEVFLOW_HOOK_DEBUG is unset, writes to global log when enabled, and devflow_debug_set_cwd switches to per-project log - tests/sentinel.test.ts: add stop_reason=tool_use regression test confirming capture proceeds when last_assistant_message is present regardless of stop_reason value; add log file creation test for sidecar-capture - tests/debug.test.ts: new test file covering devflow debug --enable/--disable/ --status/malformed settings.json — the only CLI command previously without tests Co-Authored-By: Claude <[email protected]>
- debug-trace: harden tail+mv sequence with explicit conditional to ensure temp file cleanup on tail failure - sidecar-dispatch: remove duplicate PROMPT_LENGTH debug log (already logged on prior line) - debug.test.ts: remove unused import (path module) Co-Authored-By: Claude <[email protected]>
Review Summary: CHANGES_REQUESTEDComprehensive review across 9 dimensions identified 4 blocking architectural/reliability issues and 2 consistency violations requiring fixes before merge. All test coverage, regression, and security hygiene are solid. Field rename migration is complete with zero stale references. Blocking Issues1. Missing 5MB Size Guard on Phase 2 Debug Log (Reliability + Security)File: The global debug log (Phase 1) has a 5MB truncation guard, but the per-project log (Phase 2) does not. This reintroduces the unbounded growth risk on the primary logging path (Phase 2 fires on 80% of hook invocations). Every 100 prompt cycles with debug enabled will create ~100-150MB of debug logs with no cleanup. Fix: Copy the exact 5MB truncation pattern into 2. Feedback-Loop Guard Inconsistency (Consistency)File:
Fix: Move feedback-loop guards after 3.
|
| Category | Count |
|---|---|
| Blocking (must fix) | 4 |
| Should-Fix (recommended) | 3 |
| False Positives Cleared | 6 |
| Regression Tests Passing | All |
Recommendation: Fix the 4 blocking issues (5MB guard, feedback-loop guards, CWD validation order, pure functions pattern), then re-review debug.ts TypeScript surface for the type assertions.
dean0x
left a comment
There was a problem hiding this comment.
Missing 5MB Size Guard on Phase 2 Debug Log
Line 46-59: devflow_debug_set_cwd creates the per-project debug log without any size guard.
The Phase 1 global log (lines 32-40) has a 5MB truncation guard that keeps the tail 2.5MB. Phase 2 (here) switches to per-project logging but omits this guard entirely. Since most debug output occurs post-CWD (Phase 2), the per-project logs will accumulate the bulk of debug traffic and grow without bound over extended debug sessions.
Fix: Add the same 5MB guard pattern after setting _DEVFLOW_DBG_LOG at line 55:
```bash
_DEVFLOW_DBG_LOG="$project_log_dir/.hook-debug.log"
Size guard: same as Phase 1
if [ -f "$_DEVFLOW_DBG_LOG" ] && [ "$(wc -c < "$_DEVFLOW_DBG_LOG" 2>/dev/null || echo 0)" -gt 5242880 ]; then
local _TAIL_TMP="$_DEVFLOW_DBG_LOG.tmp.$$"
if tail -c 2621440 "$_DEVFLOW_DBG_LOG" > "$_TAIL_TMP" 2>/dev/null; then
mv "$_TAIL_TMP" "$_DEVFLOW_DBG_LOG" 2>/dev/null || rm -f "$_TAIL_TMP" 2>/dev/null || true
else
rm -f "$_TAIL_TMP" 2>/dev/null || true
fi
fi
```
Impact: Without this, 100 debug-enabled sessions will create 100-150MB of logs with no cleanup.
dean0x
left a comment
There was a problem hiding this comment.
Feedback-Loop Guard Inconsistency
sidecar-dispatch lines 14-16: Feedback-loop guards (DEVFLOW_BG_* checks) are placed before debug-trace sourcing with no debug tracing on exit.
Compare to sidecar-capture (lines 21-23), which moved these guards to after debug-trace sourcing and added dbg exit annotations:
```bash
if [ "${DEVFLOW_BG_UPDATER:-}" = "1" ]; then dbg "EXIT: bg_updater"; exit 0; fi
```
This inconsistency breaks the PR's stated goal: "all hooks now follow same debug-trace sourcing pattern."
Fix: Move lines 14-16 to after line 12 (after devflow_debug_init and dbg HOOK START), and add dbg annotations to each guard:
```bash
devflow_debug_init "sidecar-dispatch"
dbg "=== HOOK START ==="
if [ "${DEVFLOW_BG_UPDATER:-}" = "1" ]; then dbg "EXIT: bg_updater"; exit 0; fi
if [ "${DEVFLOW_BG_LEARNER:-}" = "1" ]; then dbg "EXIT: bg_learner"; exit 0; fi
if [ "${DEVFLOW_BG_KNOWLEDGE_REFRESH:-}" = "1" ]; then dbg "EXIT: bg_knowledge"; exit 0; fi
```
Same fix needed in sidecar-evaluate lines 13-15.
dean0x
left a comment
There was a problem hiding this comment.
CWD Validation Ordering
sidecar-capture line 48 vs 52: devflow_debug_set_cwd "\$CWD" is called at line 48, but CWD validation happens at line 52 (if [ -z "\$CWD" ] || [ ! -d "\$CWD" ]).
All other 6 hooks validate CWD first, then call devflow_debug_set_cwd. When CWD is empty, the current order attempts to create ~/.devflow/logs/-/ (from sed 's|^/||' on an empty string), which violates the established pattern and wastes a syscall.
Fix: Swap the order. Validation should occur before devflow_debug_set_cwd:
```bash
CWD=$(printf '%s' "$_FIELDS" | cut -d$'\001' -f1)
RESPONSE_TEXT=$(printf '%s' "$_FIELDS" | cut -d$'\001' -f2-)
if [ -z "$CWD" ] || [ ! -d "$CWD" ]; then dbg "EXIT: bad CWD"; exit 0; fi
devflow_debug_set_cwd "$CWD"
```
dean0x
left a comment
There was a problem hiding this comment.
Architectural Violation: Break Pure-Function Pattern
src/cli/commands/debug.ts lines 19-81: The command inlines the entire settings.json read-parse-mutate-write cycle instead of extracting pure functions.
Every other settings-mutating command in devflow (flags.ts, ambient.ts, hud.ts) uses this pattern:
- `applyX(settingsJson: string): string` — accepts raw JSON string, returns transformed string
- `stripX(settingsJson: string): string` — reverse transform
- Command action handler calls these pure functions and handles file I/O separately
Current problem:
- `debug.ts` couples file I/O, JSON parsing, env mutation, and writing into a single monolithic action handler
- The logic is untestable without filesystem mocks
- The test file re-implements this logic locally (`applyEnable`, `applyDisable`, `readDebugState`) rather than importing it — tests validate their copy, not production code
- The logic cannot be reused by `devflow init` or other orchestrators
Fix: Extract pure functions to a shared utility (e.g., `src/cli/utils/debug-settings.ts`):
```typescript
export function applyDebugTrace(settingsJson: string): string {
const settings = JSON.parse(settingsJson) as Record<string, unknown>;
const rawEnv = settings.env;
const env: Record<string, string> =
(typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
? { ...(rawEnv as Record<string, unknown>) }
: {};
env.DEVFLOW_HOOK_DEBUG = '1';
settings.env = env;
return JSON.stringify(settings, null, 2) + '\n';
}
export function stripDebugTrace(settingsJson: string): string {
const settings = JSON.parse(settingsJson) as Record<string, unknown>;
const rawEnv = settings.env;
const env: Record<string, string> =
(typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
? { ...(rawEnv as Record<string, unknown>) }
: {};
delete env.DEVFLOW_HOOK_DEBUG;
if (Object.keys(env).length === 0) delete settings.env;
else settings.env = env;
return JSON.stringify(settings, null, 2) + '\n';
}
```
Then the command action becomes:
```typescript
const raw = await fs.readFile(settingsPath, 'utf-8');
const updated = cmd === 'enable' ? applyDebugTrace(raw) : stripDebugTrace(raw);
await fs.writeFile(settingsPath, updated, 'utf-8');
```
And tests import and call the pure functions directly.
dean0x
left a comment
There was a problem hiding this comment.
Unsafe Type Assertion on Environment Values
src/cli/commands/debug.ts line 39: The code asserts `rawEnv as Record<string, string>` without validating that all values are actually strings.
Problem: If `settings.json` contains:
```json
{ "env": { "DEVFLOW_HOOK_DEBUG": 1 } }
```
(number instead of string), the `as` assertion silently succeeds and the `=== '1'` comparison at line 64 will never match, silently failing to detect the enabled state.
The `env` object in `settings.json` is a user-editable trust boundary and requires value-level validation.
Fix: Filter values to ensure only strings are included:
```typescript
const rawEnv = settings.env;
const env: Record<string, string> =
(typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
? Object.fromEntries(
Object.entries(rawEnv as Record<string, unknown>)
.filter((entry): entry is [string, string] => typeof entry[1] === 'string')
)
: {};
```
This ensures non-string values (numbers, objects, etc.) are silently dropped and `env` contains only valid string key-value pairs.
dean0x
left a comment
There was a problem hiding this comment.
Mutating Input Object
src/cli/commands/debug.ts lines 39-44: The `env` variable is assigned directly from `rawEnv` without spreading, causing mutations to mutate the original parsed object.
Current code:
```typescript
const env = (typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
? (rawEnv as Record<string, string>) // Direct reference, no copy
: {};
env.DEVFLOW_HOOK_DEBUG = '1'; // Mutates the original settings.env object
```
Test file does it correctly (line 37):
```typescript
? { ...(rawEnv as Record<string, string>) } // Spread creates a copy
```
While functionally correct for the current write-back flow, this violates immutability by default principle. Production code should match the test pattern.
Fix: Apply spread operator:
```typescript
const env: Record<string, string> =
(typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
? { ...(rawEnv as Record<string, unknown>) }
: {};
```
…tch/evaluate Three consistency fixes across sidecar-dispatch and sidecar-evaluate (batch-4): 1. Consistent guard syntax: convert sidecar-evaluate feedback-loop guards from short-circuit form to if/then/fi matching sidecar-dispatch. Add explanatory comment to both hooks: "Feedback-loop guards — before debug-trace to minimize background session overhead". Guards intentionally stay pre-debug-trace so background agent sessions exit without sourcing debug-trace overhead. 2. Missing dbg annotations: add dbg "EXIT: no json" and dbg "EXIT: bad CWD" to the two post-debug-trace early-exit points in sidecar-dispatch, matching sidecar-capture style. 3. CWD validation: strengthen sidecar-evaluate CWD check from [ -z "$CWD" ] to [ -z "$CWD" ] || [ ! -d "$CWD" ] matching the 3 other hooks that check both conditions. sidecar-dispatch already had the dual check. Co-Authored-By: Claude <[email protected]>
… - Extract applyDebugTrace/stripDebugTrace/readDebugStatus as pure string->string functions following the applyFlags/stripFlags pattern from flags.ts - Fix unsafe rawEnv cast: filter entries with type guard (Object.fromEntries + type predicate) instead of bare as Record<string, string> - Fix mutation bug: spread rawEnv into a fresh object before modifying it - matches what the tests already did correctly - Reorder CLI option processing to status->enable->disable, consistent with learn.ts and decisions.ts - Rewrite tests to import and call the real pure functions directly - removes the three duplicate helpers (applyEnable/applyDisable/readDebugState) that mirrored production logic - Add missing malformed-JSON test for the disable path (parallel to enable) applies ADR-007 Co-Authored-By: Claude <[email protected]>
…eliability fixes to debug-trace: 1. Extract size guard logic into _devflow_dbg_size_guard() helper so both devflow_debug_init (Phase 1 global log) and devflow_debug_set_cwd (Phase 2 per-project log) share the same 5MB-cap/2.5MB-tail truncation logic. Fixes the missing guard on the per-project log path. 2. Replace wc -c subprocess with stat -f%z (macOS/BSD) / stat -c%s (GNU/Linux) in the size guard, with wc -c as last-resort fallback. Avoids an unnecessary subprocess fork on every debug-enabled hook invocation. Adds regression test: devflow_debug_set_cwd truncates per-project log when it exceeds 5MB, confirming the new guard fires and the new message is retained. Co-Authored-By: Claude <[email protected]>
…s corrected: 1. Move devflow_debug_set_cwd after CWD validation -- was called before the empty/non-directory CWD guard, inconsistent with all 6 other hooks (sidecar-dispatch pattern). Calling with an invalid CWD would silently set a bad Phase 2 log path. 2. Move log-paths sourcing after MEMORY_ENABLED gate -- was sourced before the memory:false early-exit, paying subprocess overhead on every stop hook invocation regardless of memory state. log() is only called in the memory-specific section, so deferring to after the gate is safe (no log() calls in the decisions scanner path). Co-Authored-By: Claude <[email protected]>
- Change || echo "/tmp" to || echo "${TMPDIR:-/tmp}" in all 6 hooks so the
fallback prefers the user-private temp directory on macOS (TMPDIR=/var/folders/...)
instead of world-readable /tmp with predictable filenames.
- Add a once-per-invocation log size guard after LOG_FILE is set in each hook:
truncates to tail 1MB when the log exceeds 2MB, using a PID-unique temp file
for atomicity (same pattern as debug-trace 5MB guard). Guard runs once at
hook init time, not on every log() call, so overhead is negligible.
Hooks updated: pre-compact-memory, session-start-context, session-start-memory,
sidecar-evaluate. (sidecar-capture and sidecar-dispatch were already fixed in
a prior commit on this branch.)
Co-Authored-By: Claude <[email protected]>
Summary
response_text→last_assistant_messagefield rename, removes deadstop_reasonfilter. The Claude Code Stop hook input format changed (mid-May 2026) which silently broke working memory capture — queue accumulated 1,640+ user-only turns with zero assistant turns.DEVFLOW_HOOK_DEBUG=1debug tracing to all 7 hooks via newscripts/hooks/debug-traceshared helper. Two-phase logging: global fallback pre-CWD, per-project log post-CWD.devflow debug --enable/--disable/--statustogglesDEVFLOW_HOOK_DEBUGin~/.claude/settings.jsonenv block.Key design decisions
dbg() { :; }no-op defined beforeset -ein every hook — missingdebug-tracefile never causes hook failureDEVFLOW_HOOK_DEBUG!=1— zero overhead in normal operation (no subshells, no mkdir)Files changed
scripts/hooks/debug-tracescripts/hooks/sidecar-capture/tmp/debug with standardized systemscripts/hooks/sidecar-dispatchscripts/hooks/sidecar-evaluatescripts/hooks/session-start-memoryscripts/hooks/session-start-contextscripts/hooks/pre-compact-memoryscripts/hooks/preamblesrc/cli/commands/debug.tssrc/cli/cli.tstests/shell-hooks.test.tstests/sentinel.test.tsTest plan
devflow debug --enable→ verifyDEVFLOW_HOOK_DEBUGin settings.jsondevflow debug --disable→ verify removal from settings.json~/.devflow/logs/{slug}/.hook-debug.loghas entries from multiple hooks