fix(hooks): stop hook field rename + debug tracing system by dean0x · Pull Request #228 · dean0x/devflow

dean0x · 2026-05-27T13:31:21Z

Summary

Stop hook fix: response_text → last_assistant_message field rename, removes dead stop_reason filter. The Claude Code Stop hook input format changed (mid-May 2026) which silently broke working memory capture — queue accumulated 1,640+ user-only turns with zero assistant turns.
Debug tracing system: Adds consistent DEVFLOW_HOOK_DEBUG=1 debug tracing to all 7 hooks via new scripts/hooks/debug-trace shared helper. Two-phase logging: global fallback pre-CWD, per-project log post-CWD.
CLI command: devflow debug --enable/--disable/--status toggles DEVFLOW_HOOK_DEBUG in ~/.claude/settings.json env block.
Normal logging: 4 hooks gain standard file logging (sidecar-dispatch, session-start-memory, session-start-context, pre-compact-memory).

Key design decisions

dbg() { :; } no-op defined before set -e in every hook — missing debug-trace file never causes hook failure
Debug functions short-circuit immediately when DEVFLOW_HOOK_DEBUG!=1 — zero overhead in normal operation (no subshells, no mkdir)
jq INPUT keys subshell guarded behind debug check to avoid unnecessary process spawn on every stop hook

Files changed

File	Change
`scripts/hooks/debug-trace`	New — shared debug helper
`scripts/hooks/sidecar-capture`	Field rename + replace ad-hoc `/tmp/` debug with standardized system
`scripts/hooks/sidecar-dispatch`	Add debug-trace + normal logging
`scripts/hooks/sidecar-evaluate`	Add debug-trace layer
`scripts/hooks/session-start-memory`	Add debug-trace + normal logging
`scripts/hooks/session-start-context`	Add debug-trace + normal logging
`scripts/hooks/pre-compact-memory`	Extract SCRIPT_DIR, add debug-trace + normal logging
`scripts/hooks/preamble`	Add debug-trace (no normal logging)
`src/cli/commands/debug.ts`	New — CLI toggle command
`src/cli/cli.ts`	Register debug command
`tests/shell-hooks.test.ts`	Update field names, add debug-trace syntax check
`tests/sentinel.test.ts`	Update field names

Test plan

Build passes (1571 tests, 62 test files)
All hooks source debug-trace with safe fallback pattern
devflow debug --enable → verify DEVFLOW_HOOK_DEBUG in settings.json
devflow debug --disable → verify removal from settings.json
With debug ON: run a session, check ~/.devflow/logs/{slug}/.hook-debug.log has entries from multiple hooks
With debug OFF: verify .hook-debug.log is not written to
Temporarily rename debug-trace → hooks still work (no-op fallback)

…nges: 1. Fix stop hook field rename (response_text -> last_assistant_message) that broke working memory capture across all projects. Also removes the stop_reason filter (no longer needed -- last_assistant_message is only populated on real assistant turns). Updates all tests to use the new field name. 2. Add consistent DEVFLOW_HOOK_DEBUG=1 debug tracing to all 7 hooks via a new shared scripts/hooks/debug-trace helper. Each hook now has a safe dbg() no-op fallback before set -e so missing debug-trace never causes hook failure. Phase 1 traces to ~/.devflow/logs/.hook-debug.log (pre-CWD global fallback), Phase 2 switches to the per-project log. Adds normal file logging to sidecar-dispatch, session-start-memory, session-start-context, and pre-compact-memory. New devflow debug --enable/--disable/--status CLI command toggles DEVFLOW_HOOK_DEBUG in ~/.claude/settings.json env block. Co-Authored-By: Claude <[email protected]>

…LOW_HOOK_DEBUG!=1 to avoid unnecessary mkdir syscalls on every hook invocation - sidecar-capture: guard jq INPUT keys subshell behind debug check so it only runs when tracing is active; remove duplicate dbg log - sidecar-dispatch: remove duplicate CWD debug log (already logged with PROMPT_LENGTH on earlier line) - shell-hooks.test.ts: add debug-trace to HOOK_SCRIPTS syntax check list

dean0x · 2026-05-27T14:37:15Z

scripts/hooks/sidecar-capture (Line 61)

The log() function here uses bare >> "$LOG_FILE" without error suppression. All four newly-added log() definitions in pre-compact-memory:44, session-start-memory:41, session-start-context:50, and sidecar-dispatch:48 use 2>/dev/null || true. Under set -e, a write failure would abort the hook, contradicting the stated contract.

Suggested fix:

log() { echo "[\2026-05-27T14:37:14Z] [sidecar-capture] \$1" >> "\$LOG_FILE" 2>/dev/null || true; }

Review finding with 82% confidence (MEDIUM, reliability + security)

dean0x · 2026-05-27T14:37:19Z

scripts/hooks/sidecar-capture (Line 8)

The debug initialization preamble is duplicated across all 7 hooks. Each includes nearly identical:

dbg() { :; } no-op definition
source debug-trace
devflow_debug_init
dbg "=== HOOK START ==="

This pattern appears in sidecar-capture:8-18, sidecar-dispatch:8-22, sidecar-evaluate:7-21, session-start-memory:7-17, session-start-context:11-22, pre-compact-memory:10-19, and preamble:7-17.

Consider extracting into a shared helper (e.g., init-hook) that each hook calls with just the hook name. This would reduce the repeated ceremony and make future hook additions simpler.

Review finding with 85% confidence (HIGH, complexity)

dean0x · 2026-05-27T14:37:23Z

scripts/hooks/debug-trace (Line 29)

Debug log directories are created with mkdir -p but without restrictive permissions. Session metadata (session IDs, CWD paths, feature states) could be readable by other users on shared systems. The log-paths helper sets chmod 700 after mkdir — debug-trace should match.

Suggested fix:

mkdir -p "\$global_log_dir" 2>/dev/null || true
chmod 700 "\$global_log_dir" 2>/dev/null || true

Also add the same fix at line 42-43 for the Phase 2 per-project log directory.

Review finding with 85% confidence (MEDIUM, security)

dean0x · 2026-05-27T14:37:26Z

scripts/hooks/sidecar-dispatch (Line 149)

Unguarded command substitution $(...) in dbg argument forks subprocesses even when debug is disabled. This occurs in 4 places across the PR:

sidecar-dispatch:149, 157
session-start-context:142
sidecar-evaluate:463

These should match the pattern in sidecar-capture:33, which guards with if [ "\${DEVFLOW_HOOK_DEBUG:-}" = "1" ]; then.

Fix example:

if [ "\${DEVFLOW_HOOK_DEBUG:-}" = "1" ]; then
  dbg "Learned artifacts: commands=$(echo "\$LEARNED_COMMANDS" | grep -c , 2>/dev/null || echo 0)"
fi

Review finding with 92% confidence (MEDIUM, performance)

dean0x · 2026-05-27T14:37:29Z

scripts/hooks/preamble (Line 44)

Missing dbg "=== HOOK COMPLETE ===" at the end. All six other hooks that received debug tracing end with this bookend marker. The hook's control flow ends inside the if/else block (line 44), so add it after:

if [[ "\$PROMPT" == *"## Goal"* ]] && [[ "\$PROMPT" == *"## Steps"* ]] && [[ "\$PROMPT" == *"## Files"* ]]; then
  dbg "EXECUTION_PLAN detected — injecting directive"
  json_prompt_output "EXECUTION_PLAN detected. Invoke \`devflow:implement\` via the Skill tool to execute this plan."
else
  dbg "No plan markers detected — no output"
fi

dbg "=== HOOK COMPLETE ==="

Review finding with 90% confidence (MEDIUM, consistency)

dean0x · 2026-05-27T14:37:33Z

scripts/hooks/sidecar-capture (Line 59)

Inconsistent error handling for log-paths sourcing. Two patterns exist in this PR:

Pre-existing hooks (sidecar-capture:59, sidecar-evaluate:56): hard-fail

source "\$SCRIPT_DIR/log-paths" || { echo "..."; exit 1; }

Newly-added logging (pre-compact-memory:41, session-start-memory:38, session-start-context:47, sidecar-dispatch:45): soft-fail

source "\$SCRIPT_DIR/log-paths" || true

The soft-fail || true is more appropriate for optional logging — logging failure should not kill the hook. Recommend updating sidecar-capture and sidecar-evaluate to match the new hooks for consistency across the whole hook system.

Review finding with 85% confidence (HIGH, consistency)

dean0x · 2026-05-27T14:37:36Z

scripts/hooks/sidecar-evaluate (Line 496)

File size now 496 lines (was 465 on main), approaching the 500-line critical threshold. The reinforcement section (lines 167-241) has nesting up to 7 levels with 15+ lines of inline jq and equivalent node fallback length. This is a pre-existing structural issue, but the new debug instrumentation (31 new dbg lines) pushes it closer to critical.

Consider a follow-up refactor: Extract the 4 major evaluation sections into sourced helpers:

sidecar-eval-learning
sidecar-eval-decisions
sidecar-eval-knowledge
sidecar-eval-reinforce

This reduces the main script to a dispatcher. Applies PF-004 thinking — avoids migration complexity by preserving behavior while improving file organization.

Review finding with 90% confidence (MEDIUM, complexity)

dean0x · 2026-05-27T14:37:40Z

src/cli/commands/debug.ts (Line 27)

JSON.parse(raw) can throw on malformed JSON. The catch block silently falls back to {}, conflating "file not found" with "corrupted JSON". If settings.json exists but is corrupted, the command will silently overwrite it with a minimal object, destroying existing settings.

Suggested fix — differentiate error types:

try {
  const raw = await fs.readFile(settingsPath, 'utf-8');
  settings = JSON.parse(raw) as Record<string, unknown>;
} catch (err: unknown) {
  if (err instanceof SyntaxError) {
    p.log.error('settings.json is malformed — fix it before modifying env vars');
    return;
  }
  // ENOENT or other fs error — safe to start fresh
  settings = {};
}

Review finding with 85% confidence (HIGH, TypeScript)

dean0x · 2026-05-27T14:37:43Z

src/cli/commands/debug.ts (Line 32)

Unsafe type assertion: settings.env cast to Record<string, string> | undefined without runtime validation. If settings.json contains "env": 42 or "env": "string", subsequent property access (env.DEVFLOW_HOOK_DEBUG, delete env.DEVFLOW_HOOK_DEBUG) will fail silently or produce unexpected results.

Add runtime type guard:

const rawEnv = settings.env;
const env: Record<string, string> =
  (typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
    ? (rawEnv as Record<string, string>)
    : {};

Note: Applies ADR-001 (clean break philosophy) — no backward compatibility needed, just do it right from the start.

Review finding with 82% confidence (MEDIUM, TypeScript)

dean0x · 2026-05-27T14:38:02Z

Review Summary

Overall: APPROVED_WITH_CONDITIONS (field rename is solid, debug system well-designed, actionable improvements needed)

High-Confidence Findings (≥80%) — 9 Comments Posted

All critical issues are addressed in individual comments above. Priority fixes:

Error suppression (sidecar-capture:61 + debug-trace:29) — reliability + security
Unguarded $() in dbg calls (4 locations) — performance
log-paths error handling consistency (sidecar-capture:59, sidecar-evaluate:56) — consistency
TypeScript validation (debug.ts:27, debug.ts:32) — type safety

Lower-Confidence Suggestions (60-79%)

These go in summary (not as inline comments):

Debug log unbounded growth (debug-trace) — 65% — When DEVFLOW_HOOK_DEBUG=1 is active, logs grow without rotation. Add size check in devflow_debug_init() to truncate at 5MB threshold, keeping tail (similar to production log rotation).
Dual logging cognitive load (all 7 hooks) — 82% — Every hook now has both dbg() (debug-only) and log() (always-on) with near-identical messages at many call sites. Consider having log() proxy to dbg() with same message, so only one call is needed.
Internal variable RESPONSE_TEXT (sidecar-capture:46) — 65% — Field renamed from response_text to last_assistant_message but internal bash variable is still RESPONSE_TEXT. Renaming would improve traceability, though internal names are not contractual.
devflow_debug_set_cwd timing (sidecar-dispatch:34) — 70% — dbg "CWD=..." fires before devflow_debug_set_cwd (line 38), so goes to global fallback log. All other hooks log CWD after setting it. Minor ordering divergence.

Test Coverage Gaps (HIGH priority for follow-up)

Per the Testing reviewer:

No behavioral tests for debug-trace (90% confidence) — Shell helper is sourced by all 7 hooks but has only syntax checks. Add tests for: no-op when disabled, global log when enabled, per-project log after CWD set.
No tests for debug.ts CLI (92% confidence) — Settings mutations need coverage. Create tests/debug.test.ts with: --enable/--disable/--status behavior, env var handling, graceful failure modes.
No regression test for stop_reason removal (85% confidence) — Verify that stop_reason: "tool_use" WITH non-empty last_assistant_message now proceeds to capture (old behavior would have skipped).

Field Rename Validation

✅ Complete and correct:

All 19 test input objects updated from { stop_reason, response_text } to { last_assistant_message }
Zero remaining references to old field names in codebase
jq and node extraction paths both correct
cut field index correctly adjusted (field 3 → field 2)
All 1571 tests pass

Design Notes

Applies ADR-001 (clean break): No backward-compatibility layer for old fields — correct decision
debug-trace architecture: Clean two-phase logging (global then per-project), graceful no-op fallback, proper initialization ordering in most hooks
Avoids PF-003 (migration complexity): Field rename is in hooks installed at init time; users must run devflow init to pick up changes — existing pattern

Next Steps

Apply the 9 high-confidence inline fixes (likely <30 mins)
Add test coverage for debug-trace and debug.ts (consider for same PR or quick follow-up)
Evaluate lower-confidence suggestions (log rotation, dual-logging pattern) for follow-up PRs

Generated by devflow:git comment-pr operation

- Catch SyntaxError from JSON.parse separately to prevent silently overwriting a malformed settings.json with {} - Replace unsafe cast of settings.env with a runtime type guard that falls back to {} for non-object values Co-Authored-By: Claude <[email protected]>

…- debug-trace: add chmod 700 after mkdir in both Phase 1 and Phase 2 to match log-paths pattern and restrict debug log directory permissions - debug-trace: add 5MB size guard to prevent unbounded log growth; truncates to tail 2.5MB using PID-unique temp file for atomicity - sidecar-capture: add 2>/dev/null || true to log() to prevent set -e abort on log write failures - sidecar-capture/sidecar-evaluate: change log-paths sourcing from hard-fail to soft-fail matching the 4 new hooks; use two-step LOG_DIR/LOG_FILE pattern with /tmp fallback for consistency and defensiveness - sidecar-evaluate: remove now-redundant mkdir -p on LOG_FILE dirname - session-start-context, sidecar-dispatch, sidecar-evaluate: guard unguarded subshell calls in dbg arguments behind DEVFLOW_HOOK_DEBUG=1 check to eliminate unnecessary subprocess forks when debug is OFF - preamble: add missing dbg HOOK COMPLETE bookend for consistency

…n regression - tests/shell-hooks.test.ts: add 4 debug-trace behavioral tests verifying that dbg() is a no-op when DEVFLOW_HOOK_DEBUG is unset, writes to global log when enabled, and devflow_debug_set_cwd switches to per-project log - tests/sentinel.test.ts: add stop_reason=tool_use regression test confirming capture proceeds when last_assistant_message is present regardless of stop_reason value; add log file creation test for sidecar-capture - tests/debug.test.ts: new test file covering devflow debug --enable/--disable/ --status/malformed settings.json — the only CLI command previously without tests Co-Authored-By: Claude <[email protected]>

- debug-trace: harden tail+mv sequence with explicit conditional to ensure temp file cleanup on tail failure - sidecar-dispatch: remove duplicate PROMPT_LENGTH debug log (already logged on prior line) - debug.test.ts: remove unused import (path module) Co-Authored-By: Claude <[email protected]>

dean0x · 2026-05-27T16:06:59Z

Review Summary: CHANGES_REQUESTED

Comprehensive review across 9 dimensions identified 4 blocking architectural/reliability issues and 2 consistency violations requiring fixes before merge. All test coverage, regression, and security hygiene are solid. Field rename migration is complete with zero stale references.

Blocking Issues

1. Missing 5MB Size Guard on Phase 2 Debug Log (Reliability + Security)

File: scripts/hooks/debug-trace:46-59
Confidence: 92% (Reliability), 85% (Security)

The global debug log (Phase 1) has a 5MB truncation guard, but the per-project log (Phase 2) does not. This reintroduces the unbounded growth risk on the primary logging path (Phase 2 fires on 80% of hook invocations). Every 100 prompt cycles with debug enabled will create ~100-150MB of debug logs with no cleanup.

Fix: Copy the exact 5MB truncation pattern into devflow_debug_set_cwd() after setting _DEVFLOW_DBG_LOG.

2. Feedback-Loop Guard Inconsistency (Consistency)

File: scripts/hooks/sidecar-dispatch:14-16 and sidecar-evaluate:13-15
Confidence: 88% (Consistency)

sidecar-capture moved its DEVFLOW_BG_* guards after debug-trace sourcing with dbg annotations. sidecar-dispatch and sidecar-evaluate still place guards before debug-trace with no debug tracing on exit. This breaks the stated goal: "all hooks now follow same debug-trace sourcing pattern."

Fix: Move feedback-loop guards after debug-trace sourcing and add dbg "EXIT: bg_updater" etc. to match sidecar-capture.

3. `devflow_debug_set_cwd` Called Before CWD Validation (Consistency)

File: scripts/hooks/sidecar-capture:48 vs 52
Confidence: 85% (Consistency)

In sidecar-capture, devflow_debug_set_cwd "$CWD" is called at line 48, but CWD validation happens at line 52. All other 6 hooks validate first. When CWD is empty, this attempts to create ~/.devflow/logs/-/ (from sed 's|^/||' on empty string), violating the established pattern.

Fix: Swap lines so CWD validation comes before devflow_debug_set_cwd.

4. `debug.ts` Breaks Pure-Function Pattern (Architecture)

File: src/cli/commands/debug.ts:19-81
Confidence: 88% (Architecture)

The debug.ts command inlines the entire settings.json read-parse-mutate-write cycle instead of extracting pure functions. Every other settings-mutating command (flags.ts, ambient.ts, hud.ts) uses the pattern: applyX(settingsJson: string): string. This coupling makes the env mutation untestable without filesystem mocks, and the test file duplicates this logic locally rather than importing it.

Fix: Extract applyDebugTrace(settingsJson: string): string and stripDebugTrace(settingsJson: string): string as pure functions (similar to applyFlags/stripFlags). The command action becomes a thin I/O wrapper.

Should-Fix Issues (High Confidence, Non-Blocking)

5. Performance Regression: `log-paths` Sourced Before Early Exit (Performance)

File: scripts/hooks/sidecar-capture:59-62
Confidence: 85%

When memory:false is set, the hook now sources log-paths and calls devflow_log_dir (4 subprocess forks) before exiting at line 98-101, whereas previously it exited before sourcing. This adds 15-20ms to every stop-hook invocation on projects with memory:false.

Fix: Use lazy-init pattern—define log() only when first needed (after the memory gate).

6. Unsafe Type Assertion on Environment Values (TypeScript)

File: src/cli/commands/debug.ts:39
Confidence: 85%

The code asserts rawEnv as Record<string, string> without verifying that all values are actually strings. If settings.json contains "env": { "DEVFLOW_HOOK_DEBUG": 1 } (number), the assertion silently succeeds and the === '1' check fails.

Fix: Filter values to only include strings:

const env: Record<string, string> =
  (typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
    ? Object.fromEntries(
        Object.entries(rawEnv as Record<string, unknown>)
          .filter((entry): entry is [string, string] => typeof entry[1] === 'string')
      )
    : {};

7. Mutating Input Object (TypeScript)

File: src/cli/commands/debug.ts:39-44
Confidence: 82%

The env variable is assigned directly from rawEnv without spreading. Mutations to env (adding DEVFLOW_HOOK_DEBUG = '1') mutate the original parsed object. The test file correctly uses spreading—production should match.

Fix: Apply spread operator: { ...(rawEnv as Record<string, unknown>) }.

Summary

Category	Count
Blocking (must fix)	4
Should-Fix (recommended)	3
False Positives Cleared	6
Regression Tests Passing	All

Recommendation: Fix the 4 blocking issues (5MB guard, feedback-loop guards, CWD validation order, pure functions pattern), then re-review debug.ts TypeScript surface for the type assertions.

dean0x

Missing 5MB Size Guard on Phase 2 Debug Log

Line 46-59: devflow_debug_set_cwd creates the per-project debug log without any size guard.

The Phase 1 global log (lines 32-40) has a 5MB truncation guard that keeps the tail 2.5MB. Phase 2 (here) switches to per-project logging but omits this guard entirely. Since most debug output occurs post-CWD (Phase 2), the per-project logs will accumulate the bulk of debug traffic and grow without bound over extended debug sessions.

Fix: Add the same 5MB guard pattern after setting _DEVFLOW_DBG_LOG at line 55:

```bash
_DEVFLOW_DBG_LOG="$project_log_dir/.hook-debug.log"

Size guard: same as Phase 1

if [ -f "$_DEVFLOW_DBG_LOG" ] && [ "$(wc -c < "$_DEVFLOW_DBG_LOG" 2>/dev/null || echo 0)" -gt 5242880 ]; then
local _TAIL_TMP="$_DEVFLOW_DBG_LOG.tmp.$$"
if tail -c 2621440 "$_DEVFLOW_DBG_LOG" > "$_TAIL_TMP" 2>/dev/null; then
mv "$_TAIL_TMP" "$_DEVFLOW_DBG_LOG" 2>/dev/null || rm -f "$_TAIL_TMP" 2>/dev/null || true
else
rm -f "$_TAIL_TMP" 2>/dev/null || true
fi
fi
```

Impact: Without this, 100 debug-enabled sessions will create 100-150MB of logs with no cleanup.

dean0x

Feedback-Loop Guard Inconsistency

sidecar-dispatch lines 14-16: Feedback-loop guards (DEVFLOW_BG_* checks) are placed before debug-trace sourcing with no debug tracing on exit.

Compare to sidecar-capture (lines 21-23), which moved these guards to after debug-trace sourcing and added dbg exit annotations:
```bash
if [ "${DEVFLOW_BG_UPDATER:-}" = "1" ]; then dbg "EXIT: bg_updater"; exit 0; fi
```

This inconsistency breaks the PR's stated goal: "all hooks now follow same debug-trace sourcing pattern."

Fix: Move lines 14-16 to after line 12 (after devflow_debug_init and dbg HOOK START), and add dbg annotations to each guard:
```bash
devflow_debug_init "sidecar-dispatch"
dbg "=== HOOK START ==="

if [ "${DEVFLOW_BG_UPDATER:-}" = "1" ]; then dbg "EXIT: bg_updater"; exit 0; fi
if [ "${DEVFLOW_BG_LEARNER:-}" = "1" ]; then dbg "EXIT: bg_learner"; exit 0; fi
if [ "${DEVFLOW_BG_KNOWLEDGE_REFRESH:-}" = "1" ]; then dbg "EXIT: bg_knowledge"; exit 0; fi
```

Same fix needed in sidecar-evaluate lines 13-15.

dean0x

CWD Validation Ordering

sidecar-capture line 48 vs 52: devflow_debug_set_cwd "\$CWD" is called at line 48, but CWD validation happens at line 52 (if [ -z "\$CWD" ] || [ ! -d "\$CWD" ]).

All other 6 hooks validate CWD first, then call devflow_debug_set_cwd. When CWD is empty, the current order attempts to create ~/.devflow/logs/-/ (from sed 's|^/||' on an empty string), which violates the established pattern and wastes a syscall.

Fix: Swap the order. Validation should occur before devflow_debug_set_cwd:
```bash
CWD=$(printf '%s' "$_FIELDS" | cut -d$'\001' -f1)
RESPONSE_TEXT=$(printf '%s' "$_FIELDS" | cut -d$'\001' -f2-)

if [ -z "$CWD" ] || [ ! -d "$CWD" ]; then dbg "EXIT: bad CWD"; exit 0; fi

devflow_debug_set_cwd "$CWD"
```

dean0x

Architectural Violation: Break Pure-Function Pattern

src/cli/commands/debug.ts lines 19-81: The command inlines the entire settings.json read-parse-mutate-write cycle instead of extracting pure functions.

Every other settings-mutating command in devflow (flags.ts, ambient.ts, hud.ts) uses this pattern:

`applyX(settingsJson: string): string` — accepts raw JSON string, returns transformed string
`stripX(settingsJson: string): string` — reverse transform
Command action handler calls these pure functions and handles file I/O separately

Current problem:

`debug.ts` couples file I/O, JSON parsing, env mutation, and writing into a single monolithic action handler
The logic is untestable without filesystem mocks
The test file re-implements this logic locally (`applyEnable`, `applyDisable`, `readDebugState`) rather than importing it — tests validate their copy, not production code
The logic cannot be reused by `devflow init` or other orchestrators

Fix: Extract pure functions to a shared utility (e.g., `src/cli/utils/debug-settings.ts`):

```typescript
export function applyDebugTrace(settingsJson: string): string {
const settings = JSON.parse(settingsJson) as Record<string, unknown>;
const rawEnv = settings.env;
const env: Record<string, string> =
(typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
? { ...(rawEnv as Record<string, unknown>) }
: {};
env.DEVFLOW_HOOK_DEBUG = '1';
settings.env = env;
return JSON.stringify(settings, null, 2) + '\n';
}

export function stripDebugTrace(settingsJson: string): string {
const settings = JSON.parse(settingsJson) as Record<string, unknown>;
const rawEnv = settings.env;
const env: Record<string, string> =
(typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
? { ...(rawEnv as Record<string, unknown>) }
: {};
delete env.DEVFLOW_HOOK_DEBUG;
if (Object.keys(env).length === 0) delete settings.env;
else settings.env = env;
return JSON.stringify(settings, null, 2) + '\n';
}
```

Then the command action becomes:
```typescript
const raw = await fs.readFile(settingsPath, 'utf-8');
const updated = cmd === 'enable' ? applyDebugTrace(raw) : stripDebugTrace(raw);
await fs.writeFile(settingsPath, updated, 'utf-8');
```

And tests import and call the pure functions directly.

dean0x

Unsafe Type Assertion on Environment Values

src/cli/commands/debug.ts line 39: The code asserts `rawEnv as Record<string, string>` without validating that all values are actually strings.

Problem: If `settings.json` contains:
```json
{ "env": { "DEVFLOW_HOOK_DEBUG": 1 } }
```
(number instead of string), the `as` assertion silently succeeds and the `=== '1'` comparison at line 64 will never match, silently failing to detect the enabled state.

The `env` object in `settings.json` is a user-editable trust boundary and requires value-level validation.

Fix: Filter values to ensure only strings are included:

```typescript
const rawEnv = settings.env;
const env: Record<string, string> =
(typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
? Object.fromEntries(
Object.entries(rawEnv as Record<string, unknown>)
.filter((entry): entry is [string, string] => typeof entry[1] === 'string')
)
: {};
```

This ensures non-string values (numbers, objects, etc.) are silently dropped and `env` contains only valid string key-value pairs.

dean0x

Mutating Input Object

src/cli/commands/debug.ts lines 39-44: The `env` variable is assigned directly from `rawEnv` without spreading, causing mutations to mutate the original parsed object.

Current code:
```typescript
const env = (typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
? (rawEnv as Record<string, string>) // Direct reference, no copy
: {};
env.DEVFLOW_HOOK_DEBUG = '1'; // Mutates the original settings.env object
```

Test file does it correctly (line 37):
```typescript
? { ...(rawEnv as Record<string, string>) } // Spread creates a copy
```

While functionally correct for the current write-back flow, this violates immutability by default principle. Production code should match the test pattern.

Fix: Apply spread operator:
```typescript
const env: Record<string, string> =
(typeof rawEnv === 'object' && rawEnv !== null && !Array.isArray(rawEnv))
? { ...(rawEnv as Record<string, unknown>) }
: {};
```

…tch/evaluate Three consistency fixes across sidecar-dispatch and sidecar-evaluate (batch-4): 1. Consistent guard syntax: convert sidecar-evaluate feedback-loop guards from short-circuit form to if/then/fi matching sidecar-dispatch. Add explanatory comment to both hooks: "Feedback-loop guards — before debug-trace to minimize background session overhead". Guards intentionally stay pre-debug-trace so background agent sessions exit without sourcing debug-trace overhead. 2. Missing dbg annotations: add dbg "EXIT: no json" and dbg "EXIT: bad CWD" to the two post-debug-trace early-exit points in sidecar-dispatch, matching sidecar-capture style. 3. CWD validation: strengthen sidecar-evaluate CWD check from [ -z "$CWD" ] to [ -z "$CWD" ] || [ ! -d "$CWD" ] matching the 3 other hooks that check both conditions. sidecar-dispatch already had the dual check. Co-Authored-By: Claude <[email protected]>

… - Extract applyDebugTrace/stripDebugTrace/readDebugStatus as pure string->string functions following the applyFlags/stripFlags pattern from flags.ts - Fix unsafe rawEnv cast: filter entries with type guard (Object.fromEntries + type predicate) instead of bare as Record<string, string> - Fix mutation bug: spread rawEnv into a fresh object before modifying it - matches what the tests already did correctly - Reorder CLI option processing to status->enable->disable, consistent with learn.ts and decisions.ts - Rewrite tests to import and call the real pure functions directly - removes the three duplicate helpers (applyEnable/applyDisable/readDebugState) that mirrored production logic - Add missing malformed-JSON test for the disable path (parallel to enable) applies ADR-007 Co-Authored-By: Claude <[email protected]>

…eliability fixes to debug-trace: 1. Extract size guard logic into _devflow_dbg_size_guard() helper so both devflow_debug_init (Phase 1 global log) and devflow_debug_set_cwd (Phase 2 per-project log) share the same 5MB-cap/2.5MB-tail truncation logic. Fixes the missing guard on the per-project log path. 2. Replace wc -c subprocess with stat -f%z (macOS/BSD) / stat -c%s (GNU/Linux) in the size guard, with wc -c as last-resort fallback. Avoids an unnecessary subprocess fork on every debug-enabled hook invocation. Adds regression test: devflow_debug_set_cwd truncates per-project log when it exceeds 5MB, confirming the new guard fires and the new message is retained. Co-Authored-By: Claude <[email protected]>

…s corrected: 1. Move devflow_debug_set_cwd after CWD validation -- was called before the empty/non-directory CWD guard, inconsistent with all 6 other hooks (sidecar-dispatch pattern). Calling with an invalid CWD would silently set a bad Phase 2 log path. 2. Move log-paths sourcing after MEMORY_ENABLED gate -- was sourced before the memory:false early-exit, paying subprocess overhead on every stop hook invocation regardless of memory state. log() is only called in the memory-specific section, so deferring to after the gate is safe (no log() calls in the decisions scanner path). Co-Authored-By: Claude <[email protected]>

- Change || echo "/tmp" to || echo "${TMPDIR:-/tmp}" in all 6 hooks so the fallback prefers the user-private temp directory on macOS (TMPDIR=/var/folders/...) instead of world-readable /tmp with predictable filenames. - Add a once-per-invocation log size guard after LOG_FILE is set in each hook: truncates to tail 1MB when the log exceeds 2MB, using a PID-unique temp file for atomicity (same pattern as debug-trace 5MB guard). Guard runs once at hook init time, not on every log() call, so overhead is negligible. Hooks updated: pre-compact-memory, session-start-context, session-start-memory, sidecar-evaluate. (sidecar-capture and sidecar-dispatch were already fixed in a prior commit on this branch.) Co-Authored-By: Claude <[email protected]>

Dean Sharon added 2 commits May 27, 2026 16:18

Dean Sharon and others added 4 commits May 27, 2026 17:54

dean0x commented May 27, 2026

View reviewed changes

Dean Sharon and others added 5 commits May 27, 2026 22:32

This was referenced May 27, 2026

refactor: hook infrastructure DRY cleanup (debug bootstrap + log init helpers) #229

Open

refactor: sidecar-evaluate decomposition (learning/decisions/reinforcement modules) #230

Open

Conversation

dean0x commented May 27, 2026

Summary

Key design decisions

Files changed

Test plan

Uh oh!

dean0x commented May 27, 2026

scripts/hooks/sidecar-capture (Line 61)

Uh oh!

dean0x commented May 27, 2026

scripts/hooks/sidecar-capture (Line 8)

Uh oh!

dean0x commented May 27, 2026

scripts/hooks/debug-trace (Line 29)

Uh oh!

dean0x commented May 27, 2026

scripts/hooks/sidecar-dispatch (Line 149)

Uh oh!

dean0x commented May 27, 2026

scripts/hooks/preamble (Line 44)

Uh oh!

dean0x commented May 27, 2026

scripts/hooks/sidecar-capture (Line 59)

Uh oh!

dean0x commented May 27, 2026

scripts/hooks/sidecar-evaluate (Line 496)

Uh oh!

dean0x commented May 27, 2026

src/cli/commands/debug.ts (Line 27)

Uh oh!

dean0x commented May 27, 2026

src/cli/commands/debug.ts (Line 32)

Uh oh!

dean0x commented May 27, 2026

Review Summary

High-Confidence Findings (≥80%) — 9 Comments Posted

Lower-Confidence Suggestions (60-79%)

Test Coverage Gaps (HIGH priority for follow-up)

Field Rename Validation

Design Notes

Next Steps

Uh oh!

dean0x commented May 27, 2026

Review Summary: CHANGES_REQUESTED

Blocking Issues

1. Missing 5MB Size Guard on Phase 2 Debug Log (Reliability + Security)

2. Feedback-Loop Guard Inconsistency (Consistency)

3. devflow_debug_set_cwd Called Before CWD Validation (Consistency)

4. debug.ts Breaks Pure-Function Pattern (Architecture)

Should-Fix Issues (High Confidence, Non-Blocking)

5. Performance Regression: log-paths Sourced Before Early Exit (Performance)

6. Unsafe Type Assertion on Environment Values (TypeScript)

7. Mutating Input Object (TypeScript)

Summary

Uh oh!

dean0x left a comment

Choose a reason for hiding this comment

Missing 5MB Size Guard on Phase 2 Debug Log

Size guard: same as Phase 1

Uh oh!

dean0x left a comment

Choose a reason for hiding this comment

Feedback-Loop Guard Inconsistency

Uh oh!

dean0x left a comment

Choose a reason for hiding this comment

CWD Validation Ordering

Uh oh!

dean0x left a comment

Choose a reason for hiding this comment

Architectural Violation: Break Pure-Function Pattern

Uh oh!

dean0x left a comment

Choose a reason for hiding this comment

Unsafe Type Assertion on Environment Values

Uh oh!

dean0x left a comment

Choose a reason for hiding this comment

Mutating Input Object

3. `devflow_debug_set_cwd` Called Before CWD Validation (Consistency)

4. `debug.ts` Breaks Pure-Function Pattern (Architecture)

5. Performance Regression: `log-paths` Sourced Before Early Exit (Performance)