
fix: robust MCP startup root resolution and deferred scan fallback #500

Open: idea404 wants to merge 1 commit into justrach:main from idea404:fix/mcp-startup-root-handling


idea404 commented May 27, 2026

Linked Issues

Closes #502

Related (already closed): #384, #371, #278

Summary

codedb mcp was fragile when launched by MCP clients (opencode, Claude Code, etc.) from an unexpected cwd or without usable roots. The server entered scan=loading_snapshot with files=0 and never recovered, producing a functionally empty MCP server. codedb mcp --help also started the server instead of printing help, and unknown flags like --snapshot were silently ignored.

Root Resolution Order (after fix)

  1. Explicit path: codedb mcp /path/to/repo or codedb /path mcp
  2. CODEDB_ROOT env var
  3. Git root from cwd (new — git rev-parse --show-toplevel)
  4. MCP client roots via roots/list handshake
  5. cwd fallback (after 3s timeout or if client has no roots capability)
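The precedence above can be modelled in a few lines. This is a minimal Python sketch, not the Zig code from the PR: `resolve_root`, its parameter names, and the returned source labels are hypothetical, each argument stands in for a candidate the server would discover itself, and picking the first client root is an assumption.

```python
from typing import Optional, Sequence


def resolve_root(
    explicit_path: Optional[str],
    env_root: Optional[str],
    git_root: Optional[str],
    client_roots: Sequence[str],
    cwd: str,
) -> tuple[str, str]:
    """Return (root, source) following the five-step precedence.

    Hypothetical helper: every argument is a pre-computed candidate, so this
    function only encodes the ordering, not how each candidate is found.
    """
    if explicit_path:        # 1. codedb mcp /path or codedb /path mcp
        return explicit_path, "explicit"
    if env_root:             # 2. CODEDB_ROOT env var
        return env_root, "env"
    if git_root:             # 3. git root derived from cwd
        return git_root, "git"
    if client_roots:         # 4. roots/list handshake (assumption: first root wins)
        return client_roots[0], "client"
    return cwd, "cwd"        # 5. cwd fallback after the timeout
```

Passing the candidates in as plain values keeps the ordering testable without touching the filesystem, the environment, or an MCP client.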

Files Changed

| File | Change |
| --- | --- |
| src/git.zig | Added getGitRoot() function (+24 lines) |
| src/main.zig | MCP arg parsing (--help, [path], flag rejection), git root detection, flush before exit, terminal ready state when no indexable root (+65 lines) |
| src/mcp.zig | Fallback to cwd on roots/list error, missing result, or malformed roots array (+40 lines) |
| src/test_mcp.zig | 7 new unit tests (+164 lines) |
| scripts/e2e_mcp_test.py | 4 new E2E scenarios / 13 tests (+219 lines) |

Total: 506 lines (over 500-line guideline due to comprehensive test coverage)

Tests Run

zig build test
# 626/626 pass

python3 scripts/e2e_mcp_test.py --binary zig-out/bin/codedb --project /Users/dennis/Projects/codedb
# 30/30 pass

Before (failing behavior)

cd /Users/dennis/Projects/personal
codedb snapshot        # snapshot has 17 files
codedb mcp             # files=0 scan=loading_snapshot (stuck)
codedb mcp --help      # starts MCP server instead of showing help
codedb mcp --snapshot  # silently ignored, starts server

After (fixed behavior)

codedb mcp --help      # prints usage, exits 0
codedb mcp --snapshot  # ✗ unknown flag for mcp: --snapshot (exits 1)
codedb mcp             # resolves git root from cwd, scans correctly
codedb mcp /path       # indexes explicit path immediately
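The fixed argument handling can be illustrated with a small model. This is a hedged Python sketch rather than the parser in src/main.zig: `parse_mcp_args` and its return tags are hypothetical, and rejecting a second positional argument is an assumption the PR does not state. Only the "unknown flag for mcp" wording is taken from the output shown above.

```python
from typing import Optional


def parse_mcp_args(args: list[str]) -> tuple[str, Optional[str]]:
    """Classify the arguments that follow `codedb mcp`.

    Returns ("help", None), ("error", message) or ("serve", path_or_None).
    Hypothetical model of the behavior described in the PR.
    """
    path: Optional[str] = None
    for arg in args:
        if arg in ("--help", "-h"):
            return "help", None                       # print usage, exit 0
        if arg.startswith("-"):
            return "error", f"unknown flag for mcp: {arg}"  # exit 1
        if path is not None:
            # Assumption: more than one positional path is an error.
            return "error", f"unexpected extra argument: {arg}"
        path = arg
    return "serve", path                              # path may be None
```

A `None` path in the "serve" case means the server would continue with the remaining root resolution steps instead of indexing immediately.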

Non-Regression

  • All existing E2E scenarios (S1-S3) still pass
  • Existing unit tests for ScanState, root_policy, deferred scan, bundle, telemetry all pass
  • codedb <root> mcp (positional root before command) still works as before

Rebase

Rebased onto current upstream/main (commit 6317fe8).

Generated Files

No generated files committed. codedb.snapshot was restored to its original state.

Confirmation

This submission matches CONTRIBUTING.md requirements:


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 265de894e3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/main.zig Outdated
Fix codedb mcp startup fragility when launched by MCP clients (opencode,
Claude Code, etc.) from an unexpected cwd or without usable roots. The
server entered scan=loading_snapshot with files=0 and never recovered.

Root resolution order:
1. Explicit path: codedb mcp /path or codedb /path mcp
2. CODEDB_ROOT env var
3. Git root from cwd (new: git rev-parse --show-toplevel)
4. MCP client roots via roots/list handshake
5. cwd fallback (after 3s timeout)

Changes:
- src/main.zig: parse mcp [path], --help/-h, reject unknown flags
  (--snapshot etc), git root detection, flush before exit, terminal
  ready state when no indexable root available (with race guard:
  check ds.triggered before marking ready to avoid premature
  completion when a roots response fires the scan just before timeout)
- src/mcp.zig: trigger cwd fallback on roots/list error, missing
  result, or malformed roots array
- src/git.zig: add getGitRoot() for git repo root detection

Tests: 7 new unit tests, 4 new E2E scenarios (13 tests).
All 30/30 E2E tests pass. 626/626 unit tests pass.

Closes justrach#384, justrach#371, justrach#278
idea404 force-pushed the fix/mcp-startup-root-handling branch from 265de89 to 35cc057 on May 27, 2026 11:52

idea404 commented May 27, 2026

Good catch — fixed in 35cc057.

The issue was that triggerDeferredScanWithFallback returns false in two distinct cases:

  1. Already triggered: ds.triggered.swap() returns true (a roots response fired the scan just before the 3s timeout)
  2. No usable path: both roots and cwd fallback are empty/denied

The original code treated both the same way, prematurely marking scan_done = true and scan_state = .ready even when a background scan was already in-flight.

Fix: Check ds.triggered.load(.acquire) before calling the trigger function. Only transition to ready when both conditions hold:

  • triggerDeferredScanWithFallback returned false
  • ds.triggered was not already true

If already_triggered is true, the scan thread is running and will set ready when it finishes — the loop continues waiting on ctx.scan_done as before.

Added a regression test (triggerDeferredScanWithFallback returns false when already triggered (race)) that pre-sets ds.triggered = true and verifies the function returns false without firing a second scan.
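The guard described above can be sketched in Python to make the two `false` cases concrete. This is an illustrative model, not the Zig code: `DeferredScan`, `trigger_with_fallback`, and `on_timeout` are hypothetical names, a lock emulates the atomic swap, and performing the swap before the path check is an assumption about ordering.

```python
import threading
from typing import Optional


class DeferredScan:
    """Hypothetical stand-in for `ds`; the lock emulates an atomic flag."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._triggered = False
        self.scans_started = 0

    def swap_triggered(self) -> bool:
        """Set the flag and return its previous value, like swap(true)."""
        with self._lock:
            previous = self._triggered
            self._triggered = True
            return previous

    def is_triggered(self) -> bool:
        with self._lock:
            return self._triggered


def trigger_with_fallback(ds: DeferredScan, path: Optional[str]) -> bool:
    """Return True only if this call started the scan."""
    if ds.swap_triggered():
        return False              # case 1: a scan was already triggered
    if not path:
        return False              # case 2: no usable root or cwd
    ds.scans_started += 1         # stands in for spawning the scan thread
    return True


def on_timeout(ds: DeferredScan, path: Optional[str]) -> str:
    """Timeout handler: read the flag before attempting to trigger."""
    already_triggered = ds.is_triggered()
    started = trigger_with_fallback(ds, path)
    if not started and not already_triggered:
        return "ready"            # nothing to index: terminal ready state
    return "waiting"              # a scan is in flight and sets ready itself
```

Pre-setting the flag and calling `trigger_with_fallback` mirrors the regression test: the call returns `False` and `scans_started` stays at zero.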

All 30/30 E2E tests and 626/626 unit tests pass.

@justrach (Owner) commented

Hey @idea404, thanks for the very thorough PR — fully verified locally and security-clean. Wanted to flag something + ask how you'd like to proceed.

Status

Most of what this PR fixes shipped in v0.2.5821 (merged via #509 / #510). The 2026-05-28 release closed #502 along with six other issues triaged the same day. The MCP arg-parser overhaul, mcp --help, unknown-flag rejection, git-root detection, and the deferred-scan terminal-ready transition all landed there — implemented slightly differently (pure-Zig .git walk vs git rev-parse, factored parsePositional, Out.exitWithFlush helper) but with the same observable behavior.
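For readers unfamiliar with the ".git walk" approach mentioned above, here is a minimal Python sketch of the idea. It is not the Zig implementation that shipped: `find_git_root` is a hypothetical helper, and checking only for the existence of `.git` (directory or file) is an assumption about what counts as a repository root.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Walk from `start` toward the filesystem root looking for `.git`.

    Hypothetical helper. `.git` can be a directory (normal clone) or a file
    (worktree or submodule), so only existence is checked.
    """
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return candidate
    return None  # no enclosing repository found
```

Compared with spawning `git rev-parse --show-toplevel`, a walk like this needs no `git` binary on the PATH and no subprocess, at the cost of not applying git's own discovery rules.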

What's genuinely additive in this PR

Your handleResponse changes in src/mcp.zig cover four roots/list error paths I missed in v0.2.5821:

  1. Client replies with error field → cwd fallback
  2. Client replies without result field → cwd fallback
  3. result present but not an object → cwd fallback
  4. roots field missing or not an array → cwd fallback

My version only handled the post-timeout cwd fallback in watcherDeferredLoop, which means a fast-but-malformed client reply would still leave the server in deferred mode until the 3 s timeout fires. Your path catches those cases immediately. This part is worth shipping.
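The four error paths can be made concrete with a small classifier. This is a hedged Python sketch, not the `handleResponse` code: `roots_from_response` is a hypothetical name, the message is assumed to be an already-parsed JSON-RPC reply, and treating an empty roots list as a fallback case is an assumption based on the name of scenario S6.

```python
from typing import Any, Optional


def roots_from_response(message: dict[str, Any]) -> Optional[list[str]]:
    """Return root URIs from a roots/list reply, or None for cwd fallback."""
    if "error" in message:                  # 1. client replied with an error
        return None
    result = message.get("result")
    if result is None:                      # 2. reply has no result
        return None
    if not isinstance(result, dict):        # 3. result is not an object
        return None
    roots = result.get("roots")
    if not isinstance(roots, list):         # 4. roots missing or not an array
        return None
    uris = [
        entry["uri"]
        for entry in roots
        if isinstance(entry, dict) and isinstance(entry.get("uri"), str)
    ]
    # Assumption: an empty list also falls back to cwd.
    return uris or None
```

Returning `None` as soon as the reply is known to be unusable is what lets the caller fall back immediately instead of waiting for the timeout.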

How would you like to proceed?

  • (a) Close this PR as superseded by v0.2.5821, and I'll open a follow-up that cherry-picks just the mcp.zig roots/list error-path fallbacks and credits you in the commit.
  • (b) You trim this PR down to just the mcp.zig changes (drop git.zig, main.zig, test_mcp.zig, e2e_mcp_test.py since those are now redundant with what's on main) and we merge that.
  • (c) Something else — happy to hear what works for you.

Either way, thanks for the careful test coverage — the new e2e scenarios are nice; we may want to keep S6 ("empty roots/list falls back to cwd") and S7 ("stdout stays clean") in some form.


Development

Successfully merging this pull request may close these issues.

fix: codedb mcp stuck in loading_snapshot with files=0 when launched from unexpected cwd

2 participants