fix(cli): hard-exit past finalization to avoid lancedb segfault (139)#82
Merged
Conversation
190ab8c to
5178062
Compare
arelchan
previously approved these changes
Jul 2, 2026
d62ff9b to
25037c1
Compare
Any raven command that builds the agent loop opens a lancedb-backed store; lancedb starts a process-global Rust/tokio background thread (LanceDBBackgroundEventLoop) with no shutdown hook. Normal CPython interpreter finalization races that live native runtime and segfaults (exit 139), masking the command's real exit code and failing expect_exit(0) for the whole TUI e2e suite. Guard once at the console-script entry (raven.cli.commands.run): when the lancedb thread is live, flush stdio + loguru and os._exit past finalization; otherwise exit normally. CliRunner invokes the Typer app directly and never reaches this wrapper, so in-process test hosts keep normal exit semantics. Subsumes and removes the ad-hoc per-command os._exit in agent_commands. Add a subprocess regression test that builds the real agent loop, asserts the hazard gate fires, and checks a clean exit via the guarded path. Co-authored-by: Claude (claude-opus-4-8) <[email protected]>
The pre-commit ruff-format hook collapses two multi-line assert messages that fit within the 120-col limit. Formatting only; no behavior change. Co-authored-by: Claude (claude-opus-4-8) <[email protected]>
e2437db to
90fde8b
Compare
12 tasks
cyfyifanchen
pushed a commit
that referenced
this pull request
Jul 3, 2026
## Summary The commit-message lint rejects any `Merge ...` commit in the PR range (`scripts/commit_lint.py`: "merge commits are not allowed in PR ranges"). But this repository is squash-only (rebase/merge disabled), so a merge commit created by syncing `main` into a feature branch is squashed away at merge time and never reaches `main`. The rule therefore guards nothing while failing the CI check for every PR that synced `main` via a merge commit (observed on #82). The fix makes `scripts/check_commit_messages.py` gather only non-merge commits by passing `--no-merges` to its `git log` call, mirroring commitlint's own default of ignoring merge commits. The `Merge ` rule in `commit_lint.py` is left intact as belt-and-suspenders (it is also reused by the PR-title path), so no lint behavior is removed. ## Type - [ ] Fix - [ ] Feature - [ ] Docs - [x] CI / tooling - [ ] Refactor - [ ] Other ## Verification - `uv run pytest tests/test_check_commit_messages.py tests/test_commit_lint.py -q` -> 10 passed - `uv run ruff check scripts/check_commit_messages.py tests/test_check_commit_messages.py` -> All checks passed - Added TDD coverage in `tests/test_check_commit_messages.py`: builds a real temporary repo with a `--no-ff` merge commit and asserts the range passes (red before the fix, green after). - [x] Relevant tests pass locally - [x] Relevant lint / type checks pass locally - [ ] User-facing docs or screenshots are updated when needed ## Risk - [x] Security impact considered - [x] Backward compatibility considered - [x] Rollback path is clear for risky changes Backward compatible: real commit messages are still linted exactly as before; only merge commits are now skipped. Rollback is a one-line revert of the `--no-merges` flag. ## Related Issues N/A Co-authored-by: Claude (claude-opus-4-8) <[email protected]> Co-authored-by: Sheng Zhao <[email protected]> Co-authored-by: Claude (claude-opus-4-8) <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Change description
raven tuisegfaulted on exit (exit code 139 / SIGSEGV) after Ctrl+C, failingexpect_exit(0)for the ENTIRE TUI e2e suite (test_e2e_ctrl_cboth variants,test_e2e_tui_status_slash) and gating live-green for all TUI e2e.Root cause (verified — not a build artifact; reproduces on a clean build and
with no node child / no TTY): any command that builds the agent loop opens a
lancedb-backed store, and lancedb starts a process-global Rust/tokio background
thread (
LanceDBBackgroundEventLoop) that is a daemon thread with no publicshutdown hook. A normal CPython interpreter finalization tears down while that
native runtime is still live and segfaults. The crash is 100% in
Py_FinalizeEx(both "loop built" and "reached end of main" print before the fault; faulthandler
shows
<no Python frame>with the pyarrow/numpy lance stack loaded). It isindependent of the long-term memory backend (reproduces with
memory.backenddisabled too).
Because lancedb has no clean shutdown, the fix guards the single CLI exit
chokepoint rather than one exit path:
raven.cli.commands.run(theconsole-script entry) runs the Typer app, and on the way out — when the lancedb
thread is live — flushes stdio + loguru and
os._exits past finalization;otherwise it exits normally. Every command that builds the loop is covered at
once, so this also subsumes and removes an ad-hoc per-command
os._exitthatalready existed for the
agentheadless one-shot. CliRunner invokes the Typerapp object directly and never reaches the wrapper, so in-process test hosts keep
normal exit semantics.
Verification: real PTY-driven
raven tui+ idle Ctrl+C and cancel-then-exitCtrl+C both exit 0 (were 139);
--help/--versionand other no-loop commandsstill exit normally; new subprocess regression test builds the real loop, asserts
the hazard gate fires, and confirms a clean guarded exit (with a companion test
that the crash reproduces under normal finalization, so the guard stays
load-bearing); the full in-process CliRunner TUI + CLI-smoke suites pass.
Type of change
Related issues (if there is)
N/A
Checklists
Development
Security
Code review