fix(cli): hard-exit past finalization to avoid lancedb segfault (139) by JasonJarvan · Pull Request #82 · EverMind-AI/Raven

JasonJarvan · 2026-07-02T08:25:31Z

Change description

raven tui segfaulted on exit (exit code 139 / SIGSEGV) after Ctrl+C, failing
expect_exit(0) for the ENTIRE TUI e2e suite (test_e2e_ctrl_c both variants,
test_e2e_tui_status_slash) and gating live-green for all TUI e2e.

Root cause (verified — not a build artifact; reproduces on a clean build and
with no node child / no TTY): any command that builds the agent loop opens a
lancedb-backed store, and lancedb starts a process-global Rust/tokio background
thread (LanceDBBackgroundEventLoop) that is a daemon thread with no public
shutdown hook. A normal CPython interpreter finalization tears down while that
native runtime is still live and segfaults. The crash is 100% in Py_FinalizeEx
(both "loop built" and "reached end of main" print before the fault; faulthandler
shows <no Python frame> with the pyarrow/numpy lance stack loaded). It is
independent of the long-term memory backend (reproduces with memory.backend
disabled too).

Because lancedb has no clean shutdown, the fix guards the single CLI exit
chokepoint rather than one exit path: raven.cli.commands.run (the
console-script entry) runs the Typer app, and on the way out — when the lancedb
thread is live — flushes stdio + loguru and os._exits past finalization;
otherwise it exits normally. Every command that builds the loop is covered at
once, so this also subsumes and removes an ad-hoc per-command os._exit that
already existed for the agent headless one-shot. CliRunner invokes the Typer
app object directly and never reaches the wrapper, so in-process test hosts keep
normal exit semantics.

Verification: real PTY-driven raven tui + idle Ctrl+C and cancel-then-exit
Ctrl+C both exit 0 (were 139); --help / --version and other no-loop commands
still exit normally; new subprocess regression test builds the real loop, asserts
the hazard gate fires, and confirms a clean guarded exit (with a companion test
that the crash reproduces under normal finalization, so the guard stays
load-bearing); the full in-process CliRunner TUI + CLI-smoke suites pass.

Type of change

Bug fix
New feature
Document
Others

Related issues (if there is)

N/A

Checklists

Development

Lint rules pass locally
Application changes have been tested thoroughly
Automated tests covering modified code pass

Security

Security impact of change has been considered
Code follows security best practices and guidelines

Code review

Pull request has a descriptive title and context useful to a reviewer. Screenshots or screencasts are attached as necessary

Any raven command that builds the agent loop opens a lancedb-backed store; lancedb starts a process-global Rust/tokio background thread (LanceDBBackgroundEventLoop) with no shutdown hook. Normal CPython interpreter finalization races that live native runtime and segfaults (exit 139), masking the command's real exit code and failing expect_exit(0) for the whole TUI e2e suite. Guard once at the console-script entry (raven.cli.commands.run): when the lancedb thread is live, flush stdio + loguru and os._exit past finalization; otherwise exit normally. CliRunner invokes the Typer app directly and never reaches this wrapper, so in-process test hosts keep normal exit semantics. Subsumes and removes the ad-hoc per-command os._exit in agent_commands. Add a subprocess regression test that builds the real agent loop, asserts the hazard gate fires, and checks a clean exit via the guarded path. Co-authored-by: Claude (claude-opus-4-8) <[email protected]>

The pre-commit ruff-format hook collapses two multi-line assert messages that fit within the 120-col limit. Formatting only; no behavior change. Co-authored-by: Claude (claude-opus-4-8) <[email protected]>

## Summary The commit-message lint rejects any `Merge ...` commit in the PR range (`scripts/commit_lint.py`: "merge commits are not allowed in PR ranges"). But this repository is squash-only (rebase/merge disabled), so a merge commit created by syncing `main` into a feature branch is squashed away at merge time and never reaches `main`. The rule therefore guards nothing while failing the CI check for every PR that synced `main` via a merge commit (observed on #82). The fix makes `scripts/check_commit_messages.py` gather only non-merge commits by passing `--no-merges` to its `git log` call, mirroring commitlint's own default of ignoring merge commits. The `Merge ` rule in `commit_lint.py` is left intact as belt-and-suspenders (it is also reused by the PR-title path), so no lint behavior is removed. ## Type - [ ] Fix - [ ] Feature - [ ] Docs - [x] CI / tooling - [ ] Refactor - [ ] Other ## Verification - `uv run pytest tests/test_check_commit_messages.py tests/test_commit_lint.py -q` -> 10 passed - `uv run ruff check scripts/check_commit_messages.py tests/test_check_commit_messages.py` -> All checks passed - Added TDD coverage in `tests/test_check_commit_messages.py`: builds a real temporary repo with a `--no-ff` merge commit and asserts the range passes (red before the fix, green after). - [x] Relevant tests pass locally - [x] Relevant lint / type checks pass locally - [ ] User-facing docs or screenshots are updated when needed ## Risk - [x] Security impact considered - [x] Backward compatibility considered - [x] Rollback path is clear for risky changes Backward compatible: real commit messages are still linted exactly as before; only merge commits are now skipped. Rollback is a one-line revert of the `--no-merges` flag. ## Related Issues N/A Co-authored-by: Claude (claude-opus-4-8) <[email protected]> Co-authored-by: Sheng Zhao <[email protected]> Co-authored-by: Claude (claude-opus-4-8) <[email protected]>

JasonJarvan force-pushed the fix/tui_exit_sigsegv branch from 190ab8c to 5178062 Compare July 2, 2026 13:52

JasonJarvan changed the title ~~fix(cli): hard-exit raven tui to avoid lancedb finalization segfault~~ fix(cli): hard-exit past finalization to avoid lancedb segfault (139) Jul 2, 2026

JasonJarvan requested a review from arelchan July 2, 2026 14:06

arelchan previously approved these changes Jul 2, 2026

View reviewed changes

JasonJarvan dismissed arelchan’s stale review via 984cabe July 2, 2026 14:09

JasonJarvan requested a review from arelchan July 3, 2026 06:14

JasonJarvan force-pushed the fix/tui_exit_sigsegv branch from d62ff9b to 25037c1 Compare July 3, 2026 06:18

JasonJarvan requested a review from vichua213-crypto July 3, 2026 07:50

Sheng Zhao and others added 2 commits July 3, 2026 07:54

chore: apply ruff format to tui exit regression test

90fde8b

The pre-commit ruff-format hook collapses two multi-line assert messages that fit within the 120-col limit. Formatting only; no behavior change. Co-authored-by: Claude (claude-opus-4-8) <[email protected]>

JasonJarvan force-pushed the fix/tui_exit_sigsegv branch from e2437db to 90fde8b Compare July 3, 2026 07:56

cyfyifanchen merged commit f827c00 into main Jul 3, 2026
7 checks passed

cyfyifanchen deleted the fix/tui_exit_sigsegv branch July 3, 2026 07:57

JasonJarvan mentioned this pull request Jul 3, 2026

ci: skip merge commits in commit-message lint #90

Merged

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(cli): hard-exit past finalization to avoid lancedb segfault (139)#82

fix(cli): hard-exit past finalization to avoid lancedb segfault (139)#82
cyfyifanchen merged 2 commits into
mainfrom
fix/tui_exit_sigsegv

JasonJarvan commented Jul 2, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

JasonJarvan commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Change description

Type of change

Related issues (if there is)

Checklists

Development

Security

Code review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

JasonJarvan commented Jul 2, 2026 •

edited

Loading