fix: startup exception being silently swallowed when journal replay fails #4815
Open
nodece wants to merge 2 commits into
…ails Signed-off-by: Zixuan Liu <[email protected]>
Contributor
The fix looks reasonable to me. Could you please add tests to cover the startup failure path and verify that the expected non-OK exit code is preserved?
Summary
When `BookieImpl.start()` fails during journal replay (e.g. missing recovery log), the exception is silently swallowed and the bookie appears to be running despite being in a broken, partially initialized state.

Root cause: Two issues conspire to hide the failure:
1. `BookieServer.start()` detects `!bookie.isRunning()` but silently `return`s instead of throwing an exception. The method declares `throws IOException` but never uses it on the failure path.
2. `BookieImpl.shutdown()` skips setting `this.exitCode` when `isRunning()` is `false` (which is always the case before `initState()` is called at line 756). So `getExitCode()` returns `ExitCode.OK` (0) even on failure.

Effect: The exception is caught by `AbstractLifecycleComponent.start()` and dispatched to the `uncaughtExceptionHandler` (set by `ComponentStarter`), which triggers `ComponentShutdownHook` for cleanup. However, without the `IOException` being thrown from `BookieServer.start()`, the exception path relies entirely on the handler, and the exit code was incorrectly `OK` (0).

Changes
1. `BookieServer.start()`: `return` → `throw new IOException("Bookie failed to start, exit code: " + exitCode)`. This ensures the exception properly propagates to the lifecycle framework's exception handler.
2. `BookieImpl.shutdown()`: Move `this.exitCode = exitCode` outside the `if (isRunning())` guard. This ensures the exit code is always set, even during early startup before `initState()`.

Complementary to #4740, which fixed the root cause of journal file deletion (lastMark regression). This PR fixes the consequence: if a journal file is missing for any reason, the bookie should fail fast with a proper error.
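To illustrate the two changes together, here is a minimal, self-contained sketch of the pattern. It is not the actual BookKeeper code: `SketchBookie`, `SketchBookieServer`, `SketchExitCode`, and the `FAILED` value are hypothetical stand-ins, and the journal replay failure is simulated with a constructor flag.

```java
import java.io.IOException;

// Hypothetical exit codes for illustration; not the real ExitCode constants.
final class SketchExitCode {
    static final int OK = 0;
    static final int FAILED = 1; // assumed non-OK value

    private SketchExitCode() {}
}

// Hypothetical stand-in for BookieImpl.
class SketchBookie {
    private final boolean journalReplayFails; // simulates a missing journal file
    private volatile boolean running = false;
    private volatile int exitCode = SketchExitCode.OK;

    SketchBookie(boolean journalReplayFails) {
        this.journalReplayFails = journalReplayFails;
    }

    void start() {
        if (journalReplayFails) {
            // The failure happens before the running state is ever set.
            shutdown(SketchExitCode.FAILED);
            return;
        }
        running = true;
    }

    boolean isRunning() {
        return running;
    }

    int getExitCode() {
        return exitCode;
    }

    int shutdown(int code) {
        // Change 2: record the exit code unconditionally, outside the guard,
        // so an early startup failure is not reported as OK.
        this.exitCode = code;
        if (isRunning()) {
            running = false; // cleanup that only applies to a running bookie
        }
        return this.exitCode;
    }
}

// Hypothetical stand-in for BookieServer.
class SketchBookieServer {
    private final SketchBookie bookie;

    SketchBookieServer(SketchBookie bookie) {
        this.bookie = bookie;
    }

    void start() throws IOException {
        bookie.start();
        if (!bookie.isRunning()) {
            // Change 1: throw instead of returning silently.
            throw new IOException(
                "Bookie failed to start, exit code: " + bookie.getExitCode());
        }
    }
}

class StartupFailureDemo {
    public static void main(String[] args) {
        SketchBookieServer server = new SketchBookieServer(new SketchBookie(true));
        try {
            server.start();
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A test for the startup failure path, as requested in review, could follow the same shape: start a server whose bookie fails during replay, assert that `start()` throws, and assert that the recorded exit code is not `OK`.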