fix: fail fast when postage block is ahead of chain tip #5460
Open
martinconic wants to merge 2 commits into
Checklist
Description
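Fail fast at startup when the stored `chainstate.Block` is ahead of the block number reported by `blockchain-rpc-endpoint`, with a clear error pointing at the RPC misconfiguration. This replaces the current lightnode-shutdown / fullnode-init-failure, which only happens after a 10-minute stall, with an immediate, actionable failure.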
Why
The reporter on #4941 observed `/stamps` returning `503 syncing in progress` indefinitely, with `/chainstate` showing `block` ~1.18M blocks ahead of `chainTip`. The two symptoms are one bug: once `chainstate.Block > chainTip`, the postage listener loop in `pkg/postage/listener/listener.go` evaluates `to < from` and `continue`s on every iteration, so `syncStatusFn` never returns `done=true` and `/stamps` stays in 503. After `postageSyncingStallingTimeout` (10 min) the loop exits with `ErrPostageSyncingStalled`; for light nodes this triggers `b.syncingStopped.Signal()` and the node shuts down, and for full nodes init fails.
`chainstate.Block` is only ever advanced by `UpdateBlockNumber`, from events the listener received from the RPC. A stored block ahead of the current chain tip therefore means the configured RPC is now serving a different chain than it was on a previous run: a misrouted public endpoint, a changed `blockchain-rpc-endpoint`, or a load balancer pointing to the wrong backend. The chain-ID check at startup (`pkg/node/chain.go:109`) does not catch this if the wrong backend happens to report the configured chain ID. This is an RPC / operational problem, not local DB corruption, and not something Bee should auto-heal: a silent rebuild would mask the misconfiguration and trigger long resyncs on every restart.
Change
Before `batchSvc.Start` runs, query `chainBackend.BlockNumber(ctx)` once. If the stored `chainstate.Block` is strictly greater, return an error naming both block numbers and explaining the likely cause and the recovery path (verify the RPC, then `--resync`). If the `BlockNumber` call itself fails, log a warning and continue; we don't want to block startup on a transient RPC hiccup.
No tolerance is applied: the listener always writes `cs.Block <= blockNumber - tailSize`, so under normal operation `cs.Block` is strictly below the live tip. The only way the check can trip is the RPC / chain-mismatch scenario described above.
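The no-tolerance argument can be checked with a small simulation. This sketch assumes `tailSize >= 1` and that the tip reported for the same chain never decreases between runs; `persistUpTo` and `guardTrips` are hypothetical helpers modeling the listener's write bound and the startup comparison, not functions from Bee.

```go
package main

import "fmt"

// persistUpTo models the write bound described above: after seeing tip, the
// listener persists at most tip - tailSize. The helper name is hypothetical.
func persistUpTo(stored, tip, tailSize uint64) uint64 {
	if tip < tailSize {
		return stored
	}
	if to := tip - tailSize; to > stored {
		return to
	}
	return stored
}

// guardTrips is the startup comparison, with no tolerance.
func guardTrips(stored, tip uint64) bool { return stored > tip }

func main() {
	const tailSize = 4
	var stored uint64

	// Same chain across runs. Checking against the same tip that was just
	// processed is the worst case, since later tips on this chain are no lower.
	trips := 0
	for tip := uint64(0); tip < 1000; tip++ {
		stored = persistUpTo(stored, tip, tailSize)
		if guardTrips(stored, tip) {
			trips++
		}
	}
	fmt.Println("stored:", stored, "trips on same chain:", trips) // stored: 995 trips on same chain: 0

	// Restart against a backend serving a shorter chain.
	fmt.Println("trips on other chain:", guardTrips(stored, 500)) // trips on other chain: true
}
```

Because the stored block trails the tip by at least `tailSize`, the guard needs no slack on a healthy chain, and any trip points at the backend rather than at normal lag.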
Open API Spec Version Changes (if applicable)
Motivation and Context (Optional)
Related Issue (Optional)
Screenshots (if appropriate):
AI Disclosure