Skip to content

refactor: Raft 日志规范化 + 增量持久化 + 单节点快速选举#37

Open
NeverENG wants to merge 1 commit into
mainfrom
refactor/raft-improvements
Open

refactor: Raft 日志规范化 + 增量持久化 + 单节点快速选举#37
NeverENG wants to merge 1 commit into
mainfrom
refactor/raft-improvements

Conversation

@NeverENG
Copy link
Copy Markdown
Owner

@NeverENG NeverENG commented May 20, 2026

Summary

  • 所有 fmt.Printf 改为结构化 slog,统一日志格式
  • 新增 persistStateLocked() — 增量持久化 Term/votedFor(O(1)),不再每次全量写入
  • AppendEntry 改用 wal.AppendLog() 增量追加单条日志
  • 单节点模式:跳过投票等待,直接成为 Leader
  • getLastLogIndex() 空日志返回 -1,修复快照偏移边界条件
  • lastLogIndex 计算增加快照回退逻辑

Files

  • Raft/raft.go — 核心改动
  • Server/server.go — 移除冗余打印
  • service/fsm.go — 日志措辞微调

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Fixed handling of single-node clusters during election transitions.
    • Improved log index detection for edge cases with empty logs.
  • Chores

    • Migrated logging system to structured format for enhanced observability.
    • Optimized state persistence using incremental write operations.
    • Enhanced connection lifecycle management with explicit start/stop callbacks.

Review Change Stack

- 所有 fmt.Printf 改为结构化 slog
- 新增 persistStateLocked() 增量持久化 Term/votedFor,O(1)
- AppendEntry 改用 wal.AppendLog() 增量追加日志
- 单节点模式无需等待投票,直接成为 Leader
- getLastLogIndex() 空日志时返回 -1,修复快照偏移边界问题
- Server/FSM 日志文本同步微调

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

This PR systematically replaces fmt-based logging with structured slog throughout the Raft consensus module and FSM, introduces incremental state persistence to reduce full-state writes, adds single-node election handling, refines log index semantics when snapshots exist, and wires server router lifecycle callbacks.

Changes

Raft Logging, Persistence, and Election Updates

Layer / File(s) Summary
Structured Logging Migration
Raft/raft.go, service/fsm.go
Replace all fmt.Printf/Println with structured slog.Info, slog.Warn, slog.Error across Raft startup, election transitions, snapshot operations, and FSM Apply/Get handlers. Add log/slog imports to both modules.
Incremental State Persistence & Optimization
Raft/raft.go
Introduce new persistStateLocked() to persist only Term and votedFor via incremental WAL writes. Update election startup, AppendEntry log operations, and TakeSnapshot to use incremental persistence instead of full state writes.
Election Flow & Leader Initialization
Raft/raft.go
Add single-node election shortcut: node becomes leader immediately when no peers exist. Refine becomeLeader to properly initialize nextIndex and matchIndex using LastIncludedIndex as baseline when the in-memory log is empty.
Log Index Calculation & Entry Handling
Raft/raft.go
Update getLastLogIndex to return -1 when no in-memory log and no snapshot baseline exists. Adjust AppendEntry to compute previous log index by preferring in-memory log state, falling back to LastIncludedIndex only when it exceeds 0.

Server Initialization Wiring

Layer / File(s) Summary
Server Router Lifecycle Callbacks
Server/server.go
Register router connection lifecycle callbacks via SetConnStartFunc and SetConnStopFunc on the network server to invoke router handlers at client connection start and stop events.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NeverENG/BanDB#33: Both PRs modify Raft snapshot index handling and LastIncludedIndex calculations in Raft/raft.go, with overlapping changes to log index semantics and TakeSnapshot behavior.

Poem

🐰 Logs now speak in structured prose,
State persists in careful rows,
Single nodes wear leader's crown,
Indices dance without frown!
Raft grows wise, connection-aware—
A cleaner consensus, with care! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the three main changes: structured logging standardization, incremental persistence optimization, and single-node fast election, all of which are core modifications present in Raft/raft.go.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/raft-improvements
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch refactor/raft-improvements

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Raft/raft.go`:
- Around line 126-131: The persistStateLocked helper currently only persists
Term/votedFor in some places; ensure every mutation of r.Term or r.votedFor
calls persistStateLocked while holding the raft lock so state is durable. Find
all code paths that assign r.Term or r.votedFor (e.g., the higher-term
reply/response handlers such as the functions that process
AppendEntries/RequestVote replies or any onReceiveHigherTerm/becomeFollower
helpers) and add a call to persistStateLocked() immediately after the assignment
(still under the same lock), propagating/logging any persistence error as the
existing SaveState path does. Ensure no code updates Term/votedFor without
invoking persistStateLocked under lock.
- Around line 206-213: The code mixes a new zero-based log-index convention with
old guards that treat LastIncludedIndex==0 as “no snapshot”, causing
relativeStart to be -1 in replicateLog() and skipping valid snapshot index 0;
update the logic around lastLogIndex/lastLogTerm and snapshot checks to
consistently treat LastIncludedIndex as the actual last included log index
(allowing 0), e.g. initialize lastLogIndex = int(r.LastIncludedIndex) when r.log
is empty, compute relativeStart as int(start) - int(r.LastIncludedIndex) (not
subtracting 1), and replace all `> 0` snapshot guards with `>= 0` or explicit
nil/empty checks so functions like replicateLog(), and code paths referencing
r.LastIncludedIndex/r.LastIncludedTerm/r.log use the same baseline across the
file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8c4ffd64-0104-4079-97a4-d89e39c6bf6e

📥 Commits

Reviewing files that changed from the base of the PR and between 33e4b8e and 6fd32d7.

📒 Files selected for processing (3)
  • Raft/raft.go
  • Server/server.go
  • service/fsm.go

Comment thread Raft/raft.go
Comment on lines +126 to +131
// persistStateLocked 仅持久化 Term 和 votedFor(增量持久化,O(1))
func (r *Raft) persistStateLocked() {
if err := r.wal.SaveState(int64(r.Term), int64(r.votedFor)); err != nil {
slog.Error("failed to persist state", "error", err)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Persist every Term/votedFor mutation, not just election start.

This helper is a good optimization, but the higher-term reply paths still update Term and votedFor without calling it. A crash after one of those downgrades can restart the node with a stale term/vote.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Raft/raft.go` around lines 126 - 131, The persistStateLocked helper currently
only persists Term/votedFor in some places; ensure every mutation of r.Term or
r.votedFor calls persistStateLocked while holding the raft lock so state is
durable. Find all code paths that assign r.Term or r.votedFor (e.g., the
higher-term reply/response handlers such as the functions that process
AppendEntries/RequestVote replies or any onReceiveHigherTerm/becomeFollower
helpers) and add a call to persistStateLocked() immediately after the assignment
(still under the same lock), propagating/logging any persistence error as the
existing SaveState path does. Ensure no code updates Term/votedFor without
invoking persistStateLocked under lock.

Comment thread Raft/raft.go
Comment on lines +206 to +213
lastLogIndex := -1
lastLogTerm := 0
if len(r.log) > 0 {
lastLogIndex = r.log[len(r.log)-1].Index
lastLogTerm = r.log[len(r.log)-1].Term
} else if r.LastIncludedIndex > 0 {
lastLogIndex = int(r.LastIncludedIndex)
lastLogTerm = int(r.LastIncludedTerm)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Keep the new zero-based index convention consistent across Raft.

After these changes, the first real log entry becomes index 0, but the rest of the file still assumes the first in-memory entry is LastIncludedIndex + 1 and that LastIncludedIndex == 0 means “no snapshot”. In a fresh multi-node cluster that makes replicateLog() compute relativeStart == -1 on the first append, and a snapshot taken at index 0 is ignored by the > 0 guards during later bootstrap/election flows. Please align on one baseline before merging.

Also applies to: 306-310, 524-527, 540-545

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Raft/raft.go` around lines 206 - 213, The code mixes a new zero-based
log-index convention with old guards that treat LastIncludedIndex==0 as “no
snapshot”, causing relativeStart to be -1 in replicateLog() and skipping valid
snapshot index 0; update the logic around lastLogIndex/lastLogTerm and snapshot
checks to consistently treat LastIncludedIndex as the actual last included log
index (allowing 0), e.g. initialize lastLogIndex = int(r.LastIncludedIndex) when
r.log is empty, compute relativeStart as int(start) - int(r.LastIncludedIndex)
(not subtracting 1), and replace all `> 0` snapshot guards with `>= 0` or
explicit nil/empty checks so functions like replicateLog(), and code paths
referencing r.LastIncludedIndex/r.LastIncludedTerm/r.log use the same baseline
across the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant