refactor(storage): MemTable adopts a dual-table (active/dirty) design to eliminate the Flush data race by NeverENG · Pull Request #24 · NeverENG/BanDB

NeverENG · 2026-05-11T16:28:01Z

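Extract a SkipList type that encapsulates skip-list state (size/level/head)
MemTable maintains two tables, active and dirty, similar to the read/dirty mechanism of sync.Map
On Flush, atomically swap active→dirty and create a new active; I/O runs outside the lock and does not block writes
Get lookup order: active → dirty (immutable snapshot) → SSTable
Engine drops its redundant lock and becomes a thin wrapper; synchronization is handled entirely by MemTable
Rename MAXL→maxLevel, P→probability (Go naming conventions)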
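
The active/dirty swap described above could look roughly like the following. This is a minimal sketch, not the code from this PR: a plain map stands in for SkipList, and the `table` type, the `flushMu` field, and the `persist` callback are assumptions made for illustration.

```go
package main

import "sync"

// table stands in for the PR's SkipList; a plain map keeps the sketch short.
type table map[string]string

// MemTable is a hypothetical, simplified double-buffer memtable.
type MemTable struct {
	flushMu sync.Mutex   // assumption: serializes Flush calls
	mu      sync.RWMutex // guards active and dirty
	active  table        // receives all writes
	dirty   table        // immutable snapshot being flushed; nil when idle
}

func NewMemTable() *MemTable { return &MemTable{active: table{}} }

func (m *MemTable) Put(k, v string) {
	m.mu.Lock()
	m.active[k] = v
	m.mu.Unlock()
}

// Get checks active first, then the dirty snapshot.
// A real engine would fall through to the SSTables after that.
func (m *MemTable) Get(k string) (string, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if v, ok := m.active[k]; ok {
		return v, true
	}
	if m.dirty != nil {
		v, ok := m.dirty[k]
		return v, ok
	}
	return "", false
}

// Flush swaps active into dirty under the lock, then runs persist
// (a caller-supplied stand-in for the SSTable write) with no lock held.
func (m *MemTable) Flush(persist func(table) error) error {
	m.flushMu.Lock()
	defer m.flushMu.Unlock()

	m.mu.Lock()
	snapshot := m.active
	m.dirty = snapshot
	m.active = table{}
	m.mu.Unlock()

	// Slow I/O happens here; Put and Get proceed concurrently.
	if err := persist(snapshot); err != nil {
		return err // dirty is left in place so Get can still see the data
	}

	m.mu.Lock()
	m.dirty = nil
	m.mu.Unlock()
	return nil
}
```

Because nothing writes to the snapshot after the swap, the persist step and concurrent Get calls only ever read it, which is what makes the lock-free I/O phase safe in this sketch.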

Summary by CodeRabbit

Refactor
- Restructured storage engine to improve concurrency handling and memory management. The Engine now delegates directly to MemTable operations, while MemTable uses an optimized double-buffer design for more efficient memory handling during flush operations.


coderabbitai · 2026-05-11T16:28:14Z

📝 Walkthrough

The PR refactors storage layer concurrency by moving synchronization from Engine to MemTable. Engine loses its RWMutex and becomes a thin delegating wrapper. MemTable gains a double-buffer design with active/dirty skip lists, skip-list operations migrated onto SkipList itself, and a rewritten Flush that swaps buffers and persists outside locks.

Changes

Storage Concurrency Refactor

Layer / File(s)	Summary
Data Model Architecture `storage/engine.go`, `storage/zstorage/memtable.go`	Engine struct removes `mu sync.RWMutex` field. SkipList gains `size`, `level`, `head` fields. MemTable gains `active`/`dirty` skip lists and `RWMutex`. Package-level `maxLevel` and `probability` variables introduced for random-level generation.
SkipList Operations `storage/zstorage/memtable.go`	`randomLevel()` updated to use `probability` and `maxLevel`. `SkipList.insert()` and `SkipList.delete()` implemented; insert manages size increment, delete manages node unlinking and level shrinking. `SkipList.search()` returns value and found flag.
MemTable Initialization `storage/zstorage/memtable.go`	`newSkipList()` factory creates empty skip lists. `NewMemTable()` initializes `active` with empty skip list, sets up WAL/SSTable channels, starts background workers, and triggers WAL recovery directly into `active`.
MemTable Read/Write Operations `storage/zstorage/memtable.go`	`Size()` reads `active.size` under lock. `Get()` searches `active` first, then `dirty` (if present), then SSTables. `Put()` writes WAL, inserts into `active`, triggers `Flush()` when `active.size` exceeds threshold. `Delete()` removes from `active` via `SkipList.delete()`.
Flush and Persistence `storage/zstorage/memtable.go`	`Flush()` swaps `active` → `dirty` under lock, allocates new `active`, writes `dirty` to SSTables outside lock, clears WAL, sets `dirty` to nil. `collectAllEntry()` becomes standalone helper taking `*SkipList`. `WriteSSTable()` captures `active` snapshot and persists entries. `resetMemTable()` removed.
Engine Delegation `storage/engine.go`	`Engine.Put()`, `Get()`, and `Delete()` now directly call `memTable` methods without acquiring locks or triggering flushes. Import dependencies reduced (removes `sync` and `config`).
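
For readers unfamiliar with the structure, the SkipList operations summarized in the table can be sketched as below. This is an illustrative sketch only: the constant values for `maxLevel` and `probability`, the `node` type, the `findPrev` helper, string keys, and counting `size` in entries are all assumptions rather than details taken from the PR.

```go
package main

import "math/rand"

const (
	maxLevel    = 16   // assumed value
	probability = 0.25 // assumed value
)

type node struct {
	key, value string
	next       []*node // next[i] is the successor at level i
}

type SkipList struct {
	size  int // number of entries (assumption; the PR may track bytes)
	level int // highest level currently in use
	head  *node
}

func newSkipList() *SkipList {
	return &SkipList{level: 1, head: &node{next: make([]*node, maxLevel)}}
}

// randomLevel promotes a node one level at a time with the given probability.
func randomLevel() int {
	lvl := 1
	for lvl < maxLevel && rand.Float64() < probability {
		lvl++
	}
	return lvl
}

// findPrev returns, for each level in use, the last node whose key is < key.
func (s *SkipList) findPrev(key string) []*node {
	prev := make([]*node, maxLevel)
	cur := s.head
	for i := s.level - 1; i >= 0; i-- {
		for cur.next[i] != nil && cur.next[i].key < key {
			cur = cur.next[i]
		}
		prev[i] = cur
	}
	return prev
}

func (s *SkipList) insert(key, value string) {
	prev := s.findPrev(key)
	if n := prev[0].next[0]; n != nil && n.key == key {
		n.value = value // overwrite in place; size is unchanged
		return
	}
	lvl := randomLevel()
	for i := s.level; i < lvl; i++ {
		prev[i] = s.head // levels not used before start at head
	}
	if lvl > s.level {
		s.level = lvl
	}
	n := &node{key: key, value: value, next: make([]*node, lvl)}
	for i := 0; i < lvl; i++ {
		n.next[i] = prev[i].next[i]
		prev[i].next[i] = n
	}
	s.size++
}

func (s *SkipList) search(key string) (string, bool) {
	if n := s.findPrev(key)[0].next[0]; n != nil && n.key == key {
		return n.value, true
	}
	return "", false
}

func (s *SkipList) delete(key string) bool {
	prev := s.findPrev(key)
	n := prev[0].next[0]
	if n == nil || n.key != key {
		return false
	}
	for i := range n.next {
		prev[i].next[i] = n.next[i] // unlink at every level the node occupies
	}
	for s.level > 1 && s.head.next[s.level-1] == nil {
		s.level-- // shrink away empty top levels
	}
	s.size--
	return true
}
```

None of these methods lock anything; in the design described here, callers such as MemTable are expected to hold the appropriate lock.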

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hops through the storage refactor with glee,
Engine steps back—a thin wrapper, now free!
MemTable takes charge with buffers so bright,
Active and dirty keep reads and writes right. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main architectural change: refactoring MemTable to use a dual-buffer (active/dirty) pattern to eliminate flush data races, which aligns with the core changes removing synchronization from Engine and introducing concurrent-safe double-buffer design in MemTable.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

storage/zstorage/memtable.go (1)
400-411: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Data race: WriteSSTable iterates active without synchronization.

After releasing RLock, concurrent Put() calls can modify the same SkipList via insert() while collectAllEntry(active) is iterating over it. This can cause corrupted iteration or crashes.

Unlike Flush(), which correctly swaps active → dirty to create an immutable snapshot, this method reads from a live, mutable structure.
🔧 Proposed fix: collect the entries while still holding the read lock
 func (m *MemTable) WriteSSTable() error {
 	m.mu.RLock()
-	active := m.active
+	entries := collectAllEntry(m.active)
 	m.mu.RUnlock()
 
-	err := m.sst.writeToSSTable(collectAllEntry(active))
+	err := m.sst.writeToSSTable(entries)
 	select {
 	case m.compactCh <- true:
 	default:
 	}
 	return err
 }
Alternatively, if you need a true snapshot without blocking writes, apply the same swap pattern used in Flush().
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@storage/zstorage/memtable.go` around lines 400 - 411, WriteSSTable currently
grabs and releases m.mu.RLock then calls collectAllEntry(active) which can race
with concurrent Put() mutations of the SkipList; change WriteSSTable to take the
same immutable-snapshot swap used by Flush: acquire m.mu.Lock, swap m.active
into a new local variable (e.g., dirty or snapshot) and replace m.active with an
empty SkipList, release the lock, then call collectAllEntry(snapshot) and
m.sst.writeToSSTable; keep the compactCh send logic unchanged so writers are not
blocked.
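
As a rough illustration of the copy-under-lock variant of this fix, the sketch below copies every entry while the read lock is held and only then hands the copy to the writer. It assumes Put takes the write lock on the same mutex; the map standing in for SkipList, the `Entry` type, the `snapshotEntries` helper, and the `write` callback are hypothetical names, not the PR's API.

```go
package main

import (
	"sort"
	"sync"
)

// Entry is a hypothetical key/value pair handed to the SSTable writer.
type Entry struct{ Key, Value string }

type MemTable struct {
	mu     sync.RWMutex
	active map[string]string // stand-in for the SkipList
}

// Put takes the write lock, so it cannot run while a reader holds RLock.
func (m *MemTable) Put(k, v string) {
	m.mu.Lock()
	m.active[k] = v
	m.mu.Unlock()
}

// snapshotEntries copies every entry while holding the read lock, so the
// returned slice cannot be affected by later Put calls.
func (m *MemTable) snapshotEntries() []Entry {
	m.mu.RLock()
	defer m.mu.RUnlock()
	out := make([]Entry, 0, len(m.active))
	for k, v := range m.active {
		out = append(out, Entry{k, v})
	}
	// Map iteration order is random; a skip list would already be sorted.
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

// WriteSSTable persists a point-in-time copy. The lock is released before
// write runs, so the slow I/O does not block writers.
func (m *MemTable) WriteSSTable(write func([]Entry) error) error {
	return write(m.snapshotEntries())
}
```

The trade-off compared with the swap used by Flush() is that writers are blocked for the duration of the copy, which grows with the size of the table.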

🧹 Nitpick comments (2)

storage/engine.go (1)

53-62: 💤 Low value

Errors from Put/Delete in applyWorker are silently dropped.

This is likely pre-existing behavior, but errors returned from e.Put() and e.Delete() are ignored, which could mask failures in the apply pipeline.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@storage/engine.go` around lines 53 - 62, The applyWorker currently ignores
return values from e.Put and e.Delete; update Engine.applyWorker to capture and
handle errors from those calls (e.g., err := e.Put(...)/e.Delete(...), then if
err != nil handle it). Handle errors by logging using Engine's logger (or return
them on a new engine-level channel such as e.applyErrCh so the caller can
observe/handle failures), and consider whether the worker should retry, stop
processing, or propagate the error; ensure you reference the applyCh consumer in
applyWorker and the Put/Delete calls when adding the error handling.
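
One possible shape for this, shown as a sketch under assumptions: the `op` struct, the `store` interface, the `errCh` channel, and the `flakyStore` demo type below are all hypothetical and do not come from storage/engine.go.

```go
package main

import "errors"

// op is a hypothetical command delivered on the apply channel.
type op struct {
	kind       string // "put" or "delete"
	key, value string
}

// store is the subset of Engine behavior the worker needs.
type store interface {
	Put(key, value string) error
	Delete(key string) error
}

// applyWorker drains applyCh and reports failures on errCh instead of
// discarding them. It closes errCh once applyCh is closed and drained.
func applyWorker(s store, applyCh <-chan op, errCh chan<- error) {
	for o := range applyCh {
		var err error
		switch o.kind {
		case "put":
			err = s.Put(o.key, o.value)
		case "delete":
			err = s.Delete(o.key)
		default:
			err = errors.New("unknown op: " + o.kind)
		}
		if err != nil {
			select {
			case errCh <- err:
			default:
				// errCh is full: skip rather than stall the apply loop.
				// A real engine would probably log the error here.
			}
		}
	}
	close(errCh)
}

// flakyStore is a demo store whose Delete always fails.
type flakyStore struct{ data map[string]string }

func (f *flakyStore) Put(k, v string) error { f.data[k] = v; return nil }

func (f *flakyStore) Delete(k string) error {
	return errors.New("delete " + k + ": disk full")
}
```

Whether the worker should keep going after a failure, retry, or stop is a policy decision this sketch leaves open; it simply continues with the next operation.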

storage/zstorage/memtable.go (1)

334-338: ⚖️ Poor tradeoff

Consider retaining dirty reference on flush error for recovery.

If writeToSSTable fails, the method returns early without setting dirty = nil, which is correct for allowing Get to still find data. However, the data in dirty is effectively lost since it's not in the new active either. Consider whether to copy entries back to active or implement retry logic.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@storage/zstorage/memtable.go` around lines 334 - 338, The flush currently
returns on error from m.sst.writeToSSTable(allEntries) which leaves the staged
entries neither in the new SST nor in m.active; to avoid data loss, on write
error restore the entries into the in-memory store instead of dropping them:
when writeToSSTable returns an error, merge allEntries (or m.dirty) back into
m.active (or leave m.dirty intact) while holding the same mutex used by Get, so
lookups still succeed; alternatively implement a bounded retry loop around
m.sst.writeToSSTable before abandoning the merge; reference
m.sst.writeToSSTable, m.dirty, m.active, and Get to locate where to perform the
merge/ retry and ensure proper locking.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@storage/zstorage/memtable.go`:
- Around line 221-235: The Delete method in MemTable (MemTable.Delete) removes
keys from the active memtable via m.active.delete but does not persist a
tombstone to the write-ahead log (WAL), so deletes are lost on recovery; update
Delete to append a delete marker to the WAL before mutating the in-memory table
(mirror the Put flow that writes to WAL first), e.g. call the existing WAL write
method for deletions (or add one) and ensure the WAL entry is flushed/checked
for errors before calling m.active.delete; keep error handling consistent with
Put so a failed WAL write prevents the in-memory delete.
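
A minimal sketch of the WAL-first delete described in this comment follows. The in-memory `wal` type, its `failNext` switch, the `walRecord` layout, and the `replay` function are hypothetical stand-ins rather than the project's WAL API, and locking is omitted to keep the example short.

```go
package main

import "errors"

// walRecord is a hypothetical log record; op is "put" or "delete".
type walRecord struct {
	op, key, value string
}

// wal is an in-memory stand-in for the write-ahead log.
// failNext simulates a single failed write.
type wal struct {
	records  []walRecord
	failNext bool
}

func (w *wal) append(r walRecord) error {
	if w.failNext {
		w.failNext = false
		return errors.New("wal: write failed")
	}
	w.records = append(w.records, r)
	return nil
}

// MemTable here omits locking; the real one guards active with a mutex.
type MemTable struct {
	wal    *wal
	active map[string]string
}

// Put logs first, then mutates memory.
func (m *MemTable) Put(k, v string) error {
	if err := m.wal.append(walRecord{op: "put", key: k, value: v}); err != nil {
		return err
	}
	m.active[k] = v
	return nil
}

// Delete mirrors Put: the tombstone must reach the WAL before the
// in-memory delete, and a failed WAL write leaves memory untouched.
func (m *MemTable) Delete(k string) error {
	if err := m.wal.append(walRecord{op: "delete", key: k}); err != nil {
		return err
	}
	delete(m.active, k)
	return nil
}

// replay rebuilds the in-memory table from the log, as recovery would.
func replay(w *wal) map[string]string {
	out := map[string]string{}
	for _, r := range w.records {
		switch r.op {
		case "put":
			out[r.key] = r.value
		case "delete":
			delete(out, r.key)
		}
	}
	return out
}
```

With the tombstone logged, replaying the WAL after a crash reproduces the delete instead of resurrecting the key. A key that already lives in an SSTable would additionally need a tombstone kept in the memtable, which this sketch does not cover.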



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9c2cd3dd-9406-4296-9da7-5b5387df5ed2

📥 Commits

Reviewing files that changed from the base of the PR and between eee4c15 and 1e6f8d3.

📒 Files selected for processing (2)

storage/engine.go
storage/zstorage/memtable.go

coderabbitai Bot reviewed May 11, 2026


Merge branch 'main' into fix/memtable-race

59eba2e

NeverENG merged commit ad4754e into main May 11, 2026

NeverENG deleted the fix/memtable-race branch May 11, 2026 16:48

NeverENG restored the fix/memtable-race branch May 11, 2026 16:48

NeverENG deleted the fix/memtable-race branch May 11, 2026 16:48

coderabbitai Bot mentioned this pull request May 20, 2026

docs: add CLAUDE.md collaboration guidelines + CI integration tests #34

Merged

NeverENG merged 2 commits into main from fix/memtable-race