Skip to content

fix: blockcache compute problem#3203

Open
Z-G-H1 wants to merge 5 commits intoOpenAtomFoundation:3.5from
Z-G-H1:fix-blockcache
Open

fix: blockcache compute problem#3203
Z-G-H1 wants to merge 5 commits intoOpenAtomFoundation:3.5from
Z-G-H1:fix-blockcache

Conversation

@Z-G-H1
Copy link
Copy Markdown
Collaborator

@Z-G-H1 Z-G-H1 commented Dec 7, 2025

#3168
share-block-cache 为yes时,多个 Column Family 共享同一个 block cache,但在统计 block cache usage 时,代码仍然对每个 CF 分别查询然后累加,导致同一个 block cache 的使用量被计算了多次。
修复前,hash,list,set 的计算均为两倍,zset为三倍:
image
修复后这些数据结构的不会再被计算多次:
image
添加测试:
image

Copilot AI review requested due to automatic review settings December 7, 2025 12:42
@github-actions github-actions Bot added the ☢️ Bug Something isn't working label Dec 7, 2025
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a block cache metrics double-counting issue that occurred when share-block-cache is enabled. When multiple Column Families share the same RocksDB block cache, the code was previously aggregating metrics from each CF separately, causing the same cache usage to be counted multiple times (2x for hash/list/set/streams, 3x for zset).

Key Changes

  • Added share_block_cache_ member variable to track shared cache configuration
  • Modified GetProperty() methods to return early after querying the first CF when shared cache is enabled for block cache metrics
  • Updated GetRocksDBInfo() to query only the first CF handle for block cache properties when cache is shared
  • Added integration test to verify block cache metrics are reported correctly

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/storage/src/redis.h Added share_block_cache_ boolean member variable to track shared cache configuration
src/storage/src/redis.cc Modified GetRocksDBInfo() to avoid double-counting by checking first CF only when cache is shared
src/storage/src/redis_hashes.cc Set share_block_cache_ in Open() and added early return in GetProperty() for shared cache properties
src/storage/src/redis_lists.cc Set share_block_cache_ in Open() and added early return in GetProperty() for shared cache properties
src/storage/src/redis_sets.cc Set share_block_cache_ in Open() and added early return in GetProperty() for shared cache properties
src/storage/src/redis_zsets.cc Set share_block_cache_ in Open() and added early return in GetProperty() for shared cache properties (3 CFs)
src/storage/src/redis_streams.cc Set share_block_cache_ in Open() and added early return in GetProperty() for shared cache properties
tests/integration/server_test.go Added integration test to verify block cache capacity metrics are present in INFO output

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +795 to +804
// fix: verify that shared block cache does not cause double counting of block cache metrics
It("should report correct block cache capacity when share-block-cache is yes", func() {
// Verify share-block-cache is enabled
shareBlockCacheConfig := client.ConfigGet(ctx, "share-block-cache")
Expect(shareBlockCacheConfig.Err()).NotTo(HaveOccurred())
Expect(shareBlockCacheConfig.Val()).To(Equal(map[string]string{"share-block-cache": "yes"}))

info := client.Info(ctx, "rocksdb")
Expect(info.Err()).NotTo(HaveOccurred())
infoStr := info.Val()
Copy link

Copilot AI Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent indentation: This test uses spaces for indentation (lines 795-805), while the rest of the file uses tabs. This creates mixed indentation that can cause issues in Go code. Please use tabs consistently with the rest of the file.

Suggested change
// fix: verify that shared block cache does not cause double counting of block cache metrics
It("should report correct block cache capacity when share-block-cache is yes", func() {
// Verify share-block-cache is enabled
shareBlockCacheConfig := client.ConfigGet(ctx, "share-block-cache")
Expect(shareBlockCacheConfig.Err()).NotTo(HaveOccurred())
Expect(shareBlockCacheConfig.Val()).To(Equal(map[string]string{"share-block-cache": "yes"}))
info := client.Info(ctx, "rocksdb")
Expect(info.Err()).NotTo(HaveOccurred())
infoStr := info.Val()
// fix: verify that shared block cache does not cause double counting of block cache metrics
It("should report correct block cache capacity when share-block-cache is yes", func() {
// Verify share-block-cache is enabled
shareBlockCacheConfig := client.ConfigGet(ctx, "share-block-cache")
Expect(shareBlockCacheConfig.Err()).NotTo(HaveOccurred())
Expect(shareBlockCacheConfig.Val()).To(Equal(map[string]string{"share-block-cache": "yes"}))
info := client.Info(ctx, "rocksdb")
Expect(info.Err()).NotTo(HaveOccurred())
infoStr := info.Val()

Copilot uses AI. Check for mistakes.

Expect(infoStr).To(ContainSubstring("strings_block_cache_capacity:"))
Expect(infoStr).To(ContainSubstring("hashes_block_cache_capacity:"))
Expect(infoStr).To(ContainSubstring("lists_block_cache_capacity:"))
Copy link

Copilot AI Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace detected at the end of this line. Please remove it.

Suggested change
Expect(infoStr).To(ContainSubstring("lists_block_cache_capacity:"))
Expect(infoStr).To(ContainSubstring("lists_block_cache_capacity:"))

Copilot uses AI. Check for mistakes.
Expect(infoStr).To(ContainSubstring("strings_block_cache_capacity:"))
Expect(infoStr).To(ContainSubstring("hashes_block_cache_capacity:"))
Expect(infoStr).To(ContainSubstring("lists_block_cache_capacity:"))
Expect(infoStr).To(ContainSubstring("sets_block_cache_capacity:"))
Copy link

Copilot AI Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace detected at the end of this line. Please remove it.

Suggested change
Expect(infoStr).To(ContainSubstring("sets_block_cache_capacity:"))
Expect(infoStr).To(ContainSubstring("sets_block_cache_capacity:"))

Copilot uses AI. Check for mistakes.
Comment on lines +796 to +818
It("should report correct block cache capacity when share-block-cache is yes", func() {
// Verify share-block-cache is enabled
shareBlockCacheConfig := client.ConfigGet(ctx, "share-block-cache")
Expect(shareBlockCacheConfig.Err()).NotTo(HaveOccurred())
Expect(shareBlockCacheConfig.Val()).To(Equal(map[string]string{"share-block-cache": "yes"}))

info := client.Info(ctx, "rocksdb")
Expect(info.Err()).NotTo(HaveOccurred())
infoStr := info.Val()

Expect(infoStr).To(ContainSubstring("strings_block_cache_capacity:"))
Expect(infoStr).To(ContainSubstring("hashes_block_cache_capacity:"))
Expect(infoStr).To(ContainSubstring("lists_block_cache_capacity:"))
Expect(infoStr).To(ContainSubstring("sets_block_cache_capacity:"))
Expect(infoStr).To(ContainSubstring("zsets_block_cache_capacity:"))

// Simple string-based verification that values are present
Expect(infoStr).To(MatchRegexp(`strings_block_cache_capacity:\d+`))
Expect(infoStr).To(MatchRegexp(`hashes_block_cache_capacity:\d+`))
Expect(infoStr).To(MatchRegexp(`lists_block_cache_capacity:\d+`))
Expect(infoStr).To(MatchRegexp(`sets_block_cache_capacity:\d+`))
Expect(infoStr).To(MatchRegexp(`zsets_block_cache_capacity:\d+`))
})
Copy link

Copilot AI Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test only verifies that block cache metrics are present in the output, but doesn't actually verify that the double-counting bug has been fixed. According to the PR description, the fix prevents metrics from being counted multiple times (2x for hash/list/set, 3x for zset).

Consider enhancing the test to:

  1. Parse the actual capacity values from the INFO output for each data type
  2. Verify that they all report the same capacity value (since they share the cache)
  3. Optionally, verify that this shared value matches the configured block cache size from the config

Example approach:

// Parse capacity values
stringsCapacity := parseCapacity(infoStr, "strings_block_cache_capacity")
hashesCapacity := parseCapacity(infoStr, "hashes_block_cache_capacity")
// ... parse others

// Verify all report the same value
Expect(stringsCapacity).To(Equal(hashesCapacity))
Expect(stringsCapacity).To(Equal(listsCapacity))
// ... compare others

This would provide stronger validation that the fix actually prevents double-counting.

Copilot uses AI. Check for mistakes.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Dec 7, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Collaborator

@Mixficsol Mixficsol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

修复 CI 的部分去掉吧,一个 PR 做一件事

@Z-G-H1
Copy link
Copy Markdown
Collaborator Author

Z-G-H1 commented Dec 16, 2025

修复 CI 的部分去掉吧,一个 PR 做一件事

Done.

@Issues-translate-bot
Copy link
Copy Markdown

Bot detected the issue body's language is not English, translate it automatically.


Get rid of the part about fixing CI, one PR does one thing

Done.

@Z-G-H1 Z-G-H1 closed this Dec 16, 2025
@Z-G-H1 Z-G-H1 reopened this Dec 16, 2025
@Z-G-H1 Z-G-H1 force-pushed the fix-blockcache branch 2 times, most recently from d27e1c3 to e284374 Compare December 16, 2025 10:24
@wangshao1
Copy link
Copy Markdown
Collaborator

share_block_cache如果为true的话,应该不止是Columnfamily共用同一个blockcache。strings,hashes等所有rocksdb实例的block_cache都用的是同一个。所以你可以再考虑下不同rocksdb实例之间重复计算的问题,如果不太好改,可以现在代码里加个todo。

@Issues-translate-bot
Copy link
Copy Markdown

Bot detected the issue body's language is not English, translate it automatically.


If share_block_cache is true, more than just Columnfamily should share the same blockcache. The block_cache of all rocksdb instances such as strings, hashes, etc. use the same one. So you can consider the problem of repeated calculations between different rocksdb instances. If it is not easy to change, you can add a todo to the code now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants