Merged
1 change: 1 addition & 0 deletions tools/CMakeLists.txt
@@ -1,5 +1,6 @@
add_subdirectory(./aof_to_pika)
add_subdirectory(./benchmark_client)
add_subdirectory(./bigkey_analyzer)
add_subdirectory(./binlog_sender)
add_subdirectory(./manifest_generator)
add_subdirectory(./rdb_to_pika)
29 changes: 29 additions & 0 deletions tools/bigkey_analyzer/CMakeLists.txt
@@ -0,0 +1,29 @@
set(WARNING_FLAGS "-W -Wextra -Wall -Wsign-compare \
-Wno-unused-parameter -Wno-redundant-decls -Wwrite-strings \
-Wpointer-arith -Wreorder -Wswitch -Wsign-promo \
-Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers")

set(CXXFLAGS "${WARNING_FLAGS} -std=c++17 -g")

set(SRC_DIR .)
aux_source_directory(${SRC_DIR} BASE_OBJS)

add_executable(bigkey_analyzer ${BASE_OBJS})

target_include_directories(bigkey_analyzer
PRIVATE
${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/src
${PROJECT_SOURCE_DIR}/src/storage/include
)

target_link_libraries(bigkey_analyzer
storage
pthread
)
Comment on lines +20 to +23

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check RocksDB linking in storage library and other tools

echo "=== Check storage library's RocksDB dependency linking ==="
find . -path ./tools -prune -o -name "CMakeLists.txt" -type f -print | grep -E "src/storage|storage" | while read f; do
  if [ -f "$f" ]; then
    echo "File: $f"
    cat -n "$f" | grep -A 10 -B 2 "target_link_libraries"
  fi
done

echo -e "\n=== Check CMakeLists.txt files in tools directory ==="
find tools -name "CMakeLists.txt" -type f | while read f; do
  echo -e "\n--- $f ---"
  cat -n "$f" | grep -A 8 -B 1 "target_link_libraries"
done

Repository: OpenAtomFoundation/pikiwidb

Length of output: 10408


Add explicit ${ROCKSDB_LIBRARY} link for consistency with other tools.

Although storage provides ${ROCKSDB_LIBRARY} as a PUBLIC dependency, all similar tools in the project—including txt_to_pika, pika_to_txt, benchmark_client, and pika_port—explicitly link ${ROCKSDB_LIBRARY} alongside storage. Update to:

target_link_libraries(bigkey_analyzer 
    storage
    ${ROCKSDB_LIBRARY}
    pthread
)
🤖 Prompt for AI Agents
In tools/bigkey_analyzer/CMakeLists.txt around lines 20 to 23,
target_link_libraries only lists storage and pthread but other tools explicitly
link ${ROCKSDB_LIBRARY}; add ${ROCKSDB_LIBRARY} to the target_link_libraries
call so it becomes: target_link_libraries(bigkey_analyzer storage
${ROCKSDB_LIBRARY} pthread) to match project consistency.


set_target_properties(bigkey_analyzer PROPERTIES
RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}
CMAKE_COMPILER_IS_GNUCXX TRUE
COMPILE_FLAGS ${CXXFLAGS}
)
Comment on lines +25 to +29

🛠️ Refactor suggestion | 🟠 Major

Replace deprecated COMPILE_FLAGS and remove incorrect CMAKE_COMPILER_IS_GNUCXX property.

Two issues:

  1. CMAKE_COMPILER_IS_GNUCXX is a CMake-detected variable, not a settable target property. Setting it here has no effect.
  2. COMPILE_FLAGS property has been deprecated since CMake 3.0 in favor of target_compile_options().
🔎 Recommended fix
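+# WARNING_FLAGS is a single space-separated string; split it into a list so that
+# each flag is passed as its own option (WARNING_FLAGS_LIST is an illustrative name).
+separate_arguments(WARNING_FLAGS_LIST UNIX_COMMAND "${WARNING_FLAGS}")
+target_compile_options(bigkey_analyzer PRIVATE ${WARNING_FLAGS_LIST})
+target_compile_features(bigkey_analyzer PRIVATE cxx_std_17)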
+
 set_target_properties(bigkey_analyzer PROPERTIES
     RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}
-    CMAKE_COMPILER_IS_GNUCXX TRUE
-    COMPILE_FLAGS ${CXXFLAGS}
 )

Also remove or simplify the CXXFLAGS variable at line 6 since flags are now applied via target_compile_options and C++17 via target_compile_features.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tools/bigkey_analyzer/CMakeLists.txt around lines 25-29, the target
properties block incorrectly sets CMAKE_COMPILER_IS_GNUCXX (a CMake-detected
variable that should not be set) and uses the deprecated COMPILE_FLAGS property;
remove the CMAKE_COMPILER_IS_GNUCXX line entirely, drop the COMPILE_FLAGS
property, and instead apply any desired flags with
target_compile_options(bigkey_analyzer PRIVATE ${CXXFLAGS}) and declare required
C++ standard via target_compile_features(bigkey_analyzer PRIVATE cxx_std_17);
also remove or simplify the CXXFLAGS variable at line 6 since flags are now
applied via target_compile_options and C++17 via target_compile_features.

124 changes: 124 additions & 0 deletions tools/bigkey_analyzer/README.md
@@ -0,0 +1,124 @@
# Big Key Analyzer

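A big-key analysis tool for finding large keys in a PikiwiDB instance. It targets the new storage layout on the unstable branch and supports several directory structures:
- Single-instance RocksDB
- Multi-DB instance (db/0, db/1, db/2...)
- Direct partition directories (0/, 1/, 2/...)
- **New**: dbN/M three-level nested structure (db0/0, db0/1, db1/0...)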
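
Before pointing the tool at a path, it can help to check which of these layouts a directory appears to use. The sketch below is a hypothetical helper (`guess_layout` is not part of the tool). It assumes a RocksDB directory can be recognized by its `CURRENT` file and only mirrors the directory patterns listed above; the analyzer's own detection logic may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of bigkey_analyzer: guess the directory layout.
# Assumption: a RocksDB directory contains a file named CURRENT.
guess_layout() {
  local root="$1"
  if [ -f "$root/CURRENT" ]; then
    echo "single-instance"
  elif [ -d "$root/db/0" ]; then
    echo "multi-db (db/N)"
  elif [ -d "$root/db0/0" ]; then
    echo "nested (dbN/M)"
  elif [ -d "$root/0" ]; then
    echo "partitions (N/)"
  else
    echo "unknown"
  fi
}
```

For example, `guess_layout /path/to/pikiwidb` prints one of the labels above, with `unknown` meaning none of the listed patterns matched.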

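## Features

- Analyzes big keys of all data types (strings, hashes, lists, sets, zsets)
- Filters keys by size
- Limits the number of results (top N)
- Reports statistics by key prefix
- Output includes key type, size, and expiration time (TTL)
- Can write results to a file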

## Build

From the PikiwiDB root directory, run:

```bash
mkdir -p build
cd build
cmake ..
make bigkey_analyzer
```

After the build completes, the executable is generated in the build directory.

## Usage

```
Usage: bigkey_analyzer [OPTIONS] <db_path>
Options:
--min-size=SIZE Only show keys larger than SIZE bytes
--top=N Only show top N largest keys
--prefix-stat Show statistics by key prefix
--prefix-delimiter=C Character used to delimit prefix (default: ':')
--type=TYPE Only analyze specific type (strings|hashes|lists|sets|zsets|all)
--output=FILE Write output to file instead of stdout
--help Display this help message
```
Comment on lines +33 to +43

⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block.

The usage text block should specify a language identifier for proper syntax highlighting and better readability.

🔎 Proposed fix
-```
+```text
 Usage: bigkey_analyzer [OPTIONS] <db_path>
 Options:
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

29-29: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In tools/bigkey_analyzer/README.md around lines 29 to 39, the fenced code block
with the usage text lacks a language identifier; update the opening fence from
``` to ```text so it becomes ```text and leave the block content unchanged,
ensuring the usage snippet is syntax-highlighted as plain text in renders.


## Examples

1. Analyze all big keys:

```bash
# Single instance
./bigkey_analyzer /path/to/pikiwidb/data

# Multi-DB instance (db/0, db/1, db/2...)
./bigkey_analyzer /path/to/pikiwidb
```

2. Analyze only keys larger than 1 MB:

```bash
./bigkey_analyzer --min-size=1048576 /path/to/pikiwidb/data
```

3. Show only the 10 largest keys:

```bash
./bigkey_analyzer --top=10 /path/to/pikiwidb/data
```

4. Analyze only hash keys:

```bash
./bigkey_analyzer --type=hashes /path/to/pikiwidb/data
```

5. Analyze and report statistics by prefix:

```bash
./bigkey_analyzer --prefix-stat /path/to/pikiwidb/data
```

6. Write the results to a file:

```bash
./bigkey_analyzer --output=result.txt /path/to/pikiwidb/data
```
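
`--min-size` takes a byte count, which is why example 2 writes 1 MB as 1048576. A small hypothetical helper (`to_bytes`, not part of the tool) could do that conversion. This sketch assumes binary units (K = 1024) and a plain integer before the suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert sizes such as 512K, 1M, or 2G to bytes (binary units).
to_bytes() {
  local value="$1"
  local number="${value%[KkMmGg]}"   # strip an optional unit suffix
  case "$value" in
    *[Kk]) echo $(( number * 1024 )) ;;
    *[Mm]) echo $(( number * 1024 * 1024 )) ;;
    *[Gg]) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *)     echo "$number" ;;
  esac
}

# Usage (the path is a placeholder, as in the examples above):
# ./bigkey_analyzer --min-size="$(to_bytes 1M)" /path/to/pikiwidb/data
```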

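## Output Format

The tool's output has three parts:

1. Big key list, sorted by size in descending order
2. Statistics by prefix (if the --prefix-stat option is used)
3. Summary statistics

Example output: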

```
===== Big Key Analysis =====
DB Partition Type Size Key TTL
db0 1 hash 1048576 user:profile:1001 -1
db0 2 zset 524288 ranking:global 3600
db1 0 string 262144 config:settings -1
...

===== Key Prefix Statistics =====
Prefix Count Total Size Avg Size
user 100 10485760 104857.6
ranking 50 2621440 52428.8
config 10 524288 52428.8
...

===== Summary =====
Total keys analyzed: 160
Keys by type:
hash: 50 keys, 25.0 MB total, 524288.0 bytes avg
zset: 30 keys, 15.0 MB total, 524288.0 bytes avg
string: 80 keys, 10.0 MB total, 131072.0 bytes avg
```
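
The prefix statistics in the sample are a per-prefix count, total size, and total divided by count (for `user`, 10485760 / 100 = 104857.6). As a rough illustration, the same aggregation can be recomputed from saved big-key rows with awk. This sketch assumes whitespace-separated columns in the order shown above (DB, Partition, Type, Size, Key, TTL), keys without embedded whitespace, and the default ':' delimiter. `prefix_stats` is a hypothetical name, not something the tool provides.

```shell
#!/usr/bin/env bash
# Hypothetical post-processing sketch: aggregate big-key rows by key prefix.
# Assumes columns: DB Partition Type Size Key TTL (whitespace-separated).
prefix_stats() {
  awk '
    # Only consider rows whose 4th column is an integer (skips headers, "...", summary lines).
    NF >= 6 && $4 ~ /^[0-9]+$/ {
      split($5, parts, ":")          # prefix = text before the first ":"
      count[parts[1]]++
      total[parts[1]] += $4
    }
    END {
      # Output columns: prefix, count, total size, average size
      for (p in count)
        printf "%s %d %d %.1f\n", p, count[p], total[p], total[p] / count[p]
    }
  ' "$@" | sort
}
```

Running `prefix_stats result.txt` on a file written with `--output=result.txt` would print one line per prefix. Header, prefix-statistics, and summary lines are skipped because their fourth column is not an integer.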
Comment on lines +97 to +118

⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block.

The example output block should specify a language identifier for proper syntax highlighting and better readability.

🔎 Proposed fix
-```
+```text
 ===== Big Key Analysis =====
 Type    Size    Key     TTL
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

93-93: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In tools/bigkey_analyzer/README.md around lines 93 to 114, the fenced code block
lacks a language identifier so it doesn’t get proper syntax highlighting; update
the opening fence to include an appropriate language token (for example "text")
— i.e. change ``` to ```text — and ensure the closing fence remains unchanged so
the block renders with the specified language.


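## Notes

- The tool only reads the database and performs no writes
- A big key's size is the total size of its key and value
- Expired keys are not included in the analysis results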