
feat:progressive compact #3243

Open
chenbt-hz wants to merge 11 commits into OpenAtomFoundation:unstable from chenbt-hz:test-incremental-compact

Conversation

@chenbt-hz
Collaborator

@chenbt-hz chenbt-hz commented Apr 17, 2026

1. Fix the obd-compact bug
2. Add a progressive (incremental) compact mode for testing

fix: #3230
about: #3226
about: #3240

Summary by CodeRabbit

Release Notes

New Features

  • Added incremental-compact as a new compaction strategy option alongside existing full-compact and obd-compact modes
  • Introduced five new configuration parameters to fine-tune incremental compaction: incremental-compact-interval, incremental-compact-max-files, incremental-compact-max-time-ms, incremental-compact-min-rate, and incremental-compact-min-file-age
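As a sketch of how the five parameters above might look in conf/pika.conf (the values below are illustrative, not the defaults actually shipped in the PR):

```
# Compaction strategy: full-compact | obd-compact | incremental-compact
compaction-strategy : incremental-compact

# Illustrative values only; the real defaults live in conf/pika.conf in the PR
incremental-compact-interval : 60
incremental-compact-max-files : 10
incremental-compact-max-time-ms : 1000
incremental-compact-min-rate : 90
incremental-compact-min-file-age : 3600
```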

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

📝 Walkthrough

Walkthrough

Introduces a new incremental-compact compaction strategy with configuration parameters, server-level scheduling, and storage-layer implementation. Updates the configuration system, database APIs, and auto-compaction routines to support time-bounded incremental compaction with rate and age-based file filtering.

Changes

Cohort / File(s) Summary
Configuration system setup
.gitignore, CMakeLists.txt, conf/pika.conf, include/pika_conf.h, src/pika_conf.cc, src/pika_admin.cc
Adds incremental-compact strategy configuration with five new parameters (interval, max-files, max-time-ms, min-rate, min-file-age). Extends config loading/rewriting and adds CONFIG GET/SET handlers for the new parameters with validation.
Database and server integration
include/pika_db.h, src/pika_db.cc, include/pika_server.h, src/pika_server.cc
Exposes DB::IncrementalCompact() method, adds kIncrementalCompact task type, tracks last compaction time, and integrates rate-limited incremental compaction scheduling into AutoCompactRange().
Storage layer APIs
src/storage/include/storage/storage.h, src/storage/src/redis.h
Declares Storage::IncrementalCompact() and Redis::IncrementalCompact() with configurable parameters for file limits, time budgets, rate thresholds, and minimum file age filtering.
Core incremental compaction logic
src/storage/src/redis.cc, src/storage/src/storage.cc
Implements incremental compaction algorithm that selects the oldest eligible SST file per column family, enforces time/rate constraints, and supports both sync and async execution via background task queuing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 Hops of incremental grace,
Files compacted at a pace,
Rate and time constraints align,
Background tasks work fine!
Pika's storage now more keen,

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 11.11%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive: the title 'feat:progressive compact' is vague and does not clearly convey the specific change being made. While it references compaction, it lacks clarity about what 'progressive' or 'incremental' compaction entails. Resolution: use a more descriptive title such as 'feat: add incremental compaction strategy' or 'feat: implement incremental compact with rate limiting and file age filtering'.
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped - CodeRabbit's high-level summary is enabled.



Comment thread .claude/settings.local.json Outdated

Copilot AI left a comment


Pull request overview

This PR fixes an OBD compaction bug and introduces a new “incremental-compact” strategy intended to run compact work progressively with tunable limits (files/time/rate/age), wired through server scheduling and configuration.

Changes:

  • Fix OBD compaction to compact the correct RocksDB Column Family and improve failure logging.
  • Add incremental compaction end-to-end: storage API + Redis implementation + server scheduling + CONFIG GET/SET + default config entries.
  • Minor build/config hygiene updates (key encoding fix, CMake build_version source handling, .gitignore formatting, added .claude local settings file).

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/storage/src/storage.cc Adds BG task dispatch + Storage wrapper for IncrementalCompact with argv parameter passing.
src/storage/src/redis.h Exposes new Redis::IncrementalCompact API.
src/storage/src/redis.cc Fixes OBD CF compaction call; implements incremental compaction via CompactFiles.
src/storage/include/storage/storage_define.h Fixes pointer advancement bug in EncodeUserKey.
src/storage/include/storage/storage.h Adds new background operation enum and Storage::IncrementalCompact API.
src/pika_server.cc Schedules incremental compaction based on a configurable interval.
src/pika_db.cc Wires DB-level call into storage_->IncrementalCompact with config knobs.
src/pika_conf.cc Parses new incremental-compact-* configs and supports incremental-compact strategy.
src/pika_admin.cc Adds CONFIG GET/SET support for incremental compact parameters.
include/pika_server.h Adds TaskType::kIncrementalCompact and tracking timestamp for interval gating.
include/pika_db.h Declares DB::IncrementalCompact.
include/pika_conf.h Adds getters/setters + stored fields for incremental compact parameters and strategy enum.
conf/pika.conf Documents and adds defaults for incremental-compact-* config keys.
CMakeLists.txt Adjusts build-version source handling (removes src/build_version.cc from DIR_SRCS, simplifies generated source list).
.gitignore Fixes formatting of tests/tmp entry.
.claude/settings.local.json Adds a Claude local settings/permissions file to the repo.


Comment thread src/storage/src/redis.cc
Comment thread src/storage/src/redis.cc Outdated
Comment thread src/storage/src/redis.cc
Comment thread src/storage/src/redis.cc Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/pika_conf.cc (1)

914-931: ⚠️ Potential issue | 🟠 Major

CONFIG REWRITE still drops the active compaction strategy.

cs_ is never derived from compaction_strategy_ before SetConfStr("compaction-strategy", cs_) runs, so rewrite persists an empty value. After restart, Load() falls back to NONE, which means the new incremental-compact mode is not actually durable.

Suggested fix
-  std::string cs_;
-  SetConfStr("compaction-strategy", cs_);
-  if (cs_ == "full-compact") {
-    compaction_strategy_ = FullCompact;
-  } else if (cs_ == "obd-compact") {
-    compaction_strategy_ = OldestOrBestDeleteRatioSstCompact;
-  } else if (cs_ == "incremental-compact") {
-    compaction_strategy_ = IncrementalCompact;
-  } else {
-    compaction_strategy_ = NONE;
-  }
+  std::string cs;
+  switch (compaction_strategy_) {
+    case FullCompact:
+      cs = "full-compact";
+      break;
+    case OldestOrBestDeleteRatioSstCompact:
+      cs = "obd-compact";
+      break;
+    case IncrementalCompact:
+      cs = "incremental-compact";
+      break;
+    default:
+      cs.clear();
+      break;
+  }
+  SetConfStr("compaction-strategy", cs);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_conf.cc` around lines 914 - 931, The config-rewrite logic calls
SetConfStr("compaction-strategy", cs_) but cs_ is uninitialized so the current
compaction_strategy_ isn't persisted; change the flow in the Save/Rewrite path
to derive cs_ from the existing compaction_strategy_ before calling SetConfStr
(e.g., map compaction_strategy_ enum values FullCompact,
OldestOrBestDeleteRatioSstCompact, IncrementalCompact, NONE into the string
cs_), then call SetConfStr("compaction-strategy", cs_) and keep the subsequent
SetConfInt calls for incremental_* so the active incremental-compact mode is
durably written.
🧹 Nitpick comments (3)
include/pika_server.h (1)

554-554: Minor: initialization style differs from sibling fields.

last_check_compact_time_ and last_check_resume_time_ are initialized in the PikaServer constructor initializer list (see src/pika_server.cc L48-49), while last_incremental_compact_time_ uses an inline default member initializer here. Functionally equivalent, but consider initializing it in the same place as the other timing fields for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/pika_server.h` at line 554, Move the inline default initialization of
the timeval member last_incremental_compact_time_ out of the class declaration
and initialize it alongside the other timing fields in the PikaServer
constructor initializer list (same place where last_check_compact_time_ and
last_check_resume_time_ are initialized in PikaServer), ensuring the member
declaration becomes just "struct timeval last_incremental_compact_time_;" and
the constructor sets it to {0, 0}.
src/storage/src/redis.cc (2)

6-9: <thread> is not used in this translation unit.

std::chrono (<chrono>) and std::time (<ctime>) are used by IncrementalCompact, but I don't see any std::thread, std::this_thread, or std::jthread reference introduced. Consider dropping #include <thread> to avoid unnecessary header bloat.

♻️ Proposed change
 #include <chrono>
 #include <sstream>
 #include <ctime>
-#include <thread>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/redis.cc` around lines 6 - 9, the translation unit includes
<thread> which isn't used; remove the unnecessary #include <thread> from the top
of the file to avoid header bloat and rebuild; check symbols like
IncrementalCompact to ensure it still compiles (it uses <chrono> and <ctime>
only) and no std::thread/std::this_thread references remain.

557-569: Minor: rate computation uses int division and ignores the unconditional processed++ semantics.

Two small things worth noting here:

  1. job_info.stats.num_output_records and num_input_records are uint64_t; multiplying by 100 first is safe, but assigning to int rate is surprising. Using uint64_t rate = ... * 100 / ...; or a double would be clearer and avoids any future surprise if the ratio ever legitimately exceeds INT_MAX / 100.
  2. processed++ is skipped whenever rate >= min_rate because of the break, but it is incremented when num_input_records == 0 (e.g., a file whose entries were all filtered out). The quota semantics are therefore "files that were compacted and still had low compression benefit" — document this, or rename max_files to max_low_benefit_files to match the comparison with v4's strict max_files quota noted in the PR description.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/redis.cc` around lines 557 - 569, The rate calculation and
processed-count semantics in the IncrementalCompact loop need tightening: change
the local rate from int to an unsigned 64-bit (or double) to compute rate =
job_info.stats.num_output_records * 100 / job_info.stats.num_input_records
safely (use uint64_t or double for rate) and ensure processed is incremented for
every examined file (move or add processed++ before the rate >= min_rate break)
or explicitly rename/clarify the quota variable (e.g., max_low_benefit_files) to
reflect that files which trigger the break are not counted as processed; update
references to rate, job_info.stats.num_output_records,
job_info.stats.num_input_records, processed, min_rate, and max_files
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/pika_server.cc`:
- Around line 1240-1249: Clamp the interval returned by
g_pika_conf->incremental_compact_interval() to a sensible positive minimum
(e.g., >=1) before using it in the comparison with
last_incremental_compact_time_ so a zero/negative config cannot disable
throttling; then, detect when compact-interval (the AutoCompactRange path) is
also set and either log a warning that compact-interval takes precedence over
compaction_strategy == PikaConf::IncrementalCompact or explicitly
document/handle this precedence; update the block that calls
DoSameThingEveryDB(TaskType::kIncrementalCompact) and add the warning near where
AutoCompactRange early-returns so users aren’t silently ignored.

In `@src/storage/src/redis.cc`:
- Around line 492-499: The IncrementalCompact loop resets cf_start_time for each
column family (cf_start_time used inside IncrementalCompact loop with
handles_[idx]->GetName() and max_time_ms), so max_time_ms acts as a per-CF
budget and total wall time can be K * max_time_ms; either make the budget global
by moving/start a single start_time outside the CF loop and compare elapsed
against max_time_ms for the whole IncrementalCompact invocation, or rename the
config key and code/comment to incremental-compact-per-cf-max-time-ms (and
update documentation/conf/pika.conf) to make per-CF semantics explicit; pick one
approach and update the related logic and comments referencing cf_start_time,
max_time_ms, and IncrementalCompact accordingly.
- Around line 551-574: The code currently logs failures from
db_->CompactFiles(...) (the local variable s in the CompactFiles loop) but then
unconditionally pushes Status::OK() into compact_result_vec, hiding real errors;
change the logic so that when CompactFiles returns a non-OK Status (the s
checked in the if (!s.ok()) branch) you record that Status into
compact_result_vec (or record the first non-OK per column family) instead of
Status::OK(), and ensure any early break from the inner loop preserves and
pushes that failing Status for the corresponding CF; adjust the
push_back(Status::OK()) at the end to only push OK when no failures were
observed for that CF.
- Around line 466-472: in_compact_flag_ (the std::atomic<bool> used with
compare_exchange_weak in methods like the compact routine) is not explicitly
initialized which can yield indeterminate behavior; fix this by initializing
in_compact_flag_ to false either in its member declaration or in the class
constructor initializer list so compare_exchange_weak reads a well-defined value
and correctly prevents concurrent compactions when using the in_compact_flag_
check/DEFER pattern.
- Around line 522-529: The code builds an absolute path for oldest_file by
concatenating file_meta.db_path + "/" + file_meta.name, but CompactFiles expects
relative file names (SstFileMetaData::name already contains the leading "/"
relative path); update the assignment that sets oldest_file to use
file_meta.name directly (preserve setting of oldest_number and oldest_level) so
CompactFiles receives the correct relative path format; ensure ExtractFileNumber
usage and comparisons remain unchanged.

In `@src/storage/src/storage.cc`:
- Around line 1793-1799: The background task is being enqueued with type ==
DataType::kAll which makes Storage::AddBGTask clear the whole queue; change the
task type passed to AddBGTask so it is not DataType::kAll (i.e., use a specific
non‑all data type or a dedicated incremental type) when creating the async
incremental compaction task — update the call that currently does
AddBGTask({type, kIncrementalCompact, {...}}) to pass an explicit non‑kAll
DataType (or a new DataType::kIncremental if available) so Storage::AddBGTask
will not drop existing queued work.

---

Outside diff comments:
In `@src/pika_conf.cc`:
- Around line 914-931: The config-rewrite logic calls
SetConfStr("compaction-strategy", cs_) but cs_ is uninitialized so the current
compaction_strategy_ isn't persisted; change the flow in the Save/Rewrite path
to derive cs_ from the existing compaction_strategy_ before calling SetConfStr
(e.g., map compaction_strategy_ enum values FullCompact,
OldestOrBestDeleteRatioSstCompact, IncrementalCompact, NONE into the string
cs_), then call SetConfStr("compaction-strategy", cs_) and keep the subsequent
SetConfInt calls for incremental_* so the active incremental-compact mode is
durably written.

---

Nitpick comments:
In `@include/pika_server.h`:
- Line 554: Move the inline default initialization of the timeval member
last_incremental_compact_time_ out of the class declaration and initialize it
alongside the other timing fields in the PikaServer constructor initializer list
(same place where last_check_compact_time_ and last_check_resume_time_ are
initialized in PikaServer), ensuring the member declaration becomes just "struct
timeval last_incremental_compact_time_;" and the constructor sets it to {0, 0}.

In `@src/storage/src/redis.cc`:
- Around line 6-9: The translation unit includes <thread> which isn't used;
remove the unnecessary #include <thread> from the top of the file to avoid
header bloat and rebuild; check symbols like IncrementalCompact to ensure it
still compiles (it uses <chrono> and <ctime> only) and no
std::thread/std::this_thread references remain.
- Around line 557-569: The rate calculation and processed-count semantics in the
IncrementalCompact loop need tightening: change the local rate from int to an
unsigned 64-bit (or double) to compute rate = job_info.stats.num_output_records
* 100 / job_info.stats.num_input_records safely (use uint64_t or double for
rate) and ensure processed is incremented for every examined file (move or add
processed++ before the rate >= min_rate break) or explicitly rename/clarify the
quota variable (e.g., max_low_benefit_files) to reflect that files which trigger
the break are not counted as processed; update references to rate,
job_info.stats.num_output_records, job_info.stats.num_input_records, processed,
min_rate, and max_files accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3ceb47fa-89e8-43f5-a921-f3280492d2b8

📥 Commits

Reviewing files that changed from the base of the PR and between 651dae3 and 47284f9.

📒 Files selected for processing (15)
  • .gitignore
  • CMakeLists.txt
  • conf/pika.conf
  • include/pika_conf.h
  • include/pika_db.h
  • include/pika_server.h
  • src/pika_admin.cc
  • src/pika_conf.cc
  • src/pika_db.cc
  • src/pika_server.cc
  • src/storage/include/storage/storage.h
  • src/storage/include/storage/storage_define.h
  • src/storage/src/redis.cc
  • src/storage/src/redis.h
  • src/storage/src/storage.cc

Comment thread src/pika_server.cc
Comment thread src/storage/src/redis.cc
Comment thread src/storage/src/redis.cc
Comment thread src/storage/src/redis.cc
Comment on lines +522 to +529
        uint64_t number = ExtractFileNumber(file_meta.name);
        if (number < oldest_number) {
          oldest_number = number;
          oldest_file = file_meta.db_path + "/" + file_meta.name;
          oldest_level = level_meta.level;
        }
      }
    }


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the format of SstFileMetaData::name and db_path in the RocksDB version used.
fd -t f 'metadata.h' | xargs rg -n -C2 'SstFileMetaData|struct .*FileMetaData' | head -200
# Find any CompactFiles usage in this repo that already constructs the path correctly as reference.
rg -nP -C3 'CompactFiles\s*\('

Repository: OpenAtomFoundation/pikiwidb

Length of output: 553


🏁 Script executed:

#!/bin/bash
# Read the context around the flagged lines (522-529) to understand the full flow
sed -n '500,570p' src/storage/src/redis.cc | cat -n

Repository: OpenAtomFoundation/pikiwidb

Length of output: 2855


🏁 Script executed:

#!/bin/bash
# Find all occurrences of oldest_file in the file to understand how it's used
rg -n 'oldest_file' src/storage/src/redis.cc

Repository: OpenAtomFoundation/pikiwidb

Length of output: 518


🏁 Script executed:

#!/bin/bash
# Look for input_files construction to see if oldest_file or similar paths are added to it
rg -n -A5 -B5 'input_files\s*[.push|=]' src/storage/src/redis.cc | head -100

Repository: OpenAtomFoundation/pikiwidb

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Search for RocksDB includes and version information
rg -n 'include.*rocksdb|rocksdb.*version' src/storage/src/redis.cc | head -20

Repository: OpenAtomFoundation/pikiwidb

Length of output: 325


🏁 Script executed:

#!/bin/bash
# Find if there are any other places in the codebase that construct paths from SstFileMetaData
rg -n 'file_meta\.(db_path|name)' src/ --type cc --type h

Repository: OpenAtomFoundation/pikiwidb

Length of output: 97


🏁 Script executed:

#!/bin/bash
# Check if there's a CMakeLists or conanfile that shows RocksDB version
find . -name 'CMakeLists.txt' -o -name 'conanfile.*' -o -name 'vcpkg.json' -o -name 'package.json' | xargs grep -l rocksdb 2>/dev/null | head -5

Repository: OpenAtomFoundation/pikiwidb

Length of output: 223


🏁 Script executed:

#!/bin/bash
# Check CMakeLists.txt for RocksDB version
cat CMakeLists.txt | grep -i rocksdb -A5 -B5

Repository: OpenAtomFoundation/pikiwidb

Length of output: 1533


🏁 Script executed:

#!/bin/bash
# Search for file_meta usage with different approach
find src -type f \( -name '*.cpp' -o -name '*.cc' -o -name '*.h' -o -name '*.hpp' \) -exec rg 'file_meta\.(db_path|name)' {} + 2>/dev/null

Repository: OpenAtomFoundation/pikiwidb

Length of output: 251


🏁 Script executed:

#!/bin/bash
# Also check what ExtractFileNumber function does - it might give clues about the format
rg -n -A10 'ExtractFileNumber' src/storage/src/redis.cc

Repository: OpenAtomFoundation/pikiwidb

Length of output: 805


🌐 Web query:

RocksDB v8.7.3 SstFileMetaData name db_path format structure

💡 Result:

In RocksDB v8.7.3 (part of the v8.x series), the SstFileMetaData structure describes metadata for SST (Sorted String Table) files. It inherits from FileStorageInfo and includes fields relevant to table files. Key fields related to name, db_path, and format: - relative_filename: The name of the file within its directory (e.g., "123456.sst"). This is the basename without path. - directory: The directory containing the file (e.g., the db_path or column family path), without trailing '/'. - db_path: Deprecated; previously the full path or directory, now use directory instead. - name: Deprecated; previously the relative filename with leading slash (e.g., "/123456.sst"). SST files follow the naming convention: <file_number>.sst where file_number is a unique uint64_t identifier for the file within the DB. The full path is typically constructed as db_path + "/" + relative_filename (e.g., "/path/to/db/000123.sst"). Other notable fields in SstFileMetaData: - file_number: uint64_t ID of the file (used in filename). - size: File size in bytes. - smallestkey, largestkey: Smallest and largest user keys. - smallest_seqno, largest_seqno: Sequence number bounds. - being_compacted: bool if currently being compacted. - num_reads_sampled: Number of reads. - num_entries, num_deletions: Counts. - temperature: File temperature (kHot, kWarm, kCold). - oldest_blob_file_number, file_creation_time, epoch_number, file_checksum, etc. This structure is returned via APIs like DB::GetColumnFamilyMetaData in LiveFileMetaData (which extends SstFileMetaData) or ColumnFamilyMetaData. No specific changes to this structure in v8.7.3 release notes; the main source is the current header which reflects v8.x stability.



🌐 Web query:

RocksDB SstFileMetaData file path format leading slash

💡 Result:

In RocksDB's SstFileMetaData (and derived LiveFileMetaData), the file path format uses two separate fields: fileName (or relative_filename in C++), which is the filename without path (e.g., "123456.sst"), and path (or directory in C++), which is the directory path without trailing slash. The full path is constructed as directory + "/" + relative_filename. Regarding leading slash specifically: - In C++ SstFileMetaData constructor: If input _file_name starts with '/', sets relative_filename = _file_name.substr(1) (strips leading '/'), and deprecated name = _file_name (keeps '/'). Otherwise, relative_filename = _file_name, name = "/" + _file_name (adds leading '/'). This ensures the deprecated name field always has a leading slash (e.g., "/123456.sst"), while relative_filename does not. - In Java JNI bindings: path returns "the full path where the file locates" (absolute or relative to DB root), fileName returns "the name of the file". - In RocksDB.getLiveFiles: Returns relative paths that "begin with '/'", confirming the convention of leading slash for relative file listings (e.g., "/000123.sst" relative to dbname/). SST files are stored relative to the database directory (or db_paths), so paths in metadata are typically relative with leading '/' convention for filenames in listings/metadata.



🌐 Web query:

RocksDB CompactFiles input_file_names path matching format requirements

💡 Result:

The input_file_names parameter for RocksDB's DB::CompactFiles (or ColumnFamilyHandle::CompactFiles) is a std::vector<std::string> containing the names of SST files to compact. These must be existing, live SST files in the specified column family (default if using DB). Non-SST files like /CURRENT or /MANIFEST-xxx will cause errors such as "Specified compaction input file /000000.sst does not exist in column family default" or similar validation failures. File name format: relative paths starting with '/', e.g., "/000013.sst", as returned by DB::GetLiveFiles or GetSortedWalFiles. These are relative to the database directory. Absolute paths or plain basenames like "000013.sst" without a '/' prefix are invalid and will fail validation, as RocksDB checks for exact matches against internal file representations. To obtain valid names: use DB::GetLiveFiles for a simple list (filter .sst files), or DB::GetLiveFilesMetaData for structured LiveFileMetaData with a .name field (e.g., "/123.sst") and level info. Example usage: CompactionOptions options; std::vector<std::string> input_file_names = {"/000012.sst", "/000013.sst"}; int output_level = 1; Status s = db->CompactFiles(options, input_file_names, output_level); Only supported in non-LITE mode. Files must overlap appropriately per LSM rules for correctness.



Fix path construction: CompactFiles expects relative paths, not absolute paths with double slashes.

The issue goes beyond the double slash: CompactFiles requires relative file paths (e.g., "/000123.sst") relative to the database directory, not absolute paths. The current code file_meta.db_path + "/" + file_meta.name concatenates an absolute path (/var/db//000123.sst), which will fail validation when matched against RocksDB's internal file representations.

SstFileMetaData::name already provides the correct relative path with leading / (e.g., "/000123.sst"). Use it directly:

🛠️ Proposed fix
-          oldest_file = file_meta.db_path + "/" + file_meta.name;
+          // SstFileMetaData::name is already the relative path with leading '/'
+          oldest_file = file_meta.name;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/redis.cc` around lines 522 - 529, The code builds an absolute
path for oldest_file by concatenating file_meta.db_path + "/" + file_meta.name,
but CompactFiles expects relative file names (SstFileMetaData::name already
contains the leading "/" relative path); update the assignment that sets
oldest_file to use file_meta.name directly (preserve setting of oldest_number
and oldest_level) so CompactFiles receives the correct relative path format;
ensure ExtractFileNumber usage and comparisons remain unchanged.

Comment thread src/storage/src/redis.cc
Comment thread src/storage/src/storage.cc
@chejinge chejinge changed the title feat: test progressive compact feat:progressive compact Apr 17, 2026
@chejinge chejinge changed the title feat:progressive compact feat:progressive compact Apr 17, 2026
- Add compaction-strategy: incremental-compact
- Use CompactFiles to process a small batch of the oldest SST files each run
- Make the per-run file count, execution time, and compression-rate threshold configurable
- Kept separate from the dump hard-link scheme, tested independently
handles_[idx] was not passed through, so when a non-kMetaCF column family triggered a force compact, it was actually kMetaCF that got compacted
@chenbt-hz chenbt-hz force-pushed the test-incremental-compact branch from 47284f9 to 8043759 Compare April 17, 2026 06:40

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/storage/src/storage.cc`:
- Around line 1778-1791: The loop currently overwrites s for each inst and only
returns the last instance Status, losing earlier failures; change
IncrementalCompact handling to record and return the first non-OK Status
encountered: declare a Status first_failure (initialized OK) before the for
(const auto& inst : insts_) loop, after calling inst->IncrementalCompact(...)
check s and set first_failure = s if first_failure.ok() and !s.ok(), then
iterate compact_result_vec and if any compact_result is not ok and
first_failure.ok() set first_failure = compact_result; keep logging as-is but at
the end return first_failure (or s if you prefer), ensuring IncrementalCompact
failures from earlier insts_ or individual compact_result_vec entries are
propagated instead of discarded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 59b2c9d9-3603-4655-aa5c-93e70291f28e

📥 Commits

Reviewing files that changed from the base of the PR and between 47284f9 and 8043759.

📒 Files selected for processing (14)
  • .gitignore
  • CMakeLists.txt
  • conf/pika.conf
  • include/pika_conf.h
  • include/pika_db.h
  • include/pika_server.h
  • src/pika_admin.cc
  • src/pika_conf.cc
  • src/pika_db.cc
  • src/pika_server.cc
  • src/storage/include/storage/storage.h
  • src/storage/src/redis.cc
  • src/storage/src/redis.h
  • src/storage/src/storage.cc
✅ Files skipped from review due to trivial changes (2)
  • .gitignore
  • src/pika_server.cc
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/pika_db.cc
  • src/pika_admin.cc
  • conf/pika.conf
  • src/pika_conf.cc
  • src/storage/src/redis.cc

Comment on lines +1778 to +1791
if (sync) {
Status s;
for (const auto& inst : insts_) {
std::vector<rocksdb::Status> compact_result_vec;
s = inst->IncrementalCompact(type, &compact_result_vec, storage::kMetaAndData,
max_files, max_time_ms, min_rate,
min_file_age);
for (auto compact_result : compact_result_vec) {
if (!compact_result.ok()) {
LOG(ERROR) << compact_result.ToString();
}
}
}
return s;


⚠️ Potential issue | 🟠 Major

Propagate the first compaction failure instead of returning only the last instance status.

If an earlier instance fails and a later one succeeds, this method returns OK and the failure is reduced to a log line. That makes the new API report success for partial failures.

Suggested fix
 Status Storage::IncrementalCompact(const DataType &type, int max_files, int max_time_ms,
                                    int min_rate, int min_file_age, bool sync) {
   if (sync) {
-    Status s;
+    Status overall = Status::OK();
     for (const auto& inst : insts_) {
       std::vector<rocksdb::Status> compact_result_vec;
-      s = inst->IncrementalCompact(type, &compact_result_vec, storage::kMetaAndData,
-                                   max_files, max_time_ms, min_rate,
-                                   min_file_age);
+      Status s = inst->IncrementalCompact(type, &compact_result_vec, storage::kMetaAndData,
+                                          max_files, max_time_ms, min_rate,
+                                          min_file_age);
+      if (!s.ok() && overall.ok()) {
+        overall = s;
+      }
       for (auto compact_result : compact_result_vec) {
         if (!compact_result.ok()) {
           LOG(ERROR) << compact_result.ToString();
+          if (overall.ok()) {
+            overall = compact_result;
+          }
         }
       }
     }
-    return s;
+    return overall;
   } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/storage.cc` around lines 1778 - 1791, The loop currently
overwrites s for each inst and only returns the last instance Status, losing
earlier failures; change IncrementalCompact handling to record and return the
first non-OK Status encountered: declare a Status first_failure (initialized OK)
before the for (const auto& inst : insts_) loop, after calling
inst->IncrementalCompact(...) check s and set first_failure = s if
first_failure.ok() and !s.ok(), then iterate compact_result_vec and if any
compact_result is not ok and first_failure.ok() set first_failure =
compact_result; keep logging as-is but at the end return first_failure (or s if
you prefer), ensuring IncrementalCompact failures from earlier insts_ or
individual compact_result_vec entries are propagated instead of discarded.

@chejinge chejinge requested a review from Mixficsol April 17, 2026 11:32

Labels

✏️ Feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants