C++: optimize batch read/write paths and aligned table null handling by ColinLeeo · Pull Request #823 · apache/tsfile

ColinLeeo · 2026-05-26T01:55:33Z

Summary

This PR optimizes the C++ TsFile read/write paths for batch and columnar workloads, and fixes several aligned table null-handling issues uncovered while validating the optimized path.

It consolidates the batch decode/write work from the long-lived optimization branch into a reviewable change for develop.

Supersedes #749, #754, and #774.

Main Changes

Add batch decode APIs to the C++ Decoder hierarchy and implement batch paths for PLAIN, TS2DIFF, and Gorilla.
Add TsBlock-level batch read paths in ChunkReader and AlignedChunkReader, including shared timestamp decoding for aligned multi-value reads.
Add columnar tablet write helpers and batch page writes for fixed-length and string columns.
Add optional parallel page decode/write infrastructure behind the existing C++ configuration.
Add SIMD-oriented fast paths for TS2DIFF, PLAIN byte swapping, and selected statistics updates.
Improve ByteStream, compressor, and page/chunk writer internals used by the optimized paths.
Simplify the C wrapper surface by removing unused metadata/tag-filter export APIs.

Correctness Fixes

Fix null TAG device boundary detection so null string offsets are not read as valid values.
Fix null string FIELD writes so null rows do not read uninitialized string offsets.
Preserve all-null aligned value pages by counting page rows separately from non-null statistic counts.
Treat missing aligned measurements as sparse columns instead of failing the whole device read.
Add a time-only aligned read path for devices whose queried FIELD columns are all null/missing.
Fix repeated logical devices inside one tablet by aggregating their segments before writing aligned chunks.
Fix ValuePageWriter::reset() so row count and null bitmap state are reset together.
Remove unused temporary TS2DIFF prototype headers that were not part of the production implementation.

Compatibility Notes

Existing C++ APIs remain additive for the optimized read/write paths.
The public C wrapper removes unused metadata export/tag-filter handles. Downstream wrappers should sanity-check that they do not depend on those removed symbols.
cpp/third_party/ is left at develop state so existing platform compatibility fixes are preserved.

Verification

cmake --build cpp/target/build --target TsFile_Test -j1
ctest --test-dir cpp/target/build/test --output-on-failure -R '^TsFileTableReaderTest\.TestNullInTable4$'
ctest --test-dir cpp/target/build/test --output-on-failure -j4
cd cpp && mvn spotless:check
cd cpp && mvn apache-rat:check

Current full C++ test result: 496/496 tests pass.

Brings together batch decode infrastructure, multi-value aligned read, parallel page decode, columnar tablet write, and SIMD micro-optimizations from the long-lived `final` branch into a single review-ready change. This change is a code snapshot, not a replay of `final` commit history -- the upstream history was a long sequence of WIP commits that wasn't fit for review. Supersedes #749, #754, #774. Read path - Decoder base gains batch APIs (read_batch_int32/int64/float/double, skip_*); PLAIN, TS2DIFF, Gorilla decoders implement them. TS2DIFF has block-level peeking so time filters can skip blocks without decoding. Gorilla adds a raw-pointer GorillaBitReader that bypasses ByteStream overhead. - ChunkReader / AlignedChunkReader add *_DECODE_TV_BATCH methods that decode time + value into a TsBlock in one pass, applying batch time filters before append. - AlignedChunkReader supports a multi-value mode: one time chunk + N value chunks decoded in a single pass, sharing the decoded timestamps and filter mask. SingleDeviceTsBlockReader auto-detects same-device measurements via VectorMeasurementColumnContext. - Optional page-level parallel decompression via a DecodeThreadPool + BlockingQueue when ENABLE_THREADS is set. Page-plan classification (SKIP / FULL_PASS / BOUNDARY) lets a scatter-free memcpy fast path fire when every row passes and no column has nulls. Write path - ValuePageWriter gains write_batch / write_string_batch that take timestamp+value+nullness arrays directly, removing the per-value append loop. Tablet exposes set_timestamps / set_column_values / set_column_string_repeated / reset for bulk reuse and switches StringColumn to an Arrow-compatible offset+buffer layout. - TS2DIFFEncoder::flush now packs all deltas with a single pack_bits_msb + write_buf instead of per-value write_bits, falling back to the scalar path for the rare bit_width > 56 case. - Int64Statistic::update_batch (NEON-accelerated min/max/sum). Encoding / SIMD - TS2DIFF batch decode adds AVX2 helpers via SIMDe (already on develop) for both i32 and i64; scalar fallback unchanged. - PLAIN byte-swap path uses ARM NEON (vrev64q_u8 / vrev32q_u8) when available, falling back to __builtin_bswap. - CMakeLists adds ENABLE_SIMD and turns on -O3 -march=native -flto in Release builds. Allocator / ByteStream - ByteStream caches page_mask_ (= page_size - 1) so the hot path uses a bitmask instead of modulo; wrap_from rounds buffer sizes up to a power of two so the mask remains correct. total_size_ widened to uint64_t to support files > 4GB. - UncompressedCompressor now copies its output instead of aliasing caller buffers, letting callers free input safely. C wrapper / Arrow - Trimmed unused metadata-export surface (TsFileStatisticBase, TimeseriesMetadata, DeviceTimeseriesMetadataEntry, tag-filter handles) out of the public C API. Internal tag filtering is unaffected. - arrow_c.cc simplified: per-row offset handling for sliced variable-length arrays in place of the InvertArrowBitmap copy. Tests / benchmarks - New tsfile_reader_table_batch_test.cc covers the TsBlock batch read path. gorilla_codec_test.cc adds Int32/Int64/Float batch decode tests. examples/cpp_examples adds bench_read.cpp/.h and an examples/read_perf_compare/ target. - Removed cwrapper_metadata_test.cc and common/path.cc (Path bodies inlined into path.h; the C metadata API they covered is gone). Compatibility - All new C++ methods are additions; no existing C++ API was removed. - C wrapper headers lost the metadata export / tag filter symbols listed above -- downstream callers (Python wrapper in particular) will want a sanity check before merge. - cpp/third_party/ intentionally left at develop's state so the recent MSVC compatibility fixes (WITH_STATIC_CRT OFF, CMP0054 NEW, CMAKE_POLICY_VERSION_MINIMUM=3.5, _MSC_VER guards) are preserved. Verification - cmake configure + make -j on macOS arm64 (AppleClang, C++11) builds cleanly: libtsfile.2.2.1.dev.dylib and TsFile_Test both link, zero errors, only unused-lambda-capture warnings in pre-existing tests. - Full TsFile_Test run and downstream Python binding load are left as pre-merge checks. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

codecov-commenter · 2026-05-28T09:52:17Z

Codecov Report

❌ Patch coverage is 37.57782% with 2206 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.35%. Comparing base (e3cdf87) to head (5be280e).
⚠️ Report is 5 commits behind head on develop.

Files with missing lines	Patch %	Lines
cpp/src/reader/aligned_chunk_reader.cc	13.45%	665 Missing and 36 partials ⚠️
cpp/src/cwrapper/tsfile_cwrapper.cc	39.69%	309 Missing and 45 partials ⚠️
...p/src/reader/block/single_device_tsblock_reader.cc	28.75%	148 Missing and 18 partials ⚠️
cpp/src/encoding/ts2diff_decoder.h	32.01%	125 Missing and 13 partials ⚠️
cpp/src/writer/tsfile_writer.cc	53.65%	72 Missing and 23 partials ⚠️
cpp/src/reader/filter/time_operator.cc	0.00%	80 Missing ⚠️
cpp/src/reader/chunk_reader.cc	38.33%	32 Missing and 42 partials ⚠️
cpp/src/file/tsfile_io_reader.cc	42.45%	38 Missing and 23 partials ⚠️
cpp/src/encoding/decoder.h	0.00%	60 Missing ⚠️
cpp/src/common/statistic.h	67.42%	30 Missing and 27 partials ⚠️
... and 41 more

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #823      +/-   ##
===========================================
- Coverage    61.32%   58.35%   -2.97%     
===========================================
  Files          731      733       +2     
  Lines        45521    47547    +2026     
  Branches      6772     7376     +604     
===========================================
- Hits         27914    27747     -167     
- Misses       16615    18619    +2004     
- Partials       992     1181     +189

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

The big batch read/write snapshot (5f12115) was taken against an older `final` branch base, so it inadvertently dropped the TimeseriesMetadata / DeviceID family of C APIs that develop had gained via #767. Python bindings still reference them, so the binding fails to load. Restore the typedefs, function declarations, helpers, and public implementations verbatim from develop, and add the missing common/* includes (device_id.h, statistic.h, tsfile_common.h, <vector>). Also fix an unrelated CMake parse error in test/CMakeLists.txt where \${URL} had been split across two lines.

These wall-clock perf comparisons with 5% tolerance are inherently flaky on noisy CI; develop already disabled them. The batch-opt snapshot (5f12115) was based on an older final branch and reverted that disable. Re-add the DISABLED_ prefix in both tree-view and table-view variants to match develop.

Match develop: rename the fixture class to DISABLED_ so all four TEST_F cases (TreeModel/TableModel × Single/MultiSequence) are skipped by default. They're wall-clock perf benchmarks meant to be run on demand via --gtest_also_run_disabled_tests.

Python binding (already on develop) calls these table-model query APIs with a tag_filter + batch_size argument, but the batch-opt snapshot (5f12115) reverted them to the older final signatures that drop tag_filter (and rename one). Realign with develop: - TsFileReader::queryByRow(table_name, ...) gains Filter*/batch_size default args and passes tag_filter through to TableQueryExecutor. - tsfile_reader_query_table_by_row gains tag_filter + batch_size. - tsfile_query_table_batch gains tag_filter (between end_time and batch_size). - tsfile_query_table_batch_with_filter is renamed to tsfile_query_table_with_tag_filter with develop's parameter order (tag_filter before batch_size); the old name had no callers. - TagFilterHandle typedef hoisted above first use.

Several more bits that develop had but the batch-opt snapshot reverted: - cwrapper: add TagFilterOp enum + tsfile_tag_filter_create + tsfile_tag_filter_between. Python binding already calls these. - StringArrayDeviceID::split_device_id_string(string): always feed the raw splits through the interpretive split_device_id_string(vec) overload (joining the first N-1 segments into a table_name) for both ANTLR4 and non-ANTLR4 paths. The snapshot had bypassed the interpretation in the non-ANTLR4 branch, which produced 3 segments for "root.sg.d1" instead of {"root.sg","d1"} and broke metadata consumers (and the restored cwrapper_metadata_test). ENABLE_ANTLR4 is set as a CMake option but never -D'd to the compiler, so every build was hitting the non-ANTLR4 branch. - test: restore cpp/test/cwrapper/cwrapper_metadata_test.cc (294 lines, taken from develop) -- the file is auto-picked by the test CMakeLists' GLOB_RECURSE. - test: re-apply DISABLED_ prefix to LargeFileNoEncodingNoCompression_ WriteAndRead and MemStatWriteAndVerify; develop has them disabled. - test: update query_by_row_cwrapper_test.cc:218 to pass the new tag_filter/batch_size args added to tsfile_reader_query_table_by_row. 493 tests pass, 8 disabled (the perf benchmarks).

Mechanical reformat after the previous restore commits; no behavior change.

Four behavior gaps the batch-opt snapshot reintroduced surfaced as Python test failures (tag-filter tag-filtered table queries, dataset arrow batch reads, multi-segment device tree queryByRow, missing-path tree queryByRow): - Path(string): join nodes[:-1] back into a single string and feed it through the interpretive StringArrayDeviceID(string) ctor instead of storing the raw per-segment vector. Without this, a stored "root.sg.d1" device (2 segments after the interpretive split) does not match a query path of "root.sg.d1.s1" whose device gets stored as 3 segments. - QDSWithoutTimeGenerator::init: skip paths whose device/measurement isn't in the file (E_MEASUREMENT_NOT_EXIST, E_DEVICE_NOT_EXIST, E_NOT_EXIST, and the bloom-filter early return E_NO_MORE_DATA) and keep going, like the Java implementation. Previously any missing path aborted the whole query. - TableQueryExecutor::query (with offset/limit): pass the executor's return_mode_ to the TableResultSet. Without it the result set defaults to RETURN_ROW and get_next_tsblock returns E_INVALID_ARG for batched/Arrow consumers. - TsFileReader: introduce ensure_table_query_executor(batch_size) helper. The cached TableQueryExecutor was created on first call and reused regardless of batch_size, so a later batched query would inherit the original RETURN_ROW executor. All 493 C++ tests and 144 Python tests pass.

The MinGW gcc 15.2 + statically-linked antlr4_static combination fails to resolve antlr4::Lexer symbols (setChannel, getChannel, non-virtual thunks, ...) at link time when -flto is on -- the LTO intermediate objects can't see the .a's vtable thunks. -march=native is also wrong for CI binaries that get shipped to other machines. Keep both flags on Linux/macOS where they actually pay off. Also fix a stray '${' line break in the MSVC ASAN section of cpp/CMakeLists.txt -- same kind of broken variable reference that took out cpp/test/CMakeLists.txt earlier.

Four portability gaps that surface as build errors on all MSVC toolchains (and as additional std::min errors on v141): - common/allocator/my_string.h: restore #include <algorithm> that the batch-opt snapshot dropped. Without it, std::min isn't declared on v141, the unqualified min(...) call falls through to the non-const common::String::min member, and MSVC errors out with C2039/C2662. - common/container/bit_map.h: add a uint64_t overload of bitops::ctz_nonzero with MSVC _BitScanForward64 / fallback loop branches alongside the existing uint32_t one. Existing uint8_t callers in BitMap::next_set_bit are cast to uint32_t to keep the overload set unambiguous. - common/tablet.cc: replace __builtin_ctzll with the new common::bitops::ctz_nonzero(uint64_t) helper. - encoding/plain_decoder.h: add inline plain_bswap32 / plain_bswap64 helpers (MSVC _byteswap_ulong / _byteswap_uint64) and use them in the three big-endian read_batch sites instead of __builtin_bswap*. - cwrapper/tsfile_cwrapper.cc: guard S_ISDIR (POSIX-only) with a _WIN32 fallback using _S_IFDIR. 493 C++ tests still pass on macOS.

DeviceQueryTask and the MetaIndexNode tree that feeds it are placement-new'd into PageArena memory by the batch-read path. The arenas only release the raw bytes, so the non-trivial members of those objects (std::vector<std::string>, std::vector<shared_ptr>, shared_ptr<IDeviceID> -> StringArrayDeviceID, etc.) leak when the arena goes away without ever invoking the destructors. Linux LSan caught it; macOS ASan doesn't ship LSan so the same job passes on the mac runner. Two surgical changes: - DeviceTaskIterator now records every DeviceQueryTask it produces in a vector and explicitly calls ~DeviceQueryTask() on each one in its destructor before destroying the arena. - DeviceQueryTask::~DeviceQueryTask() now invokes ~MetaIndexNode() on index_root_ (which was placement-new'd into DeviceMetaIterator's arena and handed over here via DeviceMetaIterator::next). Once that destructor runs, the children_ vector tears down its shared_ptr<IMetaIndexEntry> graph, fires the existing self_destructor / self_deleter for IMetaIndexEntry and DeviceMetaIndexEntry, which in turn releases the shared_ptr<StringArrayDeviceID>. All three top-tier leak signatures in the ASan log chain off this single root.

Three independent regressions the batch-opt snapshot introduced that only surface on the MSVC toolchain: 1. reader/result_set.h missed #include <cctype>. v141 doesn't pull the one-arg std::tolower(int) overload in transitively, so the CaseInsensitiveHash / CaseInsensitiveEqual call sites fail to find it and fall through to the locale-aware two-arg template (C2672/C2780). Add the include. 2. common/tablet.h had no copy/move semantics, so MSVC Debug (which doesn't always perform NRVO) was shallow-copying a Tablet on `return tablet;` in test helpers like gen_tablet. The source destructor then freed value_matrix_[c].string_col while the returned copy still pointed at it -- heap-use-after-free in find_all_device_boundaries on every Debug+ASan table-mode test. Delete the copy ctor/assign and add a pointer-stealing move ctor/assign so return-by-value is safe. 3. encoding/int{32,64}_rle_decoder.h dropped the RLE-run decode branch in the snapshot, leaving only bit-packing. The encoder still emits varint headers with the low bit indicating bit-packing vs run, so dictionary tests (which encode a single-entry vocabulary as an RLE run) read garbage codes and index past entry_index_ -> heap-buffer-overflow in std::basic_string ctor on Release+ASan. Restore develop's full decoder (RLE-run + varint header + out-param read_int API) and update the three callers (DictionaryDecoder, Int32SprintzDecoder, Int64SprintzDecoder) to the out-param API. 493 C++ tests still pass on macOS.

libstdc++'s <regex> implementation declares anonymous-namespace globals (__classnames at bits/regex.tcc:285, __collatenames at :128). With -flto each translation unit that includes <regex> ends up with its own LTO partition copy, and the ASan runtime sees two registered globals at the same name and aborts the test binary with odr-violation before any test runs. Strip -flto from CMAKE_CXX_FLAGS_RELEASE inside the ASan branch so the rest of the Release flags (-O3, -march=native) still apply.

…buffers Two real regressions only the Linux Release+ASan job catches: 1. Int64Packer::unpack_8values reads its input one byte at a time in a tight loop. With -O3 -march=native gcc autovectorizes it into AVX2 32-byte gathers that overread the input by up to one SIMD lane; the read lands in ASan's redzone and ASan reports it as SEGV (3 Int64PackerTest cases). The scalar code is correct, so just strip -march=native from CMAKE_CXX_FLAGS_RELEASE when ASan is on, alongside the -flto strip that's already there. 2. AlignedChunkReader::destroy() does chunk_pages_.clear() but the .clear() leaves the vector's internal buffer allocated. AlignedChunkReader is placement-new'd into a mem_alloc'd region that mem_free can't drain, so the ChunkPageInfo backing buffer leaks (3 table-mode tests). I can't fix it by calling chunk_reader_->~IChunkReader() in TsFileSeriesScanIterator:: destroy() because destroy() already runs manual destructors on chunk_header_/decoders/compressor inside; a virtual dtor on top would double-destruct those (CWrapperTest, BasicTest hit this). Instead, swap chunk_pages_ and page_all_times_ with empty vectors inside AlignedChunkReader::destroy() so the heap buffers get released without touching the destructor chain.

…hon loop `series[s:e]` on a TsFileDataFrame fans out to TsFileSeriesReader. read_series_by_row, which was issuing query_table_by_row in RETURN_ROW mode and then calling result_set.next() + get_value_by_name() per row. Every iteration is a Python<->C round-trip plus two list.append() calls, so a 10k-row slice costs ~30k transitions and dominates wall time -- it's actually slower than the pre-optimization scalar path because the batch/parallel work paid for by the snapshot only kicks in when batch_size>0, which this call site never set. Pass batch_size=65536 instead and drain the result set via read_arrow_batch(), pushing each TsBlock straight into numpy via the existing Arrow C-Data interface (mirrors what _read_arrow / read_device_fields_by_time_range already does). The outer while-remaining retry loop is kept so we can re-issue from the advanced offset if the native call stops at an internal block boundary before the requested window is filled. The boundary-retry unit test is updated to mock read_arrow_batch (returning a pyarrow.Table per call) and to assert batch_size>0 is actually plumbed through, so we don't silently regress to the slow path again. 152 Python tests pass.

Copilot

Pull request overview

This PR consolidates a large optimization branch into develop: it adds batch decode/write APIs across the C++ Decoder/Encoder hierarchy, multi-value aligned read paths with optional parallel decode, columnar tablet write helpers, SIMD fast paths, and a set of correctness fixes for aligned/table null handling (null TAG segments, null FIELD writes, all-null value pages, sparse aligned columns, repeated logical devices, ValuePageWriter::reset state). It also trims the C wrapper API (drops unused metadata export/tag-filter symbols, then re-adds tag-filter helpers in a different section) and removes several regression tests for behaviors it claims to fix.

Changes:

Add batch decode/encode/write paths through Decoder/Encoder/page/chunk writers and a MultiAlignedTimeseriesIndex plus single-device aligned fast-path reader.
Several aligned table fixes (null TAG/FIELD, all-null pages, single-device tablet flag, ValuePageWriter::reset, double-free of first-page buffers via release_cur_page_data).
Build/infra: SIMD option, optional BUILD_EXAMPLES, mem-stat counters widened to 64-bit, new BlockingQueue, removal of several existing regression tests, license-header punctuation churn in multiple CMake files.

Reviewed changes

Copilot reviewed 118 out of 119 changed files in this pull request and generated 12 comments.

Show a summary per file

File	Description
cpp/src/reader/filter/{and,or}_filter.h, filter.h, time_operator.h	Adds `satisfy_batch_time` (uses fixed 129-element stack buffer — flagged).
cpp/src/encoding/{plain,decoder,encoder,plain_decoder,dictionary_encoder}.h	Batch encode/decode API + dictionary index assignment change (flagged).
cpp/src/writer/{value_,time_,}{chunk,page}_writer.{h,cc}	Batch write paths, first-page ownership transfer, larger page buffers.
cpp/src/writer/tsfile_table_writer.{h,cc}, tsfile_writer.h	Memoized lowercasing, idempotent close, optional parallel write pool.
cpp/src/reader/*	Aligned multi-value batch path, bloom-filter `contains`, table result-set lifecycle.
cpp/src/file/tsfile_io_writer.{h,cc}, restorable_tsfile_io_writer.cc	Recovery cleanup simplified; conditional `sync_on_close_` (flagged); chunk-group index for O(1) lookup.
cpp/src/file/tsfile_io_reader.h	Device-node cache + multi-SSI alloc.
cpp/src/common/allocator/byte_stream.h, alloc_base.h, mem_alloc.cc	Page-mask bitwise modulo + power-of-2 rounding for wrapped buffers (flagged), 64-bit stat counters.
cpp/src/common/{tablet,schema,path,global,thread_pool}.*	Single-device flag, string-column `uint32_t` offsets, Path inlined, config knobs reshuffled.
cpp/src/common/container/{bit_map,blocking_queue,byte_buffer}.*	New `BlockingQueue`, `BitMap::may_have_set_bits`, bounds asserts.
cpp/src/compress/{snappy,lz4,uncompressed}_compressor.*	Safer `after_compress` ownership handling; `Uncompressed` now copies.
cpp/src/cwrapper/{tsfile_cwrapper.h,arrow_c.cc}	Tag-filter API moved, sliced-Arrow handling reverted (loses prior bug-fix paths).
cpp/test/**	Deletes several regression tests (deep path, missing measurement, aligned NULL boundary, dictionary RLE run counts, Arrow slice-with-offset, etc.) and adds new batch/page-boundary tests.
python/tsfile/dataset/reader.py + tests	Switches row reads to `read_arrow_batch()`.
cpp/{CMakeLists.txt,examples/**,src/CMakeLists.txt,src/common/CMakeLists.txt,test/CMakeLists.txt}	Build flags, SIMD option, Arrow/Parquet-dependent examples, license-header punctuation regressions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    int satisfy_batch_time(const int64_t* times, int count, bool* mask) {
+        bool mask_right[129];
+        left_->satisfy_batch_time(times, count, mask);
+        right_->satisfy_batch_time(times, count, mask_right);
+        int pass = 0;
+        for (int i = 0; i < count; ++i) {
+            mask[i] = mask[i] || mask_right[i];
+            if (mask[i]) ++pass;
+        }
+        return pass;
+    }


+    int satisfy_batch_time(const int64_t* times, int count, bool* mask) {
+        bool mask_right[129];
+        left_->satisfy_batch_time(times, count, mask);
+        right_->satisfy_batch_time(times, count, mask_right);
+        int pass = 0;
+        for (int i = 0; i < count; ++i) {
+            mask[i] = mask[i] && mask_right[i];
+            if (mask[i]) ++pass;
+        }
+        return pass;
+    }


            index_entry_.push_back(value);
            map_size_ = map_size_ + value.length();
-            entry_index_[value] = static_cast<int>(index_entry_.size()) - 1;
+            entry_index_[value] = entry_index_.size();


+        // page_mask_ is used as a bitmask and only works correctly for
+        // power-of-2 page sizes. Round up to the next power-of-2 so that
+        // (read_pos_ & page_mask_) gives the correct within-page offset and
+        // the page-crossing check doesn't misfire on arbitrary buffer sizes.
+        uint32_t ps = 1;
+        while (ps < (uint32_t)buf_len) ps <<= 1;
+        page_size_ = ps;
+        page_mask_ = ps - 1;
        head_.store(&wrapped_page_);


    } else if (RET_FAIL(write_file_footer())) {
        std::cout << "writer file footer error, ret = " << ret << std::endl;
-    } else if (RET_FAIL(sync_file())) {
+    } else if (g_config_value_.sync_on_close_ && RET_FAIL(sync_file())) {


@@ -1,5 +1,5 @@
 #[[
-Licensed to the Apache Software Foundation (ASF) under one
+Licensed to the Apache Software Foundation(ASF) under one



 Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
+software distributed under the LICENSE is distributed on an


    file = write_file_new("test/", &error_no);
-    ASSERT_TRUE(error_no == RET_FILRET_OPEN_ERR ||
-                error_no == RET_ALREADY_EXIST);
+    ASSERT_EQ(RET_FILRET_OPEN_ERR, error_no);


@@ -1320,7 +1112,7 @@ TEST_F(TreeQueryByRowTest, DISABLED_QueryByRowFasterThanManualNext) {
    write_test_file(devices, measurements, num_rows);

    const int num_iters = 5;
-    const double tolerance = 0.2;
+    const double tolerance = 0.05;


+            uint32_t cur_points = value_page_writer_.get_point_numer();
+            uint32_t page_remaining =
+                common::g_config_value_.page_writer_max_point_num_ - cur_points;
+            if (page_remaining == 0) {
+                if (RET_FAIL(seal_cur_page(false))) {
+                    return ret;
+                }
+                page_remaining =
+                    common::g_config_value_.page_writer_max_point_num_;
+            }


ColinLeeo force-pushed the colin_all_opts branch from a5ec198 to 804a1ec Compare May 26, 2026 02:41

ColinLeeo and others added 2 commits May 28, 2026 17:28

fix some code.

7db1a3a

ColinLeeo force-pushed the colin_all_opts branch from 69e1658 to 7db1a3a Compare May 28, 2026 09:28

ColinLeeo changed the title ~~TsFile C++ batch read/write optimization~~ C++: optimize batch read/write paths and aligned table null handling May 28, 2026

ColinLeeo added 15 commits May 28, 2026 18:26

fix msvc loss.

f39750d

disable QueryByRowPerformanceTest suite

153d412

Match develop: rename the fixture class to DISABLED_ so all four TEST_F cases (TreeModel/TableModel × Single/MultiSequence) are skipped by default. They're wall-clock perf benchmarks meant to be run on demand via --gtest_also_run_disabled_tests.

clang-format two test files

aee4386

Mechanical reformat after the previous restore commits; no behavior change.

ColinLeeo requested a review from Copilot June 2, 2026 07:43

Copilot started reviewing on behalf of ColinLeeo June 2, 2026 07:43 View session

Copilot AI reviewed Jun 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C++: optimize batch read/write paths and aligned table null handling#823

C++: optimize batch read/write paths and aligned table null handling#823
ColinLeeo wants to merge 17 commits into
developfrom
colin_all_opts

ColinLeeo commented May 26, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented May 28, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ColinLeeo commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Main Changes

Correctness Fixes

Compatibility Notes

Verification

Uh oh!

codecov-commenter commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ColinLeeo commented May 26, 2026 •

edited

Loading

codecov-commenter commented May 28, 2026 •

edited

Loading