GH-49959: [C++][Parquet] Avoid unbounded temp alloc in BYTE_STREAM_SPLIT decoder#49960

Open
pitrou wants to merge 1 commit into apache:main from pitrou:gh49959-bss-temp-mem-alloc

pitrou (Member) commented May 11, 2026

Rationale for this change

The BYTE_STREAM_SPLIT encoder and decoder allocate a SmallVector to hold the addresses of the different streams. This is fine for a small number of streams (which covers most or all legitimate use cases), but a very large number of streams (e.g. FLBA(100000000)) can trigger a huge temporary memory allocation.

This issue was found by OSS-Fuzz: https://issues.oss-fuzz.com/issues/511575321
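
To put the FLBA(100000000) case in numbers, here is a rough back-of-the-envelope sketch. It assumes one stream per byte of the type width and one 8-byte pointer per stream (a 64-bit platform); `StreamTableBytes` is a hypothetical helper for illustration, not an Arrow function.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed to store one address per stream,
// assuming `pointer_size` bytes per address (8 on a typical 64-bit target).
constexpr std::uint64_t StreamTableBytes(std::uint64_t byte_width,
                                         std::uint64_t pointer_size = 8) {
  return byte_width * pointer_size;
}

// An 8-byte type such as a double needs a 64-byte table.
static_assert(StreamTableBytes(8) == 64, "small widths stay small");

// FLBA(100000000) would need about 800 MB for the address table alone.
static_assert(StreamTableBytes(100000000) == 800000000ULL,
              "very wide types need a huge table");
```

Under these assumptions the table grows linearly with the declared type width, regardless of how many values the page actually holds.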

What changes are included in this PR?

  1. When the width of the BYTE_STREAM_SPLIT-encoded type is larger than a predefined constant, switch to a slower implementation that doesn't need any temporary allocation
  2. When encoding or decoding 0 values, avoid some pointless setup work
  3. Add tests for the two conditions above
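
The first two items above could look roughly like the following sketch. It is not the code from this PR: the function names, the threshold value `kMaxFastPathWidth = 16`, and the buffer layout parameters are all assumptions made for illustration. The point is only that the fast path keeps a bounded table of stream addresses, while the fallback recomputes each address on the fly and so needs no table at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical threshold; the PR only mentions "a predefined constant".
constexpr std::size_t kMaxFastPathWidth = 16;

// Decode `num_values` values of `width` bytes each. Stream k holds byte k of
// every value, and consecutive streams start `stride` bytes apart.
void DecodeByteStreamSplit(const std::uint8_t* encoded, std::size_t width,
                           std::size_t num_values, std::size_t stride,
                           std::uint8_t* out) {
  assert(stride >= num_values);
  // Nothing to decode: return before doing any setup work.
  if (num_values == 0 || width == 0) return;

  if (width <= kMaxFastPathWidth) {
    // Fast path: precompute stream start addresses in a fixed-size table.
    const std::uint8_t* streams[kMaxFastPathWidth];
    for (std::size_t k = 0; k < width; ++k) streams[k] = encoded + k * stride;
    for (std::size_t i = 0; i < num_values; ++i) {
      for (std::size_t k = 0; k < width; ++k) out[i * width + k] = streams[k][i];
    }
  } else {
    // Slow path: no address table, so no allocation proportional to `width`.
    for (std::size_t i = 0; i < num_values; ++i) {
      for (std::size_t k = 0; k < width; ++k) {
        out[i * width + k] = encoded[k * stride + i];
      }
    }
  }
}

// Convenience wrapper (also hypothetical) for tightly packed input.
std::vector<std::uint8_t> DecodeAll(const std::vector<std::uint8_t>& encoded,
                                    std::size_t width) {
  const std::size_t num_values = (width == 0) ? 0 : encoded.size() / width;
  std::vector<std::uint8_t> out(num_values * width);
  DecodeByteStreamSplit(encoded.data(), width, num_values, num_values,
                        out.data());
  return out;
}
```

For example, with `width = 2` the encoded bytes `{1, 2, 3, 10, 20, 30}` (two streams of three bytes) decode to the three values `{1, 10}`, `{2, 20}`, `{3, 30}`. A width of 17 takes the fallback branch in this sketch and produces the same interleaving.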

Are these changes tested?

Yes, by additional unit tests and an additional fuzz regression file.

Are there any user-facing changes?

pitrou force-pushed the gh49959-bss-temp-mem-alloc branch from dd92425 to 14e16db May 11, 2026 14:08

pitrou (Member, Author) commented May 11, 2026

@github-actions crossbow submit -g cpp

@github-actions

Revision: 14e16db

Submitted crossbow builds: ursacomputing/crossbow @ actions-51f44d0cee

| Task | Status |
|------|--------|
| example-cpp-minimal-build-static | GitHub Actions |
| example-cpp-minimal-build-static-system-dependency | GitHub Actions |
| example-cpp-tutorial | GitHub Actions |
| test-build-cpp-fuzz | GitHub Actions |
| test-conda-cpp | GitHub Actions |
| test-conda-cpp-valgrind | GitHub Actions |
| test-debian-13-cpp-amd64 | GitHub Actions |
| test-debian-13-cpp-i386 | GitHub Actions |
| test-debian-experimental-cpp-gcc-15 | GitHub Actions |
| test-fedora-42-cpp | GitHub Actions |
| test-ubuntu-22.04-cpp | GitHub Actions |
| test-ubuntu-22.04-cpp-bundled | GitHub Actions |
| test-ubuntu-22.04-cpp-emscripten | GitHub Actions |
| test-ubuntu-22.04-cpp-no-threading | GitHub Actions |
| test-ubuntu-24.04-cpp | GitHub Actions |
| test-ubuntu-24.04-cpp-bundled-offline | GitHub Actions |
| test-ubuntu-24.04-cpp-gcc-13-bundled | GitHub Actions |
| test-ubuntu-24.04-cpp-gcc-14 | GitHub Actions |
| test-ubuntu-24.04-cpp-minimal-with-formats | GitHub Actions |
| test-ubuntu-24.04-cpp-thread-sanitizer | GitHub Actions |

pitrou marked this pull request as ready for review May 11, 2026 14:52
pitrou requested a review from wgtmac as a code owner May 11, 2026 14:52