Skip to content

RUBY-3894 Fix heap buffer overflow in put_string and related methods#372

Merged
jamis merged 2 commits into
mongodb:masterfrom
jamis:3894-heap-buffer-overflow
Jun 8, 2026
Merged

RUBY-3894 Fix heap buffer overflow in put_string and related methods#372
jamis merged 2 commits into
mongodb:masterfrom
jamis:3894-heap-buffer-overflow

Conversation

@jamis

@jamis jamis commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Description

  • RSTRING_LEN returns long, but five sites in ext/bson/write.c stored the result in int32_t, silently truncating strings ≥ 2³¹ bytes to a negative value. That value was then passed to rb_bson_utf8_validate as size_t, wrapping to near UINT64_MAX and driving reads far past the heap allocation (confirmed heap-buffer-overflow with AddressSanitizer).
  • Fixed pvt_bson_encode_to_utf8, rb_bson_byte_buffer_put_string, rb_bson_byte_buffer_put_cstring, rb_bson_byte_buffer_put_symbol, and pvt_put_bson_key by changing the length variable to long and adding an explicit > INT32_MAX bounds check before narrowing back to int32_t for the BSON wire format. Strings exceeding the limit now raise ArgumentError.
  • Regression tests added in spec/bson/string_length_overflow_spec.rb, gated behind STRESS=1 (each example requires ~2 GB of free memory).

RSTRING_LEN returns long, but the native string encoder was storing the
result in int32_t. A string of exactly 2**31 bytes silently wraps to
INT32_MIN, which is then passed to rb_bson_utf8_validate as size_t,
wrapping to near UINT64_MAX and driving reads far past the heap.

Change the length variable to long in pvt_bson_encode_to_utf8,
rb_bson_byte_buffer_put_string, rb_bson_byte_buffer_put_cstring,
rb_bson_byte_buffer_put_symbol, and pvt_put_bson_key. Add an explicit
bounds check in each before the value is narrowed back to int32_t for
the BSON wire format (BSON strings are limited to INT32_MAX bytes by
spec). Strings exceeding that limit now raise ArgumentError.

Regression tests are in spec/bson/string_length_overflow_spec.rb,
gated behind STRESS=1 since each example requires ~2 GB of free memory.
@jamis jamis requested a review from a team as a code owner June 3, 2026 22:23
@jamis jamis requested review from comandeo-mongo and Copilot June 3, 2026 22:23

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a critical memory-safety issue in the BSON Ruby native extension where Ruby string lengths (RSTRING_LEN, a long) were truncated into int32_t, potentially wrapping into enormous size_t values and causing out-of-bounds reads during UTF-8 validation and write operations.

Changes:

  • Adds a centralized length guard (pvt_check_string_length) and switches length variables from int32_t to long in several string-writing paths, raising ArgumentError when oversized.
  • Narrows to int32_t only after validation to match BSON wire format constraints.
  • Adds stress-gated regression specs covering put_string, put_cstring, and put_symbol overflow protection.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
ext/bson/write.c Prevents int32 truncation of Ruby string lengths and adds explicit bounds checks before encoding/writing.
spec/bson/string_length_overflow_spec.rb Adds STRESS-gated regression coverage for oversized string/symbol inputs triggering the prior overflow path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ext/bson/write.c Outdated
The binary-string encoding path writes length + 5 bytes total (4-byte
int32 length prefix + payload + 1-byte null terminator). Allowing
length == INT32_MAX caused that addition to overflow signed 32-bit
arithmetic, which is undefined behavior in C and could in principle
allow a miscompiled bounds check to reintroduce the class of memory
safety issue this fix set out to close.
Comment thread ext/bson/write.c
VALUE utf8_string;
const char *str;
int32_t length;
long length;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to change this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because RSTRING_LEN returns long, not int32_t, so the result could (in extreme cases) be cast from long to a large negative int32_t, which could (potentially) cause problems.

@jamis jamis merged commit 85fecdd into mongodb:master Jun 8, 2026
48 of 51 checks passed
@jamis jamis deleted the 3894-heap-buffer-overflow branch June 8, 2026 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants