feat(writer): emit SUM zone-map stat for numeric primitive columns #131

Merged: dfa1 merged 1 commit into main from feat/zonemap-sum on Jun 21, 2026

@dfa1 (Owner) commented Jun 21, 2026

Closes the SUM half of the Rust-parity zone-map stats increment (pairs with ADR 0013 §6 aggregate push-down). MIN/MAX/NULL_COUNT already shipped.

Ground truth

Probed Rust real-file fixtures (tpch_lineitem.compact.vortex): numeric primitive and decimal columns carry [MAX, MIN, SUM, NULL_COUNT]; Utf8/extension/date columns carry [MAX, MIN, NULL_COUNT] with no SUM, even when the extension storage is numeric. Java now matches for plain numeric primitives; decimal SUM is deferred along with decimal min/max.

Changes

  • PrimitiveEncodingEncoder.sumStat — per-chunk sum over logical values: signed → i64, unsigned → u64, float → f64; checked i64/u64 overflow → null zone (Rust drops it). Validity placeholders are zero (sum-neutral), so nullable columns sum correctly without excluding nulls.
  • VortexWriter — emit SUM(5) in the zone-map stats table for plain numeric primitives only (not extension/utf8/decimal), flat + dict paths alike, carried on ChunkRef/DictColRef. SUM is independent of MIN/MAX (a partial-stats column drops MIN/MAX but keeps SUM).
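
The overflow and null-placeholder rules above can be sketched roughly as follows. This is a minimal illustration, not the actual PrimitiveEncodingEncoder.sumStat code: the class `ZoneSumSketch`, its method names, and the use of an empty `OptionalLong` to model a "null zone" are all assumptions made for the example.

```java
import java.util.OptionalLong;

/** Hypothetical sketch of per-chunk SUM semantics; names are not taken from the PR. */
final class ZoneSumSketch {

    /** Signed inputs accumulate in i64; an empty result models the null zone on overflow. */
    static OptionalLong sumSigned(long[] values) {
        long acc = 0L;
        try {
            for (long v : values) {
                acc = Math.addExact(acc, v); // throws ArithmeticException on i64 overflow
            }
        } catch (ArithmeticException overflow) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(acc);
    }

    /** Unsigned inputs accumulate in u64, carried here as raw long bits. */
    static OptionalLong sumUnsigned(long[] values) {
        long acc = 0L;
        for (long v : values) {
            long next = acc + v;
            // Unsigned wraparound occurred if the result is smaller than the previous total.
            if (Long.compareUnsigned(next, acc) < 0) {
                return OptionalLong.empty();
            }
            acc = next;
        }
        return OptionalLong.of(acc);
    }

    /** Float inputs accumulate in f64; no overflow check is applied. */
    static double sumFloat(double[] values) {
        double acc = 0.0;
        for (double v : values) {
            acc += v;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Nullable column [5, null, 7]: the null slot holds a zero placeholder,
        // so summing the raw storage equals summing only the valid values.
        long[] storage = {5L, 0L, 7L};
        System.out.println(sumSigned(storage).getAsLong());

        // Overflowing chunk: no SUM is produced for this zone.
        System.out.println(sumSigned(new long[] {Long.MAX_VALUE, 1L}).isPresent());
    }
}
```

An unsigned total above `Long.MAX_VALUE` comes back as a negative `long`; `Long.toUnsignedString` renders it as the intended u64 value.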

Bitsets

Numeric primitive: 0x58 → 0x78 (+SUM). Partial-stats numeric: 0x40 → 0x60. Utf8/extension unchanged.
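
The bitset change can be checked with a little arithmetic. The sketch below assumes the bitset holds one bit per stat id, so that SUM(5) maps to `1 << 5` = 0x20; that layout is an assumption, though it agrees with the values quoted here. The class `StatBitsetSketch` and its helpers are hypothetical and not part of VortexWriter.

```java
/** Hypothetical helper for reasoning about zone-map stat bitsets; not the writer's code. */
final class StatBitsetSketch {

    /** Stat id for SUM, taken from "SUM(5)" in the PR description. */
    static final int SUM_ID = 5;

    /** Returns the bitset with the bit for statId set (assumes one bit per stat id). */
    static int withStat(int bitset, int statId) {
        return bitset | (1 << statId);
    }

    /** Reports whether the bit for statId is set. */
    static boolean hasStat(int bitset, int statId) {
        return (bitset & (1 << statId)) != 0;
    }

    public static void main(String[] args) {
        // Full-stats numeric column before and after adding SUM.
        System.out.printf("0x%02x%n", withStat(0x58, SUM_ID));
        // Partial-stats numeric column: MIN/MAX dropped, SUM still added.
        System.out.printf("0x%02x%n", withStat(0x40, SUM_ID));
    }
}
```

Under that assumption, setting bit 5 turns 0x58 into 0x78 and 0x40 into 0x60, matching the before/after pairs listed for the full and partial-stats cases.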

Full ./mvnw verify is green, including the Java-writes→Rust-reads JNI interop, inspector decode, and javadoc. WriterZoneMapTest now asserts per-zone sums (flat, nullable null-skipping, dict).

🤖 Generated with Claude Code

Probed Rust real-file fixtures (tpch_lineitem): numeric primitive and
decimal columns carry [MAX, MIN, SUM, NULL_COUNT]; Utf8/extension/date
carry [MAX, MIN, NULL_COUNT] (no SUM, even when extension storage is
numeric). Closes the SUM half of the Rust-parity stats increment (ADR
0013 §6 aggregate push-down).

The writer now computes a per-chunk SUM over each column's logical values
(PrimitiveEncodingEncoder.sumStat: signed -> i64, unsigned -> u64, float
-> f64; checked i64/u64 overflow -> null zone) and emits it as the SUM(5)
field in the zone-map stats table for plain numeric primitives only —
flat and dict paths alike, carried on ChunkRef / DictColRef. Validity
placeholders are zero (sum-neutral), so nullable columns sum correctly
without excluding nulls. Decimal SUM is deferred with decimal min/max.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
@dfa1 dfa1 merged commit 9661f55 into main Jun 21, 2026
6 checks passed
@dfa1 dfa1 deleted the feat/zonemap-sum branch June 21, 2026 19:10