Conversation
1c091b1 to
c8699a6
Compare
Merging this PR will degrade performance by 37.77%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] |
31.3 µs | 50.2 µs | -37.77% |
| ⚡ | Simulation | take_search[(0.005, 0.05)] |
168 µs | 131.4 µs | +27.88% |
| ⚡ | Simulation | take_search[(0.005, 0.1)] |
320.1 µs | 247.1 µs | +29.56% |
| ⚡ | Simulation | take_search[(0.005, 0.5)] |
1.5 ms | 1.2 ms | +31.04% |
| ⚡ | Simulation | take_search[(0.005, 1.0)] |
3.1 ms | 2.3 ms | +31.26% |
| ⚡ | Simulation | take_search[(0.01, 0.05)] |
179.1 µs | 142.5 µs | +25.71% |
| ⚡ | Simulation | take_search[(0.01, 0.1)] |
341 µs | 267.9 µs | +27.26% |
| ⚡ | Simulation | take_search[(0.01, 0.5)] |
1.6 ms | 1.3 ms | +28.58% |
| ⚡ | Simulation | take_search[(0.01, 1.0)] |
3.3 ms | 2.5 ms | +28.78% |
| ⚡ | Simulation | take_search[(0.1, 0.05)] |
249 µs | 212.4 µs | +17.25% |
| ⚡ | Simulation | take_search[(0.1, 0.1)] |
459.3 µs | 386.2 µs | +18.91% |
| ⚡ | Simulation | take_search[(0.1, 0.5)] |
2.2 ms | 1.8 ms | +20.4% |
| ⚡ | Simulation | take_search[(0.1, 1.0)] |
4.3 ms | 3.5 ms | +20.62% |
| ⚡ | Simulation | take_search_chunked[(0.005, 0.05)] |
193.4 µs | 162.1 µs | +19.28% |
| ⚡ | Simulation | take_search_chunked[(0.005, 0.1)] |
369.2 µs | 307 µs | +20.26% |
| ⚡ | Simulation | take_search_chunked[(0.005, 0.5)] |
1.8 ms | 1.5 ms | +21.07% |
| ⚡ | Simulation | take_search_chunked[(0.005, 1.0)] |
3.5 ms | 2.9 ms | +21.2% |
| ⚡ | Simulation | take_search_chunked[(0.01, 0.05)] |
205.9 µs | 174.6 µs | +17.9% |
| ⚡ | Simulation | take_search_chunked[(0.01, 0.1)] |
394 µs | 331.8 µs | +18.75% |
| ⚡ | Simulation | take_search_chunked[(0.01, 0.5)] |
1.9 ms | 1.6 ms | +19.5% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Comparing adamg/aggregate-fn-array-uncompressed (b7fb3d0) with develop (6331959)
Polar Signals Profiling ResultsLatest Run
Previous Runs (2)
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingVortex (geomean): 1.033x ➖ datafusion / vortex-file-compressed (1.033x ➖, 0↑ 1↓)
|
File Sizes: PolarSignals ProfilingNo file size changes detected. |
Benchmarks: FineWeb NVMeVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.015x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.991x ➖, 0↑ 1↓)
datafusion / parquet (0.954x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.944x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.974x ➖, 1↑ 1↓)
duckdb / parquet (0.974x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: FineWeb NVMeNo file size changes detected. |
Benchmarks: TPC-H SF=1 on NVMEVerdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.897x ✅, 10↑ 0↓)
datafusion / vortex-compact (0.904x ➖, 9↑ 0↓)
datafusion / parquet (0.933x ➖, 6↑ 0↓)
datafusion / arrow (0.883x ✅, 12↑ 0↓)
duckdb / vortex-file-compressed (0.907x ➖, 7↑ 0↓)
duckdb / vortex-compact (0.921x ➖, 7↑ 0↓)
duckdb / parquet (0.953x ➖, 3↑ 1↓)
duckdb / duckdb (0.923x ➖, 4↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-H SF=1 on NVMENo file size changes detected. |
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.109x ❌, 0↑ 56↓)
datafusion / vortex-compact (1.095x ➖, 0↑ 47↓)
datafusion / parquet (1.089x ➖, 0↑ 39↓)
duckdb / vortex-file-compressed (1.092x ➖, 1↑ 44↓)
duckdb / vortex-compact (1.077x ➖, 0↑ 28↓)
duckdb / parquet (1.064x ➖, 0↑ 22↓)
duckdb / duckdb (1.090x ➖, 0↑ 38↓)
Full attributed analysis
|
File Sizes: TPC-DS SF=1 on NVMENo file size changes detected. |
Benchmarks: FineWeb S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.995x ➖, 0↑ 1↓)
datafusion / vortex-compact (1.073x ➖, 1↑ 2↓)
datafusion / parquet (1.086x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.070x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.053x ➖, 0↑ 1↓)
duckdb / parquet (1.006x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: Random AccessVortex (geomean): 0.967x ➖ unknown / unknown (0.990x ➖, 3↑ 0↓)
|
Benchmarks: Statistical and Population GeneticsVerdict: No clear signal (low confidence) duckdb / vortex-file-compressed (1.010x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.004x ➖, 0↑ 0↓)
duckdb / parquet (1.010x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: Statistical and Population GeneticsNo file size changes detected. |
Benchmarks: Clickbench on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.997x ➖, 1↑ 0↓)
datafusion / parquet (1.002x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.014x ➖, 1↑ 2↓)
duckdb / parquet (0.990x ➖, 1↑ 0↓)
duckdb / duckdb (0.985x ➖, 1↑ 0↓)
Full attributed analysis
|
File Sizes: Clickbench on NVMEFile Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Totals:
|
Benchmarks: TPC-H SF=1 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.089x ➖, 0↑ 4↓)
datafusion / vortex-compact (1.057x ➖, 0↑ 3↓)
datafusion / parquet (1.115x ➖, 0↑ 5↓)
duckdb / vortex-file-compressed (1.029x ➖, 0↑ 2↓)
duckdb / vortex-compact (1.103x ➖, 0↑ 2↓)
duckdb / parquet (1.024x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: CompressionVortex (geomean): 1.002x ➖ unknown / unknown (0.982x ➖, 10↑ 3↓)
|
Benchmarks: TPC-H SF=10 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.165x ➖, 1↑ 6↓)
datafusion / vortex-compact (1.162x ➖, 0↑ 5↓)
datafusion / parquet (1.033x ➖, 1↑ 1↓)
duckdb / vortex-file-compressed (1.127x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.026x ➖, 0↑ 0↓)
duckdb / parquet (1.073x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: TPC-H SF=10 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.011x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.009x ➖, 0↑ 0↓)
datafusion / parquet (1.018x ➖, 0↑ 0↓)
datafusion / arrow (1.011x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.001x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.010x ➖, 0↑ 0↓)
duckdb / parquet (1.003x ➖, 0↑ 0↓)
duckdb / duckdb (1.001x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-H SF=10 on NVMENo file size changes detected. |
| /// Applies to all types and returns a non-null `u64`. Encoding kernels can return this aggregate | ||
| /// directly from metadata to avoid decoding arrays whose uncompressed size is known. | ||
| #[derive(Clone, Debug)] | ||
| pub struct UncompressedSizeInBytes; |
There was a problem hiding this comment.
So this is only buffers of the array and all children.
Are we sure we should not include metadata?
Maybe just justify why?
There was a problem hiding this comment.
I don't think we do now, and it should really be a rounding error right?
It also seems to be more of an implementation details and not the number execution engines care about.
There was a problem hiding this comment.
You need a more precise definition of this value. I guess its sum of buffers in the fully canonical variant?
There was a problem hiding this comment.
Let's define it as byte size of buffers and children in canonical representation. Can you add a note?
0e19932 to
2ac1bd9
Compare
robert3005
left a comment
There was a problem hiding this comment.
Constant impl needs changing, this reads a bit too much like LLM generated code
2ac1bd9 to
7cb7c12
Compare
Signed-off-by: Adam Gutglick <[email protected]>
Signed-off-by: Adam Gutglick <[email protected]>
Signed-off-by: Adam Gutglick <[email protected]>
Signed-off-by: Adam Gutglick <[email protected]>
7cb7c12 to
673cf82
Compare
Signed-off-by: Adam Gutglick <[email protected]>
| #[cfg(test)] | ||
| use crate::builders::builder_with_capacity; |
There was a problem hiding this comment.
can you move this to the tests module?
| #[cfg(test)] | ||
| fn materialized_uncompressed_size_in_bytes(array: &ArrayRef) -> u64 { | ||
| let mut builder = builder_with_capacity(array.dtype(), array.len()); | ||
| unsafe { | ||
| builder.extend_from_array_unchecked(array); | ||
| } | ||
| builder.finish().nbytes() | ||
| } |
Summary
This stat turns out to be very useful for helping execution engines plan joins better, but its currently requires a full-decompression. This PR includes the basic structure for a new AggregateFn, including implementation for all canonical types.