pluggable registry for input/export arrow kernels#7824
Conversation
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
Merging this PR will degrade performance by 21.65%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | bench_compare_primitive[(10000, 128)] |
104.1 µs | 119.4 µs | -12.84% |
| ❌ | Simulation | bench_compare_primitive[(10000, 2)] |
103.4 µs | 118.1 µs | -12.51% |
| ❌ | Simulation | bench_compare_primitive[(10000, 2048)] |
126.9 µs | 141.8 µs | -10.51% |
| ❌ | Simulation | bench_compare_primitive[(10000, 32)] |
103.5 µs | 117.9 µs | -12.18% |
| ❌ | Simulation | bench_compare_primitive[(10000, 4)] |
102.8 µs | 118.1 µs | -12.95% |
| ❌ | Simulation | bench_compare_primitive[(10000, 512)] |
113.4 µs | 127.5 µs | -11.02% |
| ❌ | Simulation | bench_compare_primitive[(10000, 8)] |
103.1 µs | 118.3 µs | -12.9% |
| ❌ | Simulation | bench_compare_sliced_dict_primitive[(1000, 10000)] |
77.2 µs | 91.2 µs | -15.35% |
| ❌ | Simulation | bench_compare_sliced_dict_primitive[(2000, 10000)] |
82.4 µs | 96 µs | -14.13% |
| ❌ | Simulation | bench_compare_sliced_dict_primitive[(2500, 10000)] |
84.9 µs | 99 µs | -14.27% |
| ❌ | Simulation | bench_compare_sliced_dict_primitive[(3333, 10000)] |
89.7 µs | 104.2 µs | -13.85% |
| ❌ | Simulation | bench_compare_sliced_dict_primitive[(5000, 10000)] |
98.5 µs | 112.9 µs | -12.79% |
| ❌ | Simulation | bench_compare_sliced_dict_varbinview[(1000, 10000)] |
107.7 µs | 123.7 µs | -12.95% |
| ❌ | Simulation | bench_compare_varbin[(10000, 128)] |
115.1 µs | 129.8 µs | -11.28% |
| ❌ | Simulation | bench_compare_varbin[(10000, 2)] |
112.1 µs | 126.8 µs | -11.64% |
| ❌ | Simulation | bench_compare_varbin[(10000, 32)] |
112.5 µs | 127.7 µs | -11.9% |
| ❌ | Simulation | bench_compare_varbin[(10000, 4)] |
111.8 µs | 126.8 µs | -11.84% |
| ❌ | Simulation | bench_compare_varbin[(10000, 512)] |
129.1 µs | 143.8 µs | -10.28% |
| ❌ | Simulation | bench_compare_varbin[(10000, 8)] |
112.4 µs | 127 µs | -11.48% |
| ❌ | Simulation | bench_compare_varbinview[(10000, 128)] |
114.7 µs | 130.4 µs | -12.03% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Comparing aduffy/arrow-vtable (df3e436) with develop (eda8c2c)
| fn arrow_ext_name(&self) -> Option<&'static str> { | ||
| None | ||
| } |
There was a problem hiding this comment.
when would I not put a name here?
There was a problem hiding this comment.
When you have a one-way mapping from Vortex extension -> Arrow. For example Vortex's Date/Time/Timestamp types don't have an arrow extension. I think most implementations will fill this in FWIW
| /// Look up the plugin registered for the given Arrow extension name. | ||
| pub fn for_arrow_ext(&self, name: &str) -> Option<ArrowVTableRef> { | ||
| self.by_arrow_ext.find(&Id::new(name)) | ||
| } |
There was a problem hiding this comment.
Do we need this? Its much slower than by id? I would like to encourage that users hold a Id long term
| fn default() -> Self { | ||
| let this = Self { | ||
| by_vortex_ext: Registry::default(), | ||
| by_arrow_ext: Registry::default(), | ||
| }; | ||
|
|
||
| // Builtin extension-type plugins. User registrations with the same ID will replace them. | ||
| this.register(Uuid); | ||
| this.register(Date); | ||
| this.register(Time); | ||
| this.register(Timestamp); | ||
|
|
There was a problem hiding this comment.
Are we sure there is only one exporter for each vortex type? I am not sure that is true?
I would think there is one per (vortex-type, arrow-type) We might have multiple arrow-types from a single vx one?
| ); | ||
| return Ok(arrow); | ||
| } | ||
| canonical_execute_arrow(array, target.map(Field::data_type), ctx) |
There was a problem hiding this comment.
Its not clear if this current framework will help with these exporters. I think it should
| pub trait ArrowVTable: 'static + Send + Sync + Debug { | ||
| /// Vortex extension type ID that this plugin handles. | ||
| fn vortex_ext_id(&self) -> ExtId; | ||
|
|
||
| /// The name of the Vortex extension type handled by this plugin (e.g. `"arrow.uuid"`), if any. | ||
| fn arrow_ext_name(&self) -> Option<&'static str> { |
There was a problem hiding this comment.
Is this a general exporter for vortex <-> arrow or specific for extensions?
Summary
Adds a pluggable registry on the session for Vortex extension <-> Arrow data/array round-tripping.
This unblocks round-trip serialization to Arrow of arrays with UUID, Tensor, and GeoArrow.
Part of #7686
API Changes
We introduce a new
ArrowVTable, which is a new functionality trait thatExtVTables can implement if they wish to participate in Arrow execution.It defines some functional methods to handle the case of converting to/from Arrow
Field(need Field rather than DataType so we preserve the extension metadata) and for execution to a target physical Arrow type.There are impls of the trait for
UuidDateTimeTimestampTesting
Existing test suite. This is mostly just the API + initial implementations, this is only wired into the existing
ArrayExecutorpathway but that should be deprecated, along withArrayRef::from_arrowin favor of a new API that goes via the VortexSession, e.g.session.arrow().execute_arrow(..)