Skip to content

pluggable registry for input/export arrow kernels#7824

Open
a10y wants to merge 4 commits intodevelopfrom
aduffy/arrow-vtable
Open

pluggable registry for input/export arrow kernels#7824
a10y wants to merge 4 commits intodevelopfrom
aduffy/arrow-vtable

Conversation

@a10y
Copy link
Copy Markdown
Contributor

@a10y a10y commented May 7, 2026

Summary

Adds a pluggable registry on the session for Vortex extension <-> Arrow data/array round-tripping.

This unblocks round-trip serialization to Arrow of arrays with UUID, Tensor, and GeoArrow.

Part of #7686

API Changes

We introduce a new ArrowVTable, which is a new functionality trait that ExtVTables can implement if they wish to participate in Arrow execution.

/// A plugin that lets users plugin conversion between Vortex extension types and Arrow arrays
/// and data types.
pub trait ArrowVTable: 'static + Send + Sync + Debug {
    /// Vortex extension type ID that this plugin handles.
    fn vortex_ext_id(&self) -> ExtId;

    /// The name of the Vortex extension type handled by this plugin (e.g. `"arrow.uuid"`), if any.
    fn arrow_ext_name(&self) -> Option<&'static str> {
        None
    }

    /// Build the Arrow [`Field`] that represents `dtype` (which carries this plugin's
    /// extension metadata).
    fn to_arrow_field(
        &self,
        name: &str,
        dtype: &DType,
        session: &VortexSession,
    ) -> VortexResult<Field>;

    /// Build the Vortex [`DType`] that corresponds to `field` (which carries this plugin's
    /// Arrow extension metadata).
    fn from_arrow_field(&self, field: &Field) -> VortexResult<DType>;

    /// Convert a Vortex extension array into an Arrow array shaped to `target`.
    fn execute_arrow(
        &self,
        array: ArrayRef,
        target: &Field,
        ctx: &mut ExecutionCtx,
    ) -> VortexResult<ArrowArrayRef>;

    /// Convert an Arrow array (whose `field` carries this plugin's extension metadata)
    /// back into a Vortex array.
    fn from_arrow_array(&self, array: ArrowArrayRef, field: &Field) -> VortexResult<ArrayRef>;
}

It defines some functional methods to handle the case of converting to/from Arrow Field (need Field rather than DataType so we preserve the extension metadata) and for execution to a target physical Arrow type.

There are impls of the trait for

  • Uuid
  • Date
  • Time
  • Timestamp

Testing

Existing test suite. This is mostly just the API + initial implementations, this is only wired into the existing ArrayExecutor pathway but that should be deprecated, along with ArrayRef::from_arrow in favor of a new API that goes via the VortexSession, e.g. session.arrow().execute_arrow(..)

a10y added 2 commits May 6, 2026 22:54
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
@a10y a10y added the changelog/feature A new feature label May 7, 2026
a10y added 2 commits May 6, 2026 23:35
Signed-off-by: Andrew Duffy <[email protected]>
Signed-off-by: Andrew Duffy <[email protected]>
@a10y a10y requested a review from joseph-isaacs May 7, 2026 03:47
@a10y a10y mentioned this pull request May 7, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 7, 2026

Merging this PR will degrade performance by 21.65%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 24 improved benchmarks
❌ 27 regressed benchmarks
✅ 1155 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation bench_compare_primitive[(10000, 128)] 104.1 µs 119.4 µs -12.84%
Simulation bench_compare_primitive[(10000, 2)] 103.4 µs 118.1 µs -12.51%
Simulation bench_compare_primitive[(10000, 2048)] 126.9 µs 141.8 µs -10.51%
Simulation bench_compare_primitive[(10000, 32)] 103.5 µs 117.9 µs -12.18%
Simulation bench_compare_primitive[(10000, 4)] 102.8 µs 118.1 µs -12.95%
Simulation bench_compare_primitive[(10000, 512)] 113.4 µs 127.5 µs -11.02%
Simulation bench_compare_primitive[(10000, 8)] 103.1 µs 118.3 µs -12.9%
Simulation bench_compare_sliced_dict_primitive[(1000, 10000)] 77.2 µs 91.2 µs -15.35%
Simulation bench_compare_sliced_dict_primitive[(2000, 10000)] 82.4 µs 96 µs -14.13%
Simulation bench_compare_sliced_dict_primitive[(2500, 10000)] 84.9 µs 99 µs -14.27%
Simulation bench_compare_sliced_dict_primitive[(3333, 10000)] 89.7 µs 104.2 µs -13.85%
Simulation bench_compare_sliced_dict_primitive[(5000, 10000)] 98.5 µs 112.9 µs -12.79%
Simulation bench_compare_sliced_dict_varbinview[(1000, 10000)] 107.7 µs 123.7 µs -12.95%
Simulation bench_compare_varbin[(10000, 128)] 115.1 µs 129.8 µs -11.28%
Simulation bench_compare_varbin[(10000, 2)] 112.1 µs 126.8 µs -11.64%
Simulation bench_compare_varbin[(10000, 32)] 112.5 µs 127.7 µs -11.9%
Simulation bench_compare_varbin[(10000, 4)] 111.8 µs 126.8 µs -11.84%
Simulation bench_compare_varbin[(10000, 512)] 129.1 µs 143.8 µs -10.28%
Simulation bench_compare_varbin[(10000, 8)] 112.4 µs 127 µs -11.48%
Simulation bench_compare_varbinview[(10000, 128)] 114.7 µs 130.4 µs -12.03%
... ... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing aduffy/arrow-vtable (df3e436) with develop (eda8c2c)

Open in CodSpeed

@joseph-isaacs joseph-isaacs changed the title Aduffy/arrow vtable pluggable registry for input/export arrow kernels May 7, 2026
Comment on lines +45 to +47
fn arrow_ext_name(&self) -> Option<&'static str> {
None
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when would I not put a name here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you have a one-way mapping from Vortex extension -> Arrow. For example Vortex's Date/Time/Timestamp types don't have an arrow extension. I think most implementations will fill this in FWIW

Comment on lines +125 to +128
/// Look up the plugin registered for the given Arrow extension name.
pub fn for_arrow_ext(&self, name: &str) -> Option<ArrowVTableRef> {
self.by_arrow_ext.find(&Id::new(name))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? Its much slower than by id? I would like to encourage that users hold a Id long term

Comment on lines +93 to +104
fn default() -> Self {
let this = Self {
by_vortex_ext: Registry::default(),
by_arrow_ext: Registry::default(),
};

// Builtin extension-type plugins. User registrations with the same ID will replace them.
this.register(Uuid);
this.register(Date);
this.register(Time);
this.register(Timestamp);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure there is only one exporter for each vortex type? I am not sure that is true?

I would think there is one per (vortex-type, arrow-type) We might have multiple arrow-types from a single vx one?

);
return Ok(arrow);
}
canonical_execute_arrow(array, target.map(Field::data_type), ctx)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not clear if this current framework will help with these exporters. I think it should

Comment on lines +40 to +45
pub trait ArrowVTable: 'static + Send + Sync + Debug {
/// Vortex extension type ID that this plugin handles.
fn vortex_ext_id(&self) -> ExtId;

/// The name of the Vortex extension type handled by this plugin (e.g. `"arrow.uuid"`), if any.
fn arrow_ext_name(&self) -> Option<&'static str> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a general exporter for vortex <-> arrow or specific for extensions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/feature A new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants