
@tustvold
Contributor

@tustvold tustvold commented Jan 2, 2026

Which issue does this PR close?

Rationale for this change

This trait is not meant to be overridden, and doing so will break many kernels in sometimes subtle ways.

What changes are included in this PR?

Seals the Array trait to prevent implementation outside of arrow-array.
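For readers unfamiliar with the technique: the usual way to seal a trait in Rust is to give it a supertrait that lives in a private module. Downstream crates can then name and call the public trait but cannot satisfy its bound, so they cannot implement it. The sketch below is a minimal, self-contained illustration of that general pattern. The names sealed, Sealed, MyArray and Int32Array are hypothetical stand-ins, not arrow-array's actual internals.

```rust
// Sealed-trait sketch. All names here are hypothetical illustrations.
mod arrays {
    mod sealed {
        // Nameable inside `arrays`, but the `sealed` module itself is
        // private, so code outside `arrays` cannot implement this trait.
        pub trait Sealed {}
    }

    /// Public trait: anyone can use it, only this module can implement it.
    pub trait MyArray: sealed::Sealed {
        fn len(&self) -> usize;
    }

    pub struct Int32Array(pub Vec<i32>);

    impl sealed::Sealed for Int32Array {}

    impl MyArray for Int32Array {
        fn len(&self) -> usize {
            self.0.len()
        }
    }
}

use arrays::{Int32Array, MyArray};

/// Downstream code can still accept and call the sealed trait.
fn total_len(arrays: &[&dyn MyArray]) -> usize {
    arrays.iter().map(|a| a.len()).sum()
}

// Uncommenting this outside `arrays` fails to compile, because the
// supertrait `arrays::sealed::Sealed` is private to `arrays`:
//
// struct Foreign;
// impl MyArray for Foreign {
//     fn len(&self) -> usize { 0 }
// }

fn main() {
    let a = Int32Array(vec![1, 2, 3]);
    let b = Int32Array(vec![4]);
    let parts: [&dyn MyArray; 2] = [&a, &b];
    println!("total length: {}", total_len(&parts));
}
```

Using the trait as a bound or a trait object is unaffected; only new implementations outside the defining module are ruled out.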

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 2, 2026
@tustvold tustvold added the api-change Changes to the arrow API label Jan 2, 2026
Contributor

@alamb alamb left a comment


Thank you @tustvold @viirya and @Jefffrey

@alamb
Contributor

alamb commented Jan 3, 2026

BTW, while this is technically an API change and should wait for the next major release, I think it is important enough to merge for 57.2.0 (a minor release), and it should not be disruptive (as I don't think anyone has implemented this trait).

I had a brief look around geoarrow, as it is well designed in my opinion and a non-trivial extension on top of Arrays: https://github.com/geoarrow/geoarrow-rs/blob/main/rust/geoarrow-array/src/trait_.rs

I didn't see any impl Array, but maybe @kylebarron / @paleolimbot could confirm that they don't know of any potential downstream issues with making it impossible to impl Array.

Member

@kylebarron kylebarron left a comment


I tried to implement Array for external types but was never able to get the downcasting working because of the closed nature of DataType. So I don't think it was previously possible to implement Array externally; sealing the trait just makes that explicit.

@gabotechs

Such a shame to see this trait sealed. I was finding it very convenient for implementing a GPU-based dyn Array backed by https://github.com/rapidsai/cudf in https://github.com/gabotechs/libcudf-rs.

@tustvold
Contributor Author

tustvold commented Jan 5, 2026

I presume you're referring to https://github.com/gabotechs/libcudf-rs/blob/main/src/column_view.rs#L125

If so, why not just define your own trait? It looks like you're only using a very limited subset of Array anyway, and passing that array into any arrow-rs kernel will, at best, cause it to panic (as downcasting won't work correctly).

@gabotechs

@tustvold It's mainly for wiring it up with https://github.com/apache/datafusion. DataFusion moves RecordBatches, which contain dyn Arrays, between nodes, and a custom dyn Array was my means of transporting the data.

I'm not saying the change in this PR doesn't make sense, though; I believe it does. But I wonder what the alternative could be. Maybe DataFusion should be the one exposing a customizable trait for transporting data?

@scovich
Contributor

scovich commented Jan 5, 2026

I tried to implement Array for external types but was never able to get the downcasting working because of the closed nature of DataType. So I don't think it was previously possible to implement Array externally; sealing the trait just makes that explicit.

When implementing the variant extension type, we also tried to use Array, but we hit the same 1:1 constraint: every concrete Array must have a corresponding DataType enum variant for downcasting to work properly, and every DataType enum variant must have its own concrete type implementing Array for the same reason. In our case, the (proposed) Array::data_type method for VariantArray had to return DataType::Struct, but attempting to downcast it as StructArray would fail or panic.

So the DataType enum already logically seals the Array trait, but the compiler cannot enforce that. Actually sealing the trait just closes the compiler loophole that currently allows (always provably and unfixably incorrect) third-party implementations of the trait.
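To make the 1:1 constraint concrete, here is a minimal std-only model of the dispatch pattern kernels typically rely on: match on data_type(), then downcast via as_any() to the concrete type that variant promises. DataType, Array and the array structs below are hypothetical, heavily simplified stand-ins, not arrow's real definitions. A foreign type that reports DataType::Struct without being a StructArray makes the downcast fail; this sketch returns None at that point, whereas a kernel that unwraps the downcast would panic.

```rust
use std::any::Any;

// Hypothetical, heavily simplified stand-ins for arrow's DataType and Array.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Struct,
}

trait Array {
    fn as_any(&self) -> &dyn Any;
    fn data_type(&self) -> DataType;
}

struct Int32Array(Vec<i32>);
struct StructArray {
    num_fields: usize,
}

impl Array for Int32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn data_type(&self) -> DataType {
        DataType::Int32
    }
}

impl Array for StructArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn data_type(&self) -> DataType {
        DataType::Struct
    }
}

// A third-party type that reports Struct (as the proposed VariantArray
// would have) but is not a StructArray.
struct ForeignVariantArray;

impl Array for ForeignVariantArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn data_type(&self) -> DataType {
        DataType::Struct
    }
}

/// Kernel-style dispatch: match on data_type(), then downcast to the
/// concrete type that variant implies. Returns None when that promise is
/// broken; a kernel that unwrapped the downcast would panic instead.
fn describe(array: &dyn Array) -> Option<String> {
    match array.data_type() {
        DataType::Int32 => {
            let a = array.as_any().downcast_ref::<Int32Array>()?;
            Some(format!("Int32 with {} values", a.0.len()))
        }
        DataType::Struct => {
            let s = array.as_any().downcast_ref::<StructArray>()?;
            Some(format!("Struct with {} fields", s.num_fields))
        }
    }
}

fn main() {
    println!("{:?}", describe(&Int32Array(vec![1, 2])));
    println!("{:?}", describe(&StructArray { num_fields: 3 }));
    // Reports DataType::Struct, yet the StructArray downcast fails.
    println!("{:?}", describe(&ForeignVariantArray));
}
```

Nothing in the model lets ForeignVariantArray succeed short of adding a new DataType variant and a new match arm, which a third-party crate cannot do.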

Aside: the same 1:1 constraint is also why Array cannot (and must never) require Any, even though that makes casting a lot harder. Because &dyn Array and Arc<dyn Array> both impl Array (as does &Arc<Arc<dyn Array>> by transitivity), attempting any kind of casting via Array: Any would wreak havoc whenever those convenience wrapper impls are involved.
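A small std-only sketch of the Any problem, using hypothetical traits AnyArray and Array (not arrow's definitions). With Any as a supertrait, a generic helper that downcasts through Any sees the wrapper's concrete type (here Arc<dyn AnyArray>) rather than the array inside it, so the check silently returns false. The second variant assumes, as a simplification, that wrapper impls forward an explicit as_any() method to the inner array, which keeps the downcast pointed at the real concrete type.

```rust
use std::any::Any;
use std::sync::Arc;

struct Int32Array(Vec<i32>);

// --- Variant 1 (problematic): Any as a supertrait ---
trait AnyArray: Any {
    fn len(&self) -> usize;
}

impl AnyArray for Int32Array {
    fn len(&self) -> usize {
        self.0.len()
    }
}

// Convenience wrapper impl, analogous to Array for Arc<dyn Array>.
impl AnyArray for Arc<dyn AnyArray> {
    fn len(&self) -> usize {
        (**self).len()
    }
}

/// Downcasts through the Any supertrait, so it sees the wrapper's type.
fn is_int32_via_supertrait<T: AnyArray>(array: &T) -> bool {
    let any: &dyn Any = array;
    any.is::<Int32Array>()
}

// --- Variant 2: explicit as_any() that wrapper impls forward ---
trait Array {
    fn as_any(&self) -> &dyn Any;
}

impl Array for Int32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for Arc<dyn Array> {
    fn as_any(&self) -> &dyn Any {
        // Forward to the inner array, so downcasts see Int32Array, not Arc.
        (**self).as_any()
    }
}

fn is_int32_via_as_any<T: Array>(array: &T) -> bool {
    array.as_any().is::<Int32Array>()
}

fn main() {
    let wrapped_any: Arc<dyn AnyArray> = Arc::new(Int32Array(vec![1, 2, 3]));
    let wrapped: Arc<dyn Array> = Arc::new(Int32Array(vec![1, 2, 3]));

    // false: the Any seen here belongs to Arc<dyn AnyArray>.
    println!("supertrait: {}", is_int32_via_supertrait(&wrapped_any));
    // true: as_any() was forwarded to the inner Int32Array.
    println!("as_any: {}", is_int32_via_as_any(&wrapped));
}
```

The same array gives different answers depending on whether it is passed directly or through a wrapper, which is the kind of inconsistency the comment above warns about.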

@scovich
Contributor

scovich commented Jan 5, 2026

DataFusion moves RecordBatches, which contain dyn Arrays, between nodes, and a custom dyn Array was my means of transporting the data.

I'm not saying the change in this PR doesn't make sense, though; I believe it does. But I wonder what the alternative could be. Maybe DataFusion should be the one exposing a customizable trait for transporting data?

Could it pass ArrayData instead? That should be easy enough, e.g. StructArray::from(record_batch).into_data() and then RecordBatch::from(StructArray::from(array_data)).

@alamb
Contributor

alamb commented Jan 6, 2026

I am expecting an issue shortly for this PR, and then I think we should merge it to avoid confusion.

I realize this will cause @gabotechs some pain downstream, but let's figure out a better API for backing arrays with GPU memory as a follow-on.


@gabotechs gabotechs left a comment


Could it pass ArrayData instead? That should be easy enough, e.g. StructArray::from(record_batch).into_data() and then RecordBatch::from(StructArray::from(array_data)).

Yeah, I think I can do this or something similar; thanks for the options! This change makes sense to me 👍

@alamb
Contributor

alamb commented Jan 7, 2026

Thanks again for everyone's thoughts. Also thanks to @shinmao for the report

@alamb alamb merged commit 721f373 into apache:main Jan 7, 2026
27 checks passed

Labels

api-change Changes to the arrow API arrow Changes to the arrow crate


Development

Successfully merging this pull request may close these issues.

Soundness Bug in try_binary when Array is implemented incorrectly in external crate

8 participants