Is your feature request related to a problem or challenge?
Processing semi-structured data (basically, think anything that can be represented in JSON) efficiently is becoming more and more important.
As @wjones127 says in https://github.com/apache/datafusion/issues/10987:

> This would be a high-performance data type for semi-structured data, designed for better OLAP performance than JSON or BSON (discussed in #7845).

While it is certainly possible to implement semi-structured, JSON, and even Variant support today using the DataFusion extension APIs (e.g. https://github.com/datafusion-contrib/datafusion-functions-json), this ticket tracks adding such support to DataFusion itself.
Parquet recently adopted the Variant type: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
We see adoption of this in other systems as well, such as Iceberg and Spark.
I think Databricks did a good job describing its rationale:

> Without Variant, customers had to choose between flexibility and performance. To maintain flexibility, customers would store JSON in single columns as strings. To see better performance, customers would apply strict schematizing approaches with structs, which requires separate processes to maintain and update with schema changes. With Variant, customers can retain flexibility (there's no need to define an explicit schema) and receive vastly improved performance compared to querying the JSON as a string.
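
To make the flexibility-versus-performance tradeoff in the Databricks rationale a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample rows, the `latency_ms` field, and both helper functions are hypothetical, and the "parsed" column is a plain list of dicts standing in for a pre-parsed representation. It is not the Variant binary encoding and does not use any DataFusion API.

```python
import json

# Hypothetical sample rows: each record is stored as a JSON string in one column.
json_column = [
    '{"user": "a", "latency_ms": 12}',
    '{"user": "b", "latency_ms": 30, "region": "eu"}',
    '{"user": "c"}',
]


def sum_latency_from_strings(rows):
    """Query a string column: every row is re-parsed on every query."""
    total = 0
    for raw in rows:
        record = json.loads(raw)  # text parsing cost is paid per row, per query
        total += record.get("latency_ms", 0)
    return total


# Parse once at write time. This is a stand-in for a pre-parsed encoding,
# NOT the actual Variant layout described in the Parquet spec.
parsed_column = [json.loads(raw) for raw in json_column]


def sum_latency_from_parsed(rows):
    """Query pre-parsed values: no text parsing at query time, no fixed schema."""
    return sum(record.get("latency_ms", 0) for record in rows)


string_total = sum_latency_from_strings(json_column)
parsed_total = sum_latency_from_parsed(parsed_column)
```

Both queries return the same answer and both tolerate rows with missing or extra fields. The difference is where the parsing cost is paid, which is the part of the tradeoff a Variant-style type is meant to address.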
Describe the solution you'd like
No response
Describe alternatives you've considered
This will be a big project. Here are some of the related prerequisites:
TableProviders
#14993
It is not clear to me if variant should be "built in" or if it should be an add-on (for example, add a `variant` feature and a `datafusion-variant` crate).
Additional context
Related tickets