Closed
Labels
bug: Something isn't working
Description
Describe the bug
I noticed this while working on #10852 with @marvinlanhenke
Basically, when generating statistics for a nonexistent column, the StatisticsExtractor returns a null array of the column's type rather than a UInt64Array
Specifically
datafusion/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs
Lines 871 to 886 in 2f43476
pub fn row_group_null_counts<I>(&self, metadatas: I) -> Result<ArrayRef>
where
    I: IntoIterator<Item = &'a RowGroupMetaData>,
{
    let data_type = self.arrow_field.data_type();
    let Some(parquet_index) = self.parquet_index else {
        return Ok(self.make_null_array(data_type, metadatas));
    };
    let null_counts = metadatas
        .into_iter()
        .map(|x| x.column(parquet_index).statistics())
        .map(|s| s.map(|s| s.null_count()));
    Ok(Arc::new(UInt64Array::from_iter(null_counts)))
}
The same problem exists for data_page_null_counts and data_page_row_counts (but not for row_group_row_counts).
To Reproduce
Try to call row_group_null_counts for a column that isn't in the parquet file
Expected behavior
- row_group_null_counts should always return a UInt64Array (not an ArrayRef)
- If the column does not exist, the UInt64Array should be all nulls
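To make the expected behavior concrete, here is a minimal sketch that uses only the standard library. It is not the DataFusion code: ColumnStats and RowGroup are hypothetical stand-ins for the parquet metadata types, and Vec<Option<u64>> stands in for a UInt64Array, with None modeling a null slot. The point is only that the missing-column branch returns the same u64-typed result as the normal branch, with one null per row group.

```rust
/// Hypothetical stand-in for the statistics of one column in one row group.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    /// `None` models statistics that do not record a null count.
    null_count: Option<u64>,
}

/// Hypothetical stand-in for a row group: one optional statistics entry per column.
struct RowGroup {
    columns: Vec<Option<ColumnStats>>,
}

/// Returns one entry per row group. `Vec<Option<u64>>` models a UInt64Array,
/// where `None` is a null slot.
fn row_group_null_counts(
    parquet_index: Option<usize>,
    row_groups: &[RowGroup],
) -> Vec<Option<u64>> {
    let Some(idx) = parquet_index else {
        // The column is absent from the file: the result is still u64-typed,
        // with one null per row group, regardless of the column's own type.
        return vec![None; row_groups.len()];
    };

    row_groups
        .iter()
        // Missing statistics for a row group also become a null slot.
        .map(|rg| rg.columns[idx].and_then(|s| s.null_count))
        .collect()
}
```

Under these assumptions, callers get the same element type and length whether or not the column is present, so they never need to branch on the type of the returned array.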
Additional context
No response