StatisticsConverter::row_group_null_counts incorrect for missing column #10926

@alamb

Describe the bug

I noticed this while working on #10852 with @marvinlanhenke

Basically, when generating statistics for a non-existent column, the StatisticsConverter returns a null array of the column's type, not a UInt64Array.

Specifically

pub fn row_group_null_counts<I>(&self, metadatas: I) -> Result<ArrayRef>
where
    I: IntoIterator<Item = &'a RowGroupMetaData>,
{
    let data_type = self.arrow_field.data_type();
    let Some(parquet_index) = self.parquet_index else {
        return Ok(self.make_null_array(data_type, metadatas));
    };
    let null_counts = metadatas
        .into_iter()
        .map(|x| x.column(parquet_index).statistics())
        .map(|s| s.map(|s| s.null_count()));
    Ok(Arc::new(UInt64Array::from_iter(null_counts)))
}

The same problem exists for data_page_null_counts and data_page_row_counts (but not for row_group_row_counts).

To Reproduce

Call row_group_null_counts for a column that isn't in the parquet file.

Expected behavior

  1. row_group_null_counts should always return a UInt64Array (not an ArrayRef)
  2. If the column does not exist in the file, the UInt64Array should be all nulls
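
The expected behavior can be sketched roughly as follows. This is a simplified, std-only model and not the actual DataFusion code: `RowGroupMeta` and `Converter` are hypothetical stand-ins for `RowGroupMetaData` and `StatisticsConverter`, and `Vec<Option<u64>>` stands in for a `UInt64Array` (with `None` playing the role of a null entry). The point is only that the missing-column branch yields one null per row group with the same element type as the normal branch.

```rust
/// Hypothetical stand-in for parquet's `RowGroupMetaData`:
/// one entry per column, `None` when that column has no statistics.
pub struct RowGroupMeta {
    pub column_null_counts: Vec<Option<u64>>,
}

/// Hypothetical stand-in for `StatisticsConverter`.
pub struct Converter {
    /// Index of the column in the parquet file, or `None` if the
    /// column is not present in the file.
    pub parquet_index: Option<usize>,
}

impl Converter {
    /// Always returns u64-typed counts (modelling a `UInt64Array`),
    /// whether or not the column exists in the file.
    pub fn row_group_null_counts<'a, I>(&self, metadatas: I) -> Vec<Option<u64>>
    where
        I: IntoIterator<Item = &'a RowGroupMeta>,
    {
        match self.parquet_index {
            // Missing column: one null per row group, still u64-typed,
            // rather than a null array of the column's own data type.
            None => metadatas.into_iter().map(|_| None).collect(),
            // Present column: the null count from statistics, if any.
            Some(idx) => metadatas
                .into_iter()
                .map(|rg| rg.column_null_counts[idx])
                .collect(),
        }
    }
}
```

With this shape, callers never need to check the returned type: a missing column simply produces an all-null result of the same length as the number of row groups.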

Additional context

No response

Metadata

Labels

bug: Something isn't working
