Skip to content

[python] Fix filter on data evolution table not working issue#7211

Open
XiaoHongbo-Hope wants to merge 6 commits intoapache:masterfrom
XiaoHongbo-Hope:normal_filter_support
Open

[python] Fix filter on data evolution table not working issue#7211
XiaoHongbo-Hope wants to merge 6 commits intoapache:masterfrom
XiaoHongbo-Hope:normal_filter_support

Conversation

@XiaoHongbo-Hope
Copy link
Contributor

@XiaoHongbo-Hope XiaoHongbo-Hope commented Feb 4, 2026

Purpose/Problem

Filter not working on data evolution read: when a predicate is provided, all rows are returned.

Tests

API and Format

Documentation

@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review February 4, 2026 10:12
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft February 4, 2026 11:06
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review February 8, 2026 07:54
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft February 8, 2026 09:25
@XiaoHongbo-Hope XiaoHongbo-Hope force-pushed the normal_filter_support branch 2 times, most recently from 4354588 to b8e709d Compare February 10, 2026 11:33
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review February 10, 2026 14:12
simple_null = self._filter_batch_simple_null(batch)
if simple_null is not None:
return simple_null
if not self.predicate.has_null_check():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just use Predicate.to_arrow, it is same to other table modes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just use Predicate.to_arrow, it is same to other table modes.

Updated

schema=result.schema,
)
except (TypeError, ValueError, pa.ArrowInvalid) as e:
logger.debug(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exception here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exception here?

Used Predicate.to_arrow. This code is removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can refactor this, this if should be in is_primary_key_table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can refactor this, this if should be in is_primary_key_table.

Updated

if not self.predicate:
return True
if self.predicate_for_stats is None:
if self.predicate_for_stats is None or self.data_evolution:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a separate if. And we should add comments to this if, explain why there is no filtering done here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a separate if. And we should add comments to this if, explain why there is no filtering done here.

Updated.

super().__init__(table, predicate, read_type, actual_split, row_tracking_enabled)

def _push_down_predicate(self) -> Optional[Predicate]:
# Do not push predicate to file readers;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detailed comments, why not push predicate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detailed comments, why not push predicate.

Added

return ConcatBatchReader(suppliers)
merge_reader = ConcatBatchReader(suppliers)
if self.predicate is not None:
# Only apply filter when all predicate columns are in read_type (e.g. projected schema).
Copy link
Contributor

@JingsongLi JingsongLi Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we are returning here is complete row, right? So this check should be applicable to all table modes? That shouldn't be added here, it should be verified in SplitRead.init.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we are returning here is complete row, right? So this check should be applicable to all table modes? That shouldn't be added here, it should be verified in SplitRead.init.

Updated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants