fix checking physical type for Decimal type in StatsAggregator #2515

gmweaver · 2025-09-24T22:25:11Z

Rationale for this change

Fix for #2057. It looks like the initial fix, #1839, missed updating this code path. I would appreciate feedback on whether this is the best fix; it is at least simple.
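For context, the Parquet format allows a decimal column to be stored as INT32 when its precision is at most 9, as INT64 when its precision is at most 18, and as FIXED_LEN_BYTE_ARRAY otherwise. Below is a minimal sketch of that mapping, assuming the writer always picks the smallest integer type that fits. `expected_decimal_physical_type` is a hypothetical name used for illustration, not the function changed in this PR.

```python
def expected_decimal_physical_type(precision: int) -> str:
    """Hypothetical helper: Parquet physical type for a decimal of this precision.

    Assumes the writer stores small decimals as integers; some writers
    use FIXED_LEN_BYTE_ARRAY for every precision instead.
    """
    if precision < 1:
        raise ValueError(f"precision must be positive, got {precision}")
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return "FIXED_LEN_BYTE_ARRAY"


for p in (9, 18, 38):
    print(p, expected_decimal_physical_type(p))
```

A check that only accepts FIXED_LEN_BYTE_ARRAY for decimals would reject the first two cases, which is the kind of mismatch this PR is about.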

Are these changes tested?

Added unit tests.

Are there any user-facing changes?

No

…based on size

## Changes Made

Added changes to conditionally handle breaking changes in pyiceberg 0.10.0: how `DataFile` and `Record` are initialized, and the constraints on `name` for `Schema` fields and `PartitionField`.

I did not update dev dependencies at this point, because there are still issues with Decimal. I have a [PR](apache/iceberg-python#2515) open that hopefully addresses this. To test, I had to temporarily change dev dependencies to 0.10.0 and run the tests without the decimal type.

I need to update to 0.10.0 because support for anonymous was added (needed on my side). Not sure what the best path is here, considering that pyiceberg releases on a much slower cadence.

## Related Issues

Closes #5223

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to `docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)
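As an illustration of conditionally handling a breaking release, here is a minimal sketch of a version gate. The function names `parse_release` and `uses_new_api` are hypothetical, and the sketch only decides which branch to take; it does not show the actual `DataFile` or `Record` changes.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components of a version string, padded to three parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad so that "0.10" compares equal to "0.10.0".
    return parts + (0,) * max(0, 3 - len(parts))


def uses_new_api(installed: str, threshold: str = "0.10.0") -> bool:
    """Hypothetical gate: True when the installed version is at or past the threshold."""
    return parse_release(installed) >= parse_release(threshold)


# In real code the installed version could come from
# importlib.metadata.version("pyiceberg"); literals keep this runnable anywhere.
print(uses_new_api("0.9.1"), uses_new_api("0.10.0"))
```

Note that this simple parser ignores pre-release suffixes, so a string such as `0.10.0rc1` is treated the same as `0.10.0`.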

Fokko · 2025-09-25T12:28:37Z

Hey @gmweaver thanks for adding this. Could you also add a test for this? This ensures that we don't break it in the future.

gmweaver · 2025-09-25T16:59:13Z

@Fokko test added, and I made sure to run `make lint`. I will try to remember this for future PRs so the checks run successfully 😄

kevinjqliu

iceberg-python/pyiceberg/io/pyarrow.py

Lines 2507 to 2518 in e5e7453

    
    if isinstance(stats_col.iceberg_type, DecimalType) and statistics.physical_type != "FIXED_LEN_BYTE_ARRAY":
        scale = stats_col.iceberg_type.scale
        col_aggs[field_id].update_min(
            unscaled_to_decimal(statistics.min_raw, scale)
        ) if statistics.min_raw is not None else None
        col_aggs[field_id].update_max(
            unscaled_to_decimal(statistics.max_raw, scale)
        ) if statistics.max_raw is not None else None
    else:
        col_aggs[field_id].update_min(statistics.min)
        col_aggs[field_id].update_max(statistics.max)

Can we simplify this logic too? That can be a follow-up PR.
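One possible shape for such a simplification, as a sketch only and not the actual follow-up: replace the conditional expressions used as statements with plain `if` statements. To keep the example self-contained, `unscaled_to_decimal` is re-implemented locally as a stand-in for the helper named in the snippet, and `MinMaxAgg` and `update_decimal_stats` are hypothetical names.

```python
from decimal import Decimal
from typing import Optional


def unscaled_to_decimal(unscaled: int, scale: int) -> Decimal:
    """Local stand-in: interpret an unscaled integer at the given scale."""
    sign, digits, _ = Decimal(unscaled).as_tuple()
    return Decimal((sign, digits, -scale))


class MinMaxAgg:
    """Hypothetical stand-in for one col_aggs[field_id] entry."""

    def __init__(self) -> None:
        self.min: Optional[Decimal] = None
        self.max: Optional[Decimal] = None

    def update_min(self, val: Decimal) -> None:
        self.min = val if self.min is None else min(self.min, val)

    def update_max(self, val: Decimal) -> None:
        self.max = val if self.max is None else max(self.max, val)


def update_decimal_stats(
    agg: MinMaxAgg, min_raw: Optional[int], max_raw: Optional[int], scale: int
) -> None:
    """Skip missing statistics with plain if-statements."""
    if min_raw is not None:
        agg.update_min(unscaled_to_decimal(min_raw, scale))
    if max_raw is not None:
        agg.update_max(unscaled_to_decimal(max_raw, scale))


agg = MinMaxAgg()
update_decimal_stats(agg, 12345, 67890, scale=2)
update_decimal_stats(agg, None, 99999, scale=2)  # a missing min is ignored
print(agg.min, agg.max)
```

The behavior matches the quoted branch for integer-backed decimals: a `None` raw statistic leaves the aggregate untouched.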

pyiceberg/io/pyarrow.py

tests/io/test_pyarrow.py

kevinjqliu

LGTM Thanks for fixing this issue.

pyiceberg/io/pyarrow.py

fix checking physical type for Decimal to handle INT32/INT64 storage …

2de947a

…based on size

gmweaver marked this pull request as ready for review September 24, 2025 22:26

gmweaver mentioned this pull request Sep 24, 2025

feat: add support for pyiceberg 0.10.0 Eventual-Inc/Daft#5277

Merged

4 tasks

add test

a30b7e9

gmweaver mentioned this pull request Sep 25, 2025

Compatibility with PyIceberg v0.10 Eventual-Inc/Daft#5223

Closed

kevinjqliu reviewed Sep 26, 2025

pyiceberg/io/pyarrow.py

kevinjqliu reviewed Sep 26, 2025

tests/io/test_pyarrow.py

kevinjqliu approved these changes Sep 26, 2025

pyiceberg/io/pyarrow.py

gmweaver added 3 commits September 26, 2025 14:43

comments

fe09b2f

add TODO

fa6ae29

fix comment

ccb1108

kevinjqliu merged commit a51f2d3 into apache:main Oct 26, 2025
10 checks passed
