Conversation

@gmweaver
Contributor

@gmweaver gmweaver commented Sep 24, 2025

Rationale for this change

Fixes #2057. It looks like the initial fix in #1839 might have missed updating this code path. I would appreciate feedback on whether this is the best fix; it is at least simple.

Are these changes tested?

  • added unit tests

Are there any user-facing changes?

No

@gmweaver gmweaver marked this pull request as ready for review September 24, 2025 22:26
desmondcheongzx pushed a commit to Eventual-Inc/Daft that referenced this pull request Sep 25, 2025
## Changes Made
Added changes to conditionally handle breaking changes in pyiceberg
0.10.0 for how to initialize `DataFile` and `Record` and constraints on
`name` for `Schema` fields and `PartitionField`.
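
As a rough illustration of what conditional handling by installed version can look like, here is a minimal sketch using only the standard library. The helper names `release_tuple` and `installed_at_least` are hypothetical and are not taken from the Daft commit; the actual change may gate on the version in a different way.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def release_tuple(version_string: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '0.10.0rc1' -> (0, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"no numeric release segment in {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def installed_at_least(package: str, minimum: str, installed: Optional[str] = None) -> bool:
    """Hypothetical helper: True if `package` is installed at `minimum` or newer.

    `installed` can be passed explicitly (useful in tests); otherwise the
    version is looked up from the installed distribution metadata.
    """
    if installed is None:
        try:
            installed = version(package)
        except PackageNotFoundError:
            # Treat a missing package as "not new enough".
            return False
    return release_tuple(installed) >= release_tuple(minimum)


# Assumed usage: pick the code path once, then branch where the API differs.
USE_NEW_PYICEBERG_API = installed_at_least("pyiceberg", "0.10.0")
```

Comparing integer tuples rather than raw strings matters here, because `"0.9.1" > "0.10.0"` is true for strings and would select the wrong branch.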

I did not update dev dependencies at this point, because there are
actually still issues with Decimal. I have a
[PR](apache/iceberg-python#2515) open that
hopefully addresses this. To test, I had to temporarily change dev
dependencies to 0.10.0, and run tests without decimal type. I need to
update to 0.10.0 because support for anonymous was added (needed on my
side). Not sure what the best path is here considering that pyiceberg
releases on a much slower cadence.

## Related Issues

Closes #5223

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
@Fokko
Contributor

Fokko commented Sep 25, 2025

Hey @gmweaver thanks for adding this. Could you also add a test for this? This ensures that we don't break it in the future.

@gmweaver
Contributor Author

@Fokko test added, and I made sure to run `make lint`. I will try to remember this for future PRs so checks run successfully 😄

Contributor

@kevinjqliu kevinjqliu left a comment

if isinstance(stats_col.iceberg_type, DecimalType) and statistics.physical_type != "FIXED_LEN_BYTE_ARRAY":
    scale = stats_col.iceberg_type.scale
    col_aggs[field_id].update_min(
        unscaled_to_decimal(statistics.min_raw, scale)
    ) if statistics.min_raw is not None else None
    col_aggs[field_id].update_max(
        unscaled_to_decimal(statistics.max_raw, scale)
    ) if statistics.max_raw is not None else None
else:
    col_aggs[field_id].update_min(statistics.min)
    col_aggs[field_id].update_max(statistics.max)

Can we simplify this logic too? That can be a follow-up PR.
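
One possible shape for such a simplification, as a sketch only and not the merged change: pull the decoding into a small helper that returns the `(min, max)` pair, so the caller no longer needs conditional expressions used as statements. `FakeStats` and `decimal_bounds` are hypothetical names, `unscaled_to_decimal` below is a local stand-in for the helper used in the quoted snippet, and the sketch assumes the raw statistics are plain integers for non-`FIXED_LEN_BYTE_ARRAY` decimals.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Optional, Tuple


def unscaled_to_decimal(unscaled: int, scale: int) -> Decimal:
    """Local stand-in: turn an unscaled integer into a Decimal with `scale` digits."""
    return Decimal(unscaled).scaleb(-scale)


@dataclass
class FakeStats:
    """Hypothetical stand-in for the Parquet column statistics object."""
    physical_type: str
    min_raw: Any
    max_raw: Any
    min: Any
    max: Any


def decimal_bounds(stats: FakeStats, scale: Optional[int]) -> Tuple[Any, Any]:
    """Return (min, max), decoding unscaled integers only when needed.

    `scale` is None for non-decimal columns (an assumption of this sketch).
    """
    if scale is None or stats.physical_type == "FIXED_LEN_BYTE_ARRAY":
        # Already-decoded values can be used directly.
        return stats.min, stats.max

    def decode(raw: Optional[int]) -> Optional[Decimal]:
        return None if raw is None else unscaled_to_decimal(raw, scale)

    return decode(stats.min_raw), decode(stats.max_raw)
```

The caller would then update each aggregate from the returned pair. Whether a `None` bound should be skipped or passed through is left to the caller here, since the quoted code treats the two branches differently.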

Contributor

@kevinjqliu kevinjqliu left a comment

LGTM Thanks for fixing this issue.

@kevinjqliu kevinjqliu merged commit a51f2d3 into apache:main Oct 26, 2025
10 checks passed
3 participants