Skip to content

Change FieldSummary {upper,lower}_bound to ByteBuf #1369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented May 22, 2025

Which issue does this PR close?

I would like to invite everyone to roast my Rust-skills in order for me to improve myself :)

Unblocks #1328

This aligns closely with PyIceberg and Java and greatly simplifies the use of the Avro readers in PyIceberg. Otherwise, we would need to update public APIs.

What changes are included in this PR?

Are these changes tested?

@Fokko Fokko force-pushed the fd-change-datum-into-bytes branch 2 times, most recently from 62dcdd3 to 601ef29 Compare May 22, 2025 12:24
@Fokko Fokko force-pushed the fd-change-datum-into-bytes branch from 601ef29 to 1da092a Compare May 22, 2025 12:26
@Fokko Fokko force-pushed the fd-change-datum-into-bytes branch 2 times, most recently from bd4691f to d215831 Compare May 22, 2025 16:31
@Fokko Fokko force-pushed the fd-change-datum-into-bytes branch from d215831 to ccd4632 Compare May 22, 2025 20:50
Some(bound) if datum <= bound => ROWS_CANNOT_MATCH,
Some(_) => ROWS_MIGHT_MATCH,
Some(bound_bytes) => {
let bound = ManifestFilterVisitor::bytes_to_datum(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liurenjie1024 what's the cost of datum_from_bytes and bytes_to_datum? Do we need to introduce a new type in between?

@@ -577,7 +575,7 @@ pub struct ManifestFile {
/// A list of field summaries for each partition field in the spec. Each
/// field in the list corresponds to a field in the manifest file’s
/// partition spec.
pub partitions: Vec<FieldSummary>,
pub partitions: Option<Vec<FieldSummary>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this change, it aligns with the spec definition

Screenshot 2025-05-23 at 8 43 56 PM

Ok(Vec::new())
}
}

impl ManifestFileV2 {
/// Converts the [ManifestFileV2] into a [ManifestFile].
/// The convert of [partitions] need the partition_type info so use this function instead of [std::TryFrom] trait.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partition_type is removed

Suggested change
/// The convert of [partitions] need the partition_type info so use this function instead of [std::TryFrom] trait.

@@ -414,6 +444,11 @@ impl ManifestFilterVisitor<'_> {
};
Ok(bound)
}

fn bytes_to_datum(bytes: &ByteBuf, t: Type) -> Datum {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit should this be in Datum alongside try_from_bytes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants