Fokko commented Jul 31, 2025

While working on #2004, I noticed some small discrepancies that I think would be good to address in a separate PR.

Rationale for this change

Are these changes tested?

Are there any user-facing changes?

@@ -1279,6 +1279,7 @@ def __init__(
"parent-snapshot-id": str(parent_snapshot_id) if parent_snapshot_id is not None else "null",
"sequence-number": str(sequence_number),
"format-version": "2",
"content": "data",
Fokko (author) commented:

The "content" header is missing. It should be set to "data" until we support merge-on-read (MoR) deletes.
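The hunk above only shows part of the V2 manifest-list header. As a sketch, the visible fields can be written as a plain dict. `manifest_list_v2_metadata` is a hypothetical helper, not a PyIceberg function, and it mirrors only the four keys shown in the diff. As in the diff, every value is a string and a missing parent snapshot is written as the literal "null".

```python
from typing import Dict, Optional


def manifest_list_v2_metadata(
    parent_snapshot_id: Optional[int], sequence_number: int
) -> Dict[str, str]:
    """Hypothetical helper mirroring the header keys visible in the diff hunk."""
    return {
        # Numbers are stringified; a missing parent is the literal string "null".
        "parent-snapshot-id": str(parent_snapshot_id) if parent_snapshot_id is not None else "null",
        "sequence-number": str(sequence_number),
        "format-version": "2",
        # Only data manifests are written until merge-on-read deletes are supported.
        "content": "data",
    }


# First snapshot of a table: no parent.
print(manifest_list_v2_metadata(None, 1))
```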

kevinjqliu commented:

Ah, interesting. It looks like ManifestWriterV2 currently appends the same entry:

"content": "data",

kevinjqliu commented:

nit: should we remove "content": "data" from ManifestWriterV2? The content field is part of the manifest list, not the manifest file.

Fokko force-pushed the fd-fix-small-things branch from d15ec47 to 382a548 on July 31, 2025 19:59
kevinjqliu left a comment:

Cool! I was able to use apache/iceberg-rust#1328 to read both the manifest file and the v2 manifest list file.

Fokko force-pushed the fd-fix-small-things branch from e980caa to 306032b on August 3, 2025 19:13
Fokko force-pushed the fd-fix-small-things branch from 306032b to 63c4504 on August 3, 2025 21:13
Fokko mentioned this pull request on Aug 3, 2025
Fokko added this to the PyIceberg 0.10.0 milestone on Aug 4, 2025
kevinjqliu left a comment:

LGTM! I tested with apache/iceberg-rust#1328. I had to pad the metric values to exactly 8 bytes (and pushed that change).
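The comment does not say which metric values were padded or how. Assuming they are binary lower/upper bounds for a long column, the Iceberg spec's single-value serialization stores a long as 8 little-endian bytes, so a shorter byte string cannot be decoded as a long. This is a minimal sketch. `serialize_long_bound` and `pad_to_8_bytes` are hypothetical names, not PyIceberg or iceberg-rust APIs.

```python
import struct


def serialize_long_bound(value: int) -> bytes:
    """Encode a long bound as 8-byte little-endian two's complement."""
    return struct.pack("<q", value)


def pad_to_8_bytes(raw: bytes) -> bytes:
    """Hypothetical helper: zero-extend a shorter little-endian value to 8 bytes.

    Zero padding is only correct for non-negative values; a negative value
    would need sign extension (padding with 0xFF bytes) instead.
    """
    if len(raw) > 8:
        raise ValueError(f"expected at most 8 bytes, got {len(raw)}")
    return raw.ljust(8, b"\x00")


# A 1-byte bound is widened so a reader expecting an 8-byte long can decode it.
padded = pad_to_8_bytes(b"\x2a")
assert padded == serialize_long_bound(42)
print(struct.unpack("<q", padded)[0])
```

Writing bounds with `struct.pack("<q", ...)` from the start avoids the need for padding, because the output is always 8 bytes.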

I tested both the manifest file fixture (generated_manifest_entry_file) and the v2 manifest list fixture (generated_manifest_file_file_v2).
The v1 manifest list fixture (generated_manifest_file_file_v1) failed with:

Source: Failed to deserialize Avro value into value: missing field `content`

This is a known issue, tracked in apache/iceberg-rust#1576.
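The error comes from the v1 manifest list, which has no content field. The Iceberg spec's notes on reading v1 metadata as v2 say that a missing manifest-list content should default to 0 (data). Here is a minimal Python sketch of that defaulting, assuming decoded records are plain mappings. `ManifestContent` mirrors the spec's 0/1 values, and `read_manifest_content` is a hypothetical helper. This is not the fix tracked in apache/iceberg-rust#1576.

```python
from enum import IntEnum
from typing import Any, Mapping


class ManifestContent(IntEnum):
    """Content types for a manifest entry in a manifest list."""
    DATA = 0
    DELETES = 1


def read_manifest_content(record: Mapping[str, Any]) -> ManifestContent:
    """Return the content type of a decoded manifest-list record.

    v1 manifest lists have no "content" field, so fall back to DATA
    instead of failing with "missing field `content`".
    """
    return ManifestContent(record.get("content", ManifestContent.DATA))


print(repr(read_manifest_content({"manifest_path": "m1.avro"})))                # v1-style record
print(repr(read_manifest_content({"manifest_path": "m2.avro", "content": 1})))  # v2 delete manifest
```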

I left a small nit about the content field currently written by ManifestWriterV2.


Fokko merged commit 14ee8da into apache:main on Aug 4, 2025
10 checks passed
Fokko deleted the fd-fix-small-things branch on August 4, 2025 18:39

Fokko commented Aug 4, 2025

Thanks @kevinjqliu 🙌

gabeiglio pushed a commit to Netflix/iceberg-python that referenced this pull request Aug 13, 2025
While working on apache#2004 I've
noticed some small discrepancies that I think would be good to address
in a separate PR.


# Rationale for this change

# Are these changes tested?

# Are there any user-facing changes?


---------

Co-authored-by: Kevin Liu <[email protected]>