-
Notifications
You must be signed in to change notification settings - Fork 357
Avro: Fix tests and add missing content
header
#2265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pyiceberg/manifest.py
Outdated
@@ -1279,6 +1279,7 @@ def __init__( | |||
"parent-snapshot-id": str(parent_snapshot_id) if parent_snapshot_id is not None else "null", | |||
"sequence-number": str(sequence_number), | |||
"format-version": "2", | |||
"content": "data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This header is missing, and should be set to data
until we support MoR deletes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah interesting. Looks like currently the ManifestWriterV2
appends the "content": "data",
iceberg-python/pyiceberg/manifest.py
Line 1165 in 904c0b7
"content": "data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should we remove the "content": "data",
from ManifestWriterV2
? The content
field is part of the manifest list, not the manifest file
While working on apache#2004 I've noticed some small discrepancies that I think would be good to address in a separate PR.
d15ec47
to
382a548
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool! i was able to use apache/iceberg-rust#1328 to read both the manifest file and the v2 manifest list file
pyiceberg/manifest.py
Outdated
@@ -1279,6 +1279,7 @@ def __init__( | |||
"parent-snapshot-id": str(parent_snapshot_id) if parent_snapshot_id is not None else "null", | |||
"sequence-number": str(sequence_number), | |||
"format-version": "2", | |||
"content": "data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah interesting. Looks like currently the ManifestWriterV2
appends the "content": "data",
iceberg-python/pyiceberg/manifest.py
Line 1165 in 904c0b7
"content": "data", |
pyiceberg/manifest.py
Outdated
@@ -1279,6 +1279,7 @@ def __init__( | |||
"parent-snapshot-id": str(parent_snapshot_id) if parent_snapshot_id is not None else "null", | |||
"sequence-number": str(sequence_number), | |||
"format-version": "2", | |||
"content": "data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e980caa
to
306032b
Compare
306032b
to
63c4504
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I tested with apache/iceberg-rust#1328. I had to pad the metric values to be exactly 8 bytes (and pushed the change)
I tested both the manifest file fixture (generated_manifest_entry_file
) and the v2 manifest list fixture (generated_manifest_file_file_v2
)
The v1 manifest list fixture (generated_manifest_file_file_v1
) failed with
Source: Failed to deserialize Avro value into value: missing field `content`
which is a known issue tracked in apache/iceberg-rust#1576
I left a small nit for the content
field currently in ManifestWriterV2
pyiceberg/manifest.py
Outdated
@@ -1279,6 +1279,7 @@ def __init__( | |||
"parent-snapshot-id": str(parent_snapshot_id) if parent_snapshot_id is not None else "null", | |||
"sequence-number": str(sequence_number), | |||
"format-version": "2", | |||
"content": "data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should we remove the "content": "data",
from ManifestWriterV2
? The content
field is part of the manifest list, not the manifest file
… into fd-fix-small-things
Thanks @kevinjqliu 🙌 |
While working on apache#2004 I've noticed some small discrepancies that I think would be good to address in a separate PR. <!-- Thanks for opening a pull request! --> <!-- In the case this PR will resolve an issue, please replace ${GITHUB_ISSUE_ID} below with the actual Github issue id. --> <!-- Closes #${GITHUB_ISSUE_ID} --> # Rationale for this change # Are these changes tested? # Are there any user-facing changes? <!-- In the case of user-facing changes, please add the changelog label. --> --------- Co-authored-by: Kevin Liu <[email protected]>
While working on #2004 I've noticed some small discrepancies that I think would be good to address in a separate PR.
Rationale for this change
Are these changes tested?
Are there any user-facing changes?