Compare Schema and StructType fields irrespective of ordering #700
@@ -1730,19 +1730,17 @@ def test_move_nested_field_after_first(catalog: Catalog) -> None:
     with tbl.update_schema() as schema_update:
         schema_update.move_before("struct.data", "struct.count")
this makes me think that the Field ordering does matter... @Fokko wdyt

First of all, thanks for digging into this 🎉 Technically the ordering does not matter when you write the data, because when reading we're correcting the order using this one:

iceberg-python/pyiceberg/io/pyarrow.py
Line 1143 in d02d7a1

Maybe we should also use that visitor when writing (instead of the PyArrow cast) introduced in #523

make sense, thanks!

We're relying on pyarrow

I don't think we want to do that. The
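To illustrate the point above about correcting field order on read, here is a minimal sketch in plain Python. It is not the visitor from pyiceberg/io/pyarrow.py; `reorder_to_schema` and the dict-of-lists column layout are hypothetical, and the sketch assumes columns can be matched by name alone.

```python
from typing import Dict, List, Sequence


def reorder_to_schema(
    schema_names: Sequence[str], columns: Dict[str, List[object]]
) -> Dict[str, List[object]]:
    """Return the columns in the order given by the table schema.

    Hypothetical helper: matches columns by name and fails loudly
    if the file is missing a field that the schema expects.
    """
    missing = [name for name in schema_names if name not in columns]
    if missing:
        raise KeyError(f"columns missing from file: {missing}")
    # Dicts preserve insertion order, so iterating over the schema's
    # names fixes the output order regardless of the file's order.
    return {name: columns[name] for name in schema_names}


# The file was written with "count" before "data" ...
file_columns = {"count": [1, 2], "data": ["a", "b"]}
# ... but the current schema lists "data" first.
projected = reorder_to_schema(["data", "count"], file_columns)
```

Because the reader imposes the schema's order, the order in which fields were written does not affect what the caller sees, which is the argument made in the comment.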
 
-    assert str(tbl.schema()) == str(
-        Schema(
-            NestedField(field_id=1, name="id", field_type=LongType(), required=True),
-            NestedField(
-                field_id=2,
-                name="struct",
-                field_type=StructType(
-                    NestedField(field_id=4, name="data", field_type=StringType(), required=True),
-                    NestedField(field_id=3, name="count", field_type=LongType(), required=True),
-                ),
-                required=True,
+    assert tbl.schema() == Schema(
+        NestedField(field_id=1, name="id", field_type=LongType(), required=True),
+        NestedField(
+            field_id=2,
+            name="struct",
+            field_type=StructType(
+                NestedField(field_id=4, name="data", field_type=StringType(), required=True),
+                NestedField(field_id=3, name="count", field_type=LongType(), required=True),
             ),
-        )
+            required=True,
+        ),
     )
Not sure if this change is semantically correct. This test is affected because `resolve_writer` compares the two given schemas (`record_schema` and `file_schema`):

iceberg-python/pyiceberg/avro/resolver.py
Lines 200 to 214 in 7bd5d9e

Previously, comparison returned `False` due to different ordering
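For readers following along, the behaviour change under discussion can be sketched in plain Python. This is not PyIceberg's implementation: `Field` and `Struct` are hypothetical stand-ins for `NestedField` and `StructType`, and the sketch assumes field IDs are unique within a struct. Equality keys the fields by ID, so two structs with the same fields in a different order compare equal.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Field:
    """Hypothetical stand-in for a nested field."""

    field_id: int
    name: str
    field_type: Union[str, "Struct"]
    required: bool = True


class Struct:
    """Hypothetical stand-in for a struct type with order-insensitive equality."""

    def __init__(self, *fields: Field) -> None:
        self.fields: Tuple[Field, ...] = fields

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Struct):
            return NotImplemented
        if len(self.fields) != len(other.fields):
            return False
        # Key by field ID so that position in the struct is ignored.
        # Nested structs are compared recursively through Field's
        # generated __eq__, which calls back into Struct.__eq__.
        mine = {f.field_id: f for f in self.fields}
        theirs = {f.field_id: f for f in other.fields}
        return mine == theirs


moved = Struct(Field(4, "data", "string"), Field(3, "count", "long"))
original = Struct(Field(3, "count", "long"), Field(4, "data", "string"))

order_insensitive_equal = moved == original            # compares by field ID
order_sensitive_equal = moved.fields == original.fields  # compares by position
```

With an equality like this, any caller that branches on whether two schemas are equal takes the "equal" path for reordered fields, where a position-based comparison would have taken the "different" path. That is the kind of shift the comment above is questioning.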