feat: Add basic operations for `UpdateSchema` #1172

jonathanc-n · 2025-04-07T02:45:56Z

Which issue does this PR close?

First part of "Add SchemaUpdate logic to Iceberg-Rust" #697.

What changes are included in this PR?

Adds basic functionality to `UpdateSchema`. I wanted to split the work into two parts.

Are these changes tested?


jonathanc-n · 2025-05-07T17:42:55Z

@Fokko @Xuanwo @liurenjie1024 This should be ready for review

CTTY

Hi @jonathanc-n, thanks for the work! I've left some comments. Also, some implementations are still missing:

- functions like `move` and `union_schema`
- `apply` logic to commit changes to the schema

Do you plan to address them in this PR?
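To make the missing `apply` step more concrete, here is a minimal sketch of what committing pending changes could look like. It uses a deliberately flat schema (a `Vec<Field>` with no nesting) and the hypothetical types `Field` and `PendingChanges`. This is not the iceberg-rust API. A real implementation would also have to handle nested structs, updates, and moves.

```rust
use std::collections::HashSet;

/// Hypothetical flattened field; not iceberg-rust's `NestedField`.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: String,
}

/// Hypothetical record of changes accumulated by an update builder.
#[derive(Debug, Default)]
struct PendingChanges {
    deletes: HashSet<i32>,
    adds: Vec<Field>,
}

/// Build the new field list: drop deleted ids (keeping the original order),
/// then append the added fields.
fn apply(current: &[Field], changes: &PendingChanges) -> Vec<Field> {
    let mut out: Vec<Field> = current
        .iter()
        .filter(|f| !changes.deletes.contains(&f.id))
        .cloned()
        .collect();
    out.extend(changes.adds.iter().cloned());
    out
}

fn main() {
    let current = vec![
        Field { id: 1, name: "id".into() },
        Field { id: 2, name: "legacy".into() },
    ];
    let mut changes = PendingChanges::default();
    changes.deletes.insert(2);
    changes.adds.push(Field { id: 3, name: "ts".into() });

    let updated = apply(&current, &changes);
    let names: Vec<&str> = updated.iter().map(|f| f.name.as_str()).collect();
    println!("{:?}", names); // ["id", "ts"]
}
```

The builder only records intent, and `apply` produces a fresh schema. That separation lets a failed validation leave the original schema untouched.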

CTTY · 2025-05-15T17:09:26Z

crates/iceberg/src/spec/schema/mod.rs

@@ -21,6 +21,7 @@ use std::collections::{HashMap, HashSet};
 use std::fmt::{Display, Formatter};
 use std::sync::Arc;

+mod update;


We should probably name it update_schema.rs to avoid confusion.

I think it is best practice to infer this from the folder name. For example, in codebases such as DataFusion or iceberg-rust, metadata modules are simply called metadata.rs under different folders (e.g. manifest, puffin), unless the name would be ambiguous with other files in the same folder.

I'm very new to Rust naming conventions, thanks for the context! In that case, would schema.rs be a better name?

I'm adding a new file, update_statistics.rs, under the same folder in #1359. I'll probably rename it to statistics.rs, wdyt?

No, the folder gives context: schema/update.rs reads as "updating the schema". With transaction/update_statistics.rs, calling it statistics.rs would give no context about what the file does, so update_statistics.rs is fine.

Understood, thanks for the explanation!
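The "folder gives context" argument is easiest to see from the module paths a caller writes. In this sketch, inline modules stand in for the files schema/update.rs and transaction/update_statistics.rs. The struct names are hypothetical.

```rust
// Inline modules mirror the file layout: schema/update.rs and
// transaction/update_statistics.rs. Struct names are illustrative only.
mod schema {
    pub mod update {
        #[derive(Debug, Default)]
        pub struct UpdateSchema;
    }
}

mod transaction {
    pub mod update_statistics {
        #[derive(Debug, Default)]
        pub struct UpdateStatistics;
    }
}

// `schema::update` already says "updating the schema"; a bare
// `transaction::statistics` would not say what the module does.
use schema::update::UpdateSchema;
use transaction::update_statistics::UpdateStatistics;

fn main() {
    let _u = UpdateSchema::default();
    let _s = UpdateStatistics::default();
    println!("{}", std::any::type_name::<UpdateSchema>());
}
```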

CTTY · 2025-05-15T17:24:13Z

crates/iceberg/src/spec/schema/update.rs

+    /// This method returns a reference to `Self` to allow for method chaining.
+    fn add_column(
+        &mut self,
+        column_name: Vec<String>,


Are there any considerations against taking the column name as a string and then finding the parent via the schema, as iceberg-java does?

I noticed that iceberg-python follows this pattern and would love to understand the context.

I believe it is to avoid problems with dots in column names; I'm not sure if there is another reason, though.
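The dot ambiguity that a `Vec<String>` path avoids can be shown with a toy lookup. This sketch uses a hypothetical minimal field tree, not iceberg-rust's `NestedField`/`StructType`. The schema contains a top-level column literally named `a.b` and also a struct `a` with a child `b`. Splitting the dotted string cannot tell the two apart, but explicit path segments can.

```rust
/// Hypothetical minimal field tree used only for this illustration.
#[derive(Debug)]
struct Node {
    id: i32,
    name: String,
    children: Vec<Node>,
}

/// Resolve a column by explicit path segments, one nesting level per segment.
fn find_by_path<'a>(roots: &'a [Node], path: &[&str]) -> Option<&'a Node> {
    let (first, rest) = path.split_first()?;
    let node = roots.iter().find(|n| n.name == *first)?;
    if rest.is_empty() {
        Some(node)
    } else {
        find_by_path(&node.children, rest)
    }
}

/// Resolve a dotted name by splitting on '.', which loses the distinction
/// between "a name that contains a dot" and "a nested path".
fn find_by_dotted<'a>(roots: &'a [Node], dotted: &str) -> Option<&'a Node> {
    let segments: Vec<&str> = dotted.split('.').collect();
    find_by_path(roots, &segments)
}

fn sample() -> Vec<Node> {
    vec![
        Node { id: 1, name: "a.b".into(), children: vec![] },
        Node {
            id: 2,
            name: "a".into(),
            children: vec![Node { id: 3, name: "b".into(), children: vec![] }],
        },
    ]
}

fn main() {
    let roots = sample();
    // Explicit segments reach both columns unambiguously.
    println!("{:?}", find_by_path(&roots, &["a.b"]).map(|n| n.id)); // Some(1)
    println!("{:?}", find_by_path(&roots, &["a", "b"]).map(|n| n.id)); // Some(3)
    // Splitting "a.b" on '.' can only ever reach the nested field.
    println!("{:?}", find_by_dotted(&roots, "a.b").map(|n| n.id)); // Some(3)
}
```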


CTTY · 2025-05-15T17:56:00Z

crates/iceberg/src/spec/schema/update.rs

+        column_name: Vec<String>,
+        field_type: Type,
+        doc: Option<String>,
+        required: bool,


There is a recent PR adding support for default values in UpdateSchema; it would be good to port that to iceberg-rs as well: apache/iceberg@602c35a

I think this would be better as a follow-up pull request. I created an issue for that here.

CTTY · 2025-05-15T18:11:12Z

crates/iceberg/src/spec/schema/update.rs

+    /// # Returns
+    ///
+    /// An empty Ok(()) on success.
+    pub fn set_column_requirement(


I think it would be better to make this private and add APIs like make_column_optional that call it.

Would just changing the function name be fine? What do you think?

I'm thinking of something like:

pub fn require_column(col_name) { set_column_requirement(col_name, true) }
pub fn make_column_optional(col_name) { set_column_requirement(col_name, false) }
fn set_column_requirement {
    // this function
}
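Here is a compilable version of the suggested wrapper pattern. The simplified `UpdateSchema`, which only records requirement changes in a `HashMap`, is hypothetical. This sketch returns `&mut Self` so calls can be chained, and it skips validation. The PR's method returns a `Result`, and a real implementation would keep that return type so it can reject invalid changes.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the builder under review.
#[derive(Debug, Default)]
struct UpdateSchema {
    // Pending requirement changes keyed by column name (simplified).
    requirement_updates: HashMap<String, bool>,
}

impl UpdateSchema {
    /// Public, intention-revealing API: mark a column as required.
    pub fn require_column(&mut self, name: &str) -> &mut Self {
        self.set_column_requirement(name, true)
    }

    /// Public, intention-revealing API: mark a column as optional.
    pub fn make_column_optional(&mut self, name: &str) -> &mut Self {
        self.set_column_requirement(name, false)
    }

    /// Private helper shared by both public methods.
    fn set_column_requirement(&mut self, name: &str, required: bool) -> &mut Self {
        self.requirement_updates.insert(name.to_string(), required);
        self
    }
}

fn main() {
    let mut update = UpdateSchema::default();
    update.require_column("id").make_column_optional("comment");
    println!("{:?}", update.requirement_updates.get("comment")); // Some(false)
}
```

Callers never pass a bare `true`/`false`, so call sites read as intent. The single private helper keeps the logic in one place.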

CTTY · 2025-05-15T18:15:00Z

crates/iceberg/src/spec/schema/update.rs

+
+#[allow(dead_code)]
+#[derive(Debug)]
+pub struct Move {


It seems like move-related functions are not implemented yet?

Yes, this will be implemented in a follow-up pull request.
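For the follow-up, here is one possible shape for move operations. It is modeled loosely on the first/before/after moves that iceberg-java's UpdateSchema exposes. The `MoveOp` enum and `apply_moves` function are hypothetical, not the `Move` struct in this PR, and they operate on the field ids of a single struct.

```rust
/// Hypothetical move operation within one struct.
#[derive(Debug, Clone, Copy)]
enum MoveOp {
    First { field_id: i32 },
    Before { field_id: i32, reference_id: i32 },
    After { field_id: i32, reference_id: i32 },
}

/// Apply moves, in order, to the field ids of a single struct.
/// Returns None if the moved field or its reference is not found. Because the
/// moved field is removed before the reference is looked up, moving a field
/// relative to itself also yields None.
fn apply_moves(mut order: Vec<i32>, moves: &[MoveOp]) -> Option<Vec<i32>> {
    for m in moves {
        let field_id = match *m {
            MoveOp::First { field_id }
            | MoveOp::Before { field_id, .. }
            | MoveOp::After { field_id, .. } => field_id,
        };
        // Take the moving field out first so reference positions are stable.
        let from = order.iter().position(|&id| id == field_id)?;
        order.remove(from);
        let to = match *m {
            MoveOp::First { .. } => 0,
            MoveOp::Before { reference_id, .. } => {
                order.iter().position(|&id| id == reference_id)?
            }
            MoveOp::After { reference_id, .. } => {
                order.iter().position(|&id| id == reference_id)? + 1
            }
        };
        order.insert(to, field_id);
    }
    Some(order)
}

fn main() {
    let moves = [
        MoveOp::First { field_id: 3 },
        MoveOp::After { field_id: 1, reference_id: 4 },
    ];
    println!("{:?}", apply_moves(vec![1, 2, 3, 4], &moves)); // Some([3, 2, 4, 1])
}
```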


hsingh574 · 2025-05-28T00:36:34Z

Are unit tests being tracked separately?

hsingh574 · 2025-05-28T00:54:46Z

crates/iceberg/src/spec/schema/update.rs

+            if !self.deletes.contains(&existing_field.id) {
+                return Err(Error::new(
+                    crate::ErrorKind::DataInvalid,
+                    format!("Cannot add column {}, to non-struct type.", name),


[nit] wrong error message
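In the hunk, the error is returned when a field with that name already exists and is *not* pending deletion. That reads as a name-collision check, not a non-struct-parent check. Here is a minimal sketch of that check with a matching message. `DataInvalid` is a hypothetical stand-in for iceberg's `Error` with `ErrorKind::DataInvalid`, and the exact wording is illustrative.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical error type standing in for iceberg's `Error`/`ErrorKind::DataInvalid`.
#[derive(Debug, PartialEq)]
struct DataInvalid(String);

/// Reject adding `name` if a field with that full name exists and is not
/// already scheduled for deletion.
fn check_name_available(
    existing: &HashMap<String, i32>, // full column name -> field id
    deletes: &HashSet<i32>,
    name: &str,
) -> Result<(), DataInvalid> {
    match existing.get(name) {
        Some(id) if !deletes.contains(id) => Err(DataInvalid(format!(
            "Cannot add column {name}: a column with that name already exists"
        ))),
        _ => Ok(()),
    }
}

fn main() {
    let existing = HashMap::from([("id".to_string(), 1)]);
    let mut deletes = HashSet::new();
    // "id" exists and is not being deleted: rejected.
    println!("{:?}", check_name_available(&existing, &deletes, "id"));
    // Once "id" is scheduled for deletion, the name can be reused.
    deletes.insert(1);
    println!("{:?}", check_name_available(&existing, &deletes, "id")); // Ok(())
}
```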

hsingh574 · 2025-05-28T00:59:21Z

crates/iceberg/src/spec/schema/update.rs

+                _ => parent_field.clone(),
+            };
+
+            if !parent_type.is_struct() {


Are there any other invariants to check for the parent? For example, it shouldn't be part of self.deletes.
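Here is a sketch of validating both invariants on a resolved parent. It assumes any list-element or map-value unwrapping, which the `_ => parent_field.clone()` arm suggests, has already happened. `FieldType` and `ParentField` are hypothetical simplifications, not iceberg-rust types.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified field types for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Primitive,
    Struct,
}

#[derive(Debug)]
struct ParentField {
    id: i32,
    name: String,
    field_type: FieldType,
}

/// Validate a resolved parent before adding a child column to it.
fn validate_parent(parent: &ParentField, deletes: &HashSet<i32>) -> Result<(), String> {
    // Only structs can hold new named children.
    if parent.field_type != FieldType::Struct {
        return Err(format!("Cannot add column to non-struct parent: {}", parent.name));
    }
    // Adding under a parent that is being deleted would silently lose the column.
    if deletes.contains(&parent.id) {
        return Err(format!(
            "Cannot add column to a parent that will be deleted: {}",
            parent.name
        ));
    }
    Ok(())
}

fn main() {
    let deletes = HashSet::from([7]);
    let point = ParentField { id: 1, name: "point".into(), field_type: FieldType::Struct };
    let old = ParentField { id: 7, name: "old_struct".into(), field_type: FieldType::Struct };
    let leaf = ParentField { id: 2, name: "x".into(), field_type: FieldType::Primitive };
    println!("{:?}", validate_parent(&point, &deletes)); // Ok(())
    println!("{:?}", validate_parent(&old, &deletes));
    println!("{:?}", validate_parent(&leaf, &deletes));
}
```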

hsingh574 · 2025-05-28T01:01:49Z

crates/iceberg/src/spec/schema/update.rs

+            }
+        }
+
+        if parent.is_empty() {


I'm probably misunderstanding something, but don't we want to search for the parent if it's not empty?
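One possible reading of the intended control flow: an empty parent path means a top-level column, and only a non-empty path needs to be resolved. In this sketch, `resolve_parent_id` and the lookup map keyed by path segments are hypothetical. The `ROOT_ID` sentinel of -1 is an assumption here, mirroring the table-root id convention in iceberg-java.

```rust
use std::collections::HashMap;

/// Hypothetical sentinel meaning "add at the top level of the schema".
const ROOT_ID: i32 = -1;

/// Decide where a new column goes. `column_path` is the full path:
/// parent segments followed by the new column's own name.
fn resolve_parent_id(
    column_path: &[String],
    struct_ids_by_path: &HashMap<Vec<String>, i32>, // parent path segments -> struct field id
) -> Result<i32, String> {
    let (_name, parent) = column_path
        .split_last()
        .ok_or_else(|| "Column path must not be empty".to_string())?;
    if parent.is_empty() {
        // No parent segments: the column is added to the root struct.
        return Ok(ROOT_ID);
    }
    // Non-empty parent: it must be looked up and must exist.
    struct_ids_by_path
        .get(parent)
        .copied()
        .ok_or_else(|| format!("Cannot find parent struct: {:?}", parent))
}

fn main() {
    let mut ids = HashMap::new();
    ids.insert(vec!["point".to_string()], 5);
    let top = vec!["x".to_string()];
    let nested = vec!["point".to_string(), "z".to_string()];
    println!("{:?}", resolve_parent_id(&top, &ids)); // Ok(-1)
    println!("{:?}", resolve_parent_id(&nested, &ids)); // Ok(5)
}
```

Keying the lookup by path segments, not a joined dotted string, preserves the dot-safety that motivated the `Vec<String>` parameter.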

cmcarthur · 2025-06-24T11:42:07Z

Hey @jonathanc-n, I'm very interested in seeing this accepted and merged. Is there anything I can do to help get it over the line?

jonathanc-n added 7 commits April 6, 2025 22:44

5226602 feat: Add basic APIs for UpdateSchema
56c6656 Merge branch 'main' into schema-update
87cc5a7 clippy
68e5f44 Merge branch 'schema-update' of https://github.com/jonathanc-n/iceberg-rust into schema-update
4cb500f fmt
0e4bdf0 Merge branch 'main' into schema-update
c517a66 fix
fc0091b Merge branch 'main' into schema-update

jonathanc-n mentioned this pull request May 15, 2025: feat: Add schema update support for Transaction API #1333 (Closed)

CTTY reviewed May 15, 2025

jonathanc-n mentioned this pull request May 16, 2025: [EPIC] Transaction Support Issues and Pull Requests #1339 (Open, 17 tasks)

141b071 Merge branch 'main' into schema-update

jonathanc-n mentioned this pull request May 20, 2025: feat: Support default values in UpdateSchema #1357 (Open)

jonathanc-n added 5 commits May 20, 2025 17:27

85e2ce7 Merge branch 'main' into schema-update
ca7a13e minor fixes
2fdcfbe Merge branch 'schema-update' of https://github.com/jonathanc-n/iceberg-rust into schema-update
78455aa Merge branch 'main' into schema-update
8f94090 Merge branch 'main' into schema-update

hsingh574 reviewed May 28, 2025
