Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly use xsi:nil to deserialize null values via serde #842

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

weiznich
Copy link

@weiznich weiznich commented Jan 27, 2025

This commit fixes an issue that causes quick-xml trying to deserialize empty tags via the serde interface even if these tags were explicitly marked as xsi:nil="true"

For example the following XML failed to deserialize before this commit:

<bar>
    <foo xsi:nil="true"/>
</bar>

into the following rust type:

#[derive(Deserialize)]
struct Bar {
    foo: Option<Inner>,
}

#[derive(Deserialize)]
struct Foo {
    baz: String,
}

Before this commit this failed to deserialize with an error message that complained that the baz field was missing. After this commit this uses the xsi:nil attribute to deserialize this into foo: None instead. The standard (https://www.w3.org/TR/xmlschema-1/#xsi_nil) seems to support this behaviour.

Fix #497

@Mingun Mingun added the serde Issues related to mapping from Rust types to XML label Jan 27, 2025
@codecov-commenter
Copy link

codecov-commenter commented Jan 27, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 77.63158% with 17 lines in your changes missing coverage. Please review.

Project coverage is 60.73%. Comparing base (a9391f3) to head (ad89a00).
Report is 26 commits behind head on master.

Files with missing lines Patch % Lines
src/de/mod.rs 72.54% 14 Missing ⚠️
src/events/mod.rs 82.35% 3 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #842      +/-   ##
==========================================
+ Coverage   60.21%   60.73%   +0.51%     
==========================================
  Files          41       41              
  Lines       16021    16013       -8     
==========================================
+ Hits         9647     9725      +78     
+ Misses       6374     6288      -86     
Flag Coverage Δ
unittests 60.73% <77.63%> (+0.51%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, and it's nice to see such a person in committers. I think, that we should try to improve it to cover all possible scenarios, but if that would be too many work (because maybe we need to switch to namespace-aware parser), the current implementation could be merged as the first step.

In any case, we need a changelog entry, and in case of using only xsi prefix this should be explicitly noted in the documentation somewhere here.

If you're not prepared to dive so deep to explore the things required, I can finish this PR myself, but that can take some time.

Comment on lines 237 to 244

pub(crate) fn is_xsi_nil(&self) -> bool {
self.clone().any(|attr| {
matches!(attr, Ok(crate::events::attributes::Attribute{
key: QName(b"xsi:nil"), value
}) if &*value == b"true")
})
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the xsi prefix usually bound to http://www.w3.org/2001/XMLSchema-instance namespace, this is not necessary true. I would like to resolve actual namespace and check against it.

Also, need to check is the true the only thing that represents truth value.

I think, it would be better to place such method to the BytesStart, because then we can avoid cloning and the name would better fit its purpose (currently it actually checks has_xsi_nil).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the xsi prefix usually bound to http://www.w3.org/2001/XMLSchema-instance namespace, this is not necessary true. I would like to resolve actual namespace and check against it.

I will try to have a look at that, if the idea I have works it shouldn't be too hard. Just to verify that I have the right understanding how this namespace thing works:

  • I need to collect all xmlns:xyz attributes from any parent tag
  • I then need to check if one or more of these attributes is set to http://www.w3.org/2001/XMLSchema-instance
  • That then will me give the namespace name, so xyz in the above examle
  • Finally I can use that name to check if an nil attribute exists and then use that in the check

Also, need to check is the true the only thing that represents truth value.

Unless I miss something this already checks for the true value via a guard expression (line 242)

I think, it would be better to place such method to the BytesStart, because then we can avoid cloning and the name would better fit its purpose (currently it actually checks has_xsi_nil).

I'm not sure I can follow here. Attributes just borrows the buffer from BytesStart so no cloning should be happening here.

Copy link
Collaborator

@Mingun Mingun Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I need to collect all xmlns:xyz attributes from any parent tag

Basically we need to use NsReader instead of a Reader in Deserializer, so here:

quick-xml/src/de/mod.rs

Lines 3080 to 3084 in 8f91a9c

pub struct IoReader<R: BufRead> {
reader: Reader<R>,
start_trimmer: StartTrimmer,
buf: Vec<u8>,
}

...and here:

quick-xml/src/de/mod.rs

Lines 3149 to 3152 in 8f91a9c

pub struct SliceReader<'de> {
reader: Reader<&'de [u8]>,
start_trimmer: StartTrimmer,
}

Then make any necessary changes to make code compilable and then you may use resolve methods to resolve prefix. And it seems to me that that refactoring may be not so trivial.

Unless I miss something this already checks for the true value via a guard expression (line 242)

I mean, maybe not only the string "true" represents the true value. For example, XML Schema defines "true" and "1" as true and "false" and "0" as false. It seems that only "true" should be used, as stated here, but I did not check all occurrences of the nil word in the specification. The XML spec is a hard thing to read, so we must be carefull. I would like also have link to the specification (maybe the mentioned link is the right link) near to the check so everyone could verify that check is correct without much efforts.

I'm not sure I can follow here. Attributes just borrows the buffer from BytesStart so no cloning should be happening here.

Yes, I think, actually no cloning is made here but that not obvious from the code. You need to look at the Attributes definition to be sure. As this method used only in start.attributes().is_xsi_nil() chain, it makes sence to move it to the BytesStart for clarity and call it as start.is_xsi_nil().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then make any necessary changes to make code compilable and then you may use resolve methods to resolve prefix. And it seems to me that that refactoring may be not so trivial.

Maybe I miss something obvious but that wasn't complicated. In fact it literally required changing these two types + adjusting my new added code to get access to the resolve methods.

I mean, maybe not only the string "true" represents the true value. For example, XML Schema defines "true" and "1" as true and "false" and "0" as false. It seems that only "true" should be used, as stated here, but I did not check all occurrences of the nil word in the specification. The XML spec is a hard thing to read, so we must be carefull. I would like also have link to the specification (maybe the mentioned link is the right link) near to the check so everyone could verify that check is correct without much efforts.

Done

Yes, I think, actually no cloning is made here but that not obvious from the code. You need to look at the Attributes definition to be sure. As this method used only in start.attributes().is_xsi_nil() chain, it makes sence to move it to the BytesStart for clarity and call it as start.is_xsi_nil().

Done

src/de/mod.rs Outdated
@@ -4680,4 +4714,43 @@ mod tests {
}
}
}

#[test]
fn deserialize_nil() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to group tests using modules. This have some advantages, for example, code folding in the editor. In that case it seems reasonable to split this test to several in the nil module:

mod nil {
  // <foo xsi:nil="true"/>
  // <foo xsi:nil="true" attr="value"/>
  // <foo xsi:nil="true"><tag/></foo>
  fn true_() {}

  // <foo xsi:nil="false"/>
  // <foo xsi:nil="false" attr="value"/>
  // <foo xsi:nil="false"><tag/></foo>
  fn false_() {}
}
// and the same for the case of nested element with nil

Those tests are better to place to the serde-de tests to reduce overall time of compiling library (which already not so good), because they do not use internal details.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the tests + added some more variants. What's still not tested is using a different namespace prefix, but I can also add tests for that if you want.

src/de/mod.rs Outdated
let data = r#"<foo xsi:nil="true"/>"#;

let res = super::from_str::<Option<Foo>>(data).unwrap();
assert!(res.is_none());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use assert_eq checks where possible because it will output diff in case of failed check (project uses pretty_assertions crate for diffs).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

This commit fixes an issue that causes `quick-xml` trying to deserialize
empty tags via the serde interface even if these tags were explicitly
marked as `xsi:nil="true"`

For example the following XML failed to deserialize before this commit:

```xml
<bar>
	<foo xsi:nil="true"/>
</bar>
```

into the following rust type:

```rust
#[derive(Deserialize)]
struct Bar {
    foo: Option<Inner>,
}

#[derive(Deserialize)]
struct Foo {
    baz: String,
}
```

Before this commit this failed to deserialize with an error message that
complained that the `baz` field was missing. After this commit this uses
the `xsi:nil` attribute to deserialize this into `foo: None` instead.
The standard (https://www.w3.org/TR/xmlschema-1/#xsi_nil) seems to
support this behaviour. 

Fix tafia#497
@Mingun
Copy link
Collaborator

Mingun commented Feb 2, 2025

Just want to inform you, that this PR is not abandoned, I working on improving testcases to ensure that everything is consistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
serde Issues related to mapping from Rust types to XML
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Option<bool> doesn't (de)serialize properly with serde
3 participants