-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Properly use xsi:nil
to deserialize null values via serde
#842
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #842 +/- ##
==========================================
+ Coverage 60.21% 60.73% +0.51%
==========================================
Files 41 41
Lines 16021 16013 -8
==========================================
+ Hits 9647 9725 +78
+ Misses 6374 6288 -86
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work, and it's nice to see such a person in committers. I think, that we should try to improve it to cover all possible scenarios, but if that would be too many work (because maybe we need to switch to namespace-aware parser), the current implementation could be merged as the first step.
In any case, we need a changelog entry, and in case of using only xsi
prefix this should be explicitly noted in the documentation somewhere here.
If you're not prepared to dive so deep to explore the things required, I can finish this PR myself, but that can take some time.
src/events/attributes.rs
Outdated
|
||
pub(crate) fn is_xsi_nil(&self) -> bool { | ||
self.clone().any(|attr| { | ||
matches!(attr, Ok(crate::events::attributes::Attribute{ | ||
key: QName(b"xsi:nil"), value | ||
}) if &*value == b"true") | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the xsi
prefix usually bound to http://www.w3.org/2001/XMLSchema-instance
namespace, this is not necessary true. I would like to resolve actual namespace and check against it.
Also, need to check is the true
the only thing that represents truth value.
I think, it would be better to place such method to the BytesStart
, because then we can avoid cloning and the name would better fit its purpose (currently it actually checks has_xsi_nil
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the xsi prefix usually bound to http://www.w3.org/2001/XMLSchema-instance namespace, this is not necessary true. I would like to resolve actual namespace and check against it.
I will try to have a look at that, if the idea I have works it shouldn't be too hard. Just to verify that I have the right understanding how this namespace thing works:
- I need to collect all
xmlns:xyz
attributes from any parent tag - I then need to check if one or more of these attributes is set to
http://www.w3.org/2001/XMLSchema-instance
- That then will me give the namespace name, so
xyz
in the above examle - Finally I can use that name to check if an
nil
attribute exists and then use that in the check
Also, need to check is the true the only thing that represents truth value.
Unless I miss something this already checks for the true
value via a guard expression (line 242)
I think, it would be better to place such method to the BytesStart, because then we can avoid cloning and the name would better fit its purpose (currently it actually checks has_xsi_nil).
I'm not sure I can follow here. Attributes
just borrows the buffer from BytesStart
so no cloning should be happening here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I need to collect all
xmlns:xyz
attributes from any parent tag
Basically we need to use NsReader
instead of a Reader
in Deserializer
, so here:
Lines 3080 to 3084 in 8f91a9c
pub struct IoReader<R: BufRead> { | |
reader: Reader<R>, | |
start_trimmer: StartTrimmer, | |
buf: Vec<u8>, | |
} |
...and here:
Lines 3149 to 3152 in 8f91a9c
pub struct SliceReader<'de> { | |
reader: Reader<&'de [u8]>, | |
start_trimmer: StartTrimmer, | |
} |
Then make any necessary changes to make code compilable and then you may use resolve
methods to resolve prefix. And it seems to me that that refactoring may be not so trivial.
Unless I miss something this already checks for the true value via a guard expression (line 242)
I mean, maybe not only the string "true"
represents the true
value. For example, XML Schema defines "true"
and "1"
as true
and "false"
and "0"
as false
. It seems that only "true"
should be used, as stated here, but I did not check all occurrences of the nil
word in the specification. The XML spec is a hard thing to read, so we must be carefull. I would like also have link to the specification (maybe the mentioned link is the right link) near to the check so everyone could verify that check is correct without much efforts.
I'm not sure I can follow here.
Attributes
just borrows the buffer fromBytesStart
so no cloning should be happening here.
Yes, I think, actually no cloning is made here but that not obvious from the code. You need to look at the Attributes
definition to be sure. As this method used only in start.attributes().is_xsi_nil()
chain, it makes sence to move it to the BytesStart
for clarity and call it as start.is_xsi_nil()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then make any necessary changes to make code compilable and then you may use resolve methods to resolve prefix. And it seems to me that that refactoring may be not so trivial.
Maybe I miss something obvious but that wasn't complicated. In fact it literally required changing these two types + adjusting my new added code to get access to the resolve methods.
I mean, maybe not only the string "true" represents the true value. For example, XML Schema defines "true" and "1" as true and "false" and "0" as false. It seems that only "true" should be used, as stated here, but I did not check all occurrences of the nil word in the specification. The XML spec is a hard thing to read, so we must be carefull. I would like also have link to the specification (maybe the mentioned link is the right link) near to the check so everyone could verify that check is correct without much efforts.
Done
Yes, I think, actually no cloning is made here but that not obvious from the code. You need to look at the Attributes definition to be sure. As this method used only in start.attributes().is_xsi_nil() chain, it makes sence to move it to the BytesStart for clarity and call it as start.is_xsi_nil().
Done
src/de/mod.rs
Outdated
@@ -4680,4 +4714,43 @@ mod tests { | |||
} | |||
} | |||
} | |||
|
|||
#[test] | |||
fn deserialize_nil() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to group tests using modules. This have some advantages, for example, code folding in the editor. In that case it seems reasonable to split this test to several in the nil
module:
mod nil {
// <foo xsi:nil="true"/>
// <foo xsi:nil="true" attr="value"/>
// <foo xsi:nil="true"><tag/></foo>
fn true_() {}
// <foo xsi:nil="false"/>
// <foo xsi:nil="false" attr="value"/>
// <foo xsi:nil="false"><tag/></foo>
fn false_() {}
}
// and the same for the case of nested element with nil
Those tests are better to place to the serde-de tests to reduce overall time of compiling library (which already not so good), because they do not use internal details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the tests + added some more variants. What's still not tested is using a different namespace prefix, but I can also add tests for that if you want.
src/de/mod.rs
Outdated
let data = r#"<foo xsi:nil="true"/>"#; | ||
|
||
let res = super::from_str::<Option<Foo>>(data).unwrap(); | ||
assert!(res.is_none()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to use assert_eq
checks where possible because it will output diff in case of failed check (project uses pretty_assertions
crate for diffs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
8df645b
to
fa3df2a
Compare
This commit fixes an issue that causes `quick-xml` trying to deserialize empty tags via the serde interface even if these tags were explicitly marked as `xsi:nil="true"` For example the following XML failed to deserialize before this commit: ```xml <bar> <foo xsi:nil="true"/> </bar> ``` into the following rust type: ```rust #[derive(Deserialize)] struct Bar { foo: Option<Inner>, } #[derive(Deserialize)] struct Foo { baz: String, } ``` Before this commit this failed to deserialize with an error message that complained that the `baz` field was missing. After this commit this uses the `xsi:nil` attribute to deserialize this into `foo: None` instead. The standard (https://www.w3.org/TR/xmlschema-1/#xsi_nil) seems to support this behaviour. Fix tafia#497
fa3df2a
to
ad89a00
Compare
Just want to inform you, that this PR is not abandoned, I working on improving testcases to ensure that everything is consistent. |
This commit fixes an issue that causes
quick-xml
trying to deserialize empty tags via the serde interface even if these tags were explicitly marked asxsi:nil="true"
For example the following XML failed to deserialize before this commit:
into the following rust type:
Before this commit this failed to deserialize with an error message that complained that the
baz
field was missing. After this commit this uses thexsi:nil
attribute to deserialize this intofoo: None
instead. The standard (https://www.w3.org/TR/xmlschema-1/#xsi_nil) seems to support this behaviour.Fix #497