
XStream fails to correctly parse an ampersand that follows a character from the Unicode Supplementary Private Use Area-B #368

Open
aliaksei-burlakou opened this issue Jan 13, 2025 · 2 comments

aliaksei-burlakou commented Jan 13, 2025

Expected Behavior

An XML document containing character references to code points in the Unicode Supplementary Private Use Area-B (such as 􏰍, U+10FC0D) should be deserialized by XStream without corrupting these characters or any other valid characters, regardless of where they appear in the document.

Actual Behavior

During deserialization, XStream erroneously appends a replacement character (�, U+FFFD) immediately after an ampersand if the XML document contains a character from the Unicode Supplementary Private Use Area-B somewhere before that ampersand.

Steps to reproduce

  • Java 17 (Amazon Corretto JDK, build 17.0.11+9-LTS), XStream v1.4.21
  • The character reference &#1113101; (􏰍, U+10FC0D, UTF-8 bytes: F4 8F B0 8D) must be present somewhere in the XML document before an encoded ampersand (&amp;).
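The numbers in the second bullet can be cross-checked with a few lines of plain JDK code. The sketch below (class and method names are illustrative, not part of XStream) decodes the decimal reference &#1113101; and prints its code point, its UTF-16 surrogate pair, and its UTF-8 bytes. The surrogate pair it reports is the same DBFF DC0D pair that the test below expects in its assertEquals call.

```java
import java.nio.charset.StandardCharsets;

// Decodes the numeric character reference from the test input (&#1113101;)
// into the code point, UTF-16 code units, and UTF-8 bytes quoted in the issue.
class CodePointCheck {

    // UTF-16 code units of a code point as space-separated 4-digit hex values.
    static String utf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    // UTF-8 bytes of a code point as space-separated 2-digit hex values.
    static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int codePoint = 1113101; // decimal value inside &#1113101;
        System.out.println(String.format("U+%X", codePoint));              // U+10FC0D
        System.out.println(Character.isSupplementaryCodePoint(codePoint)); // true
        System.out.println(utf16Hex(codePoint));                           // DBFF DC0D
        System.out.println(utf8Hex(codePoint));                            // F4 8F B0 8D
    }
}
```

Because the code point is above U+FFFF, a Java String holds it as two chars. Any parser that produces this character therefore has to emit a surrogate pair rather than a single char.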

Simple code example:

  • RootTag class:
import com.thoughtworks.xstream.annotations.XStreamAlias;

@XStreamAlias("rootTag")
public class RootTag {
    @XStreamAlias("text")
    private TextTag text;

    public TextTag getText() {
        return text;
    }
}
  • TextTag class:
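import com.thoughtworks.xstream.annotations.XStreamAlias;
import com.thoughtworks.xstream.annotations.XStreamConverter;
import com.thoughtworks.xstream.converters.extended.ToAttributedValueConverter;

// ToAttributedValueConverter maps the "value" field to the element's text content
@XStreamConverter(value = ToAttributedValueConverter.class, strings = {"value"})
@XStreamAlias("textTag")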
public class TextTag {
    private String value;

    public String getValue() {
        return value;
    }
}

  • Test class with the simple XML input:
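import static org.junit.jupiter.api.Assertions.assertEquals;

import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.security.ExplicitTypePermission;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.junit.jupiter.api.Test;

class XStreamTest {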

    @Test
    void testXStreamFailsToParseAmpersandAfterSupplementaryCharacter() throws Exception {
        String input = """
                <?xml version="1.0" encoding="UTF-8"?>
                <rootTag>
                    <text>Test: &amp; ampersand before, supplementary character &#1113101;, ampersand &amp; after</text>
                </rootTag>""";

        XStream xStream = new XStream();
        xStream.processAnnotations(RootTag.class);
        xStream.addPermission(new ExplicitTypePermission(new Class[]{RootTag.class}));

        try (InputStream is = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8))) {
            RootTag rootTag = (RootTag) xStream.fromXML(is);
            // &amp; decodes to '&' and &#1113101; to the surrogate pair U+DBFF U+DC0D
            assertEquals("Test: & ampersand before, supplementary character \uDBFF\uDC0D, ampersand & after",
                    rootTag.getText().getValue());
        }
    }
}
  • Output:
Expected :Test: & ampersand before, supplementary character 􏰍, ampersand & after
Actual   :Test: & ampersand before, supplementary character 􏰍, ampersand &� after
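To see exactly where the two lines diverge, the Expected and Actual strings can be rebuilt in Java and compared one UTF-16 code unit at a time. The sketch below uses a hypothetical helper, firstMismatch, with the strings copied from the output. It reports that the first difference is the character right after the second ampersand: a space (U+0020) is expected there, but U+FFFD appears instead, while the supplementary character itself is identical in both strings.

```java
// Locates where the "Actual" string from the test output first diverges
// from the "Expected" string, comparing UTF-16 code units.
class OutputDiff {

    // Index of the first differing char, or -1 if the strings are equal.
    static int firstMismatch(String expected, String actual) {
        int n = Math.min(expected.length(), actual.length());
        for (int i = 0; i < n; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                return i;
            }
        }
        return expected.length() == actual.length() ? -1 : n;
    }

    public static void main(String[] args) {
        String supplementary = new String(Character.toChars(0x10FC0D));
        String expected = "Test: & ampersand before, supplementary character "
                + supplementary + ", ampersand & after";
        String actual = "Test: & ampersand before, supplementary character "
                + supplementary + ", ampersand &\uFFFD after";

        int i = firstMismatch(expected, actual);
        // expected U+0020 (space), actual U+FFFD
        System.out.printf("first mismatch: expected U+%04X, actual U+%04X%n",
                (int) expected.charAt(i), (int) actual.charAt(i));
        // The char before the mismatch is the second '&'
        System.out.println("preceding char: " + actual.charAt(i - 1));
    }
}
```

Since the Actual string is one char longer than Expected, the replacement character is inserted rather than substituted for an existing char.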

NOTE:

This issue may be related to #336 (PrettyPrintWriter cannot write emoji in XML 1.1 mode).

joehni self-assigned this Jan 13, 2025
joehni added the invalid label Jan 13, 2025
joehni (Member) commented Jan 13, 2025

XStream does not parse XML itself; it delegates to an XML parser. You can choose the parser yourself by passing the appropriate driver to the XStream constructor. Please open an issue for MXParser, which your example uses as the default parser.
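Following the point that decoding happens in the parser, one way to isolate the fault is to feed the same document to a different parser. The sketch below uses only the JDK's built-in JAXP DOM parser (class and method names are illustrative); if it reports a match and no U+FFFD, the character references decode correctly outside MXParser. XStream also ships drivers built on JDK XML APIs, such as DomDriver and StaxDriver, which can be passed to the XStream constructor. Whether switching drivers avoids this bug has not been verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Parses the document from the test with the JDK's built-in JAXP DOM parser
// (not MXParser) to see how another parser decodes the same references.
class JdkParserCheck {

    // Returns the text content of the first <text> element in the document.
    static String textOf(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            Document doc = builder.parse(is);
            // getTextContent() concatenates the child text, with &amp; and &#1113101; resolved
            return doc.getElementsByTagName("text").item(0).getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        String input = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<rootTag>\n"
                + "    <text>Test: &amp; ampersand before, supplementary character &#1113101;,"
                + " ampersand &amp; after</text>\n"
                + "</rootTag>";
        String expected = "Test: & ampersand before, supplementary character "
                + new String(Character.toChars(0x10FC0D)) + ", ampersand & after";

        String text = textOf(input);
        System.out.println("matches expected: " + text.equals(expected));
        System.out.println("index of U+FFFD: " + text.indexOf('\uFFFD')); // -1 means none
    }
}
```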

aliaksei-burlakou (Author) commented Jan 14, 2025

Thank you @joehni! I opened an issue for MXParser:
x-stream/mxparser#7
