ext/xml case folding broken for non-ASCII characters #12341

@ndossche
Member

Description

See ext/xml/tests/xml007.phpt

The test never runs on 8.2 or later: its SKIPIF section relies on a locale-dependent strtoupper check, and strtoupper is no longer locale-sensitive as of 8.2, so the test is always skipped.
Removing the SKIPIF shows that the test is indeed broken: even on systems where it passed on 8.1, it fails on 8.2.

Beyond that, the case folding never worked properly in the first place, because it depended on the locale.

From https://www.w3.org/TR/WD-xml-970807.xml

case-folding: a process applied to a sequence of characters, in which those identified as non-uppercase (in scripts which have case distinctions) are replaced by their uppercase equivalents, as specified in The Unicode Standard, Version 2.0, section 4.1. Note that Unicode recommends folding to lowercase; for compatibility reasons, XML processors must fold to uppercase. Case-folding, as described here, neither requires nor forbids the normalization of Unicode character sequences into canonical form (e.g. as described in The Unicode Standard, section 5.9).

So it's not supposed to depend on the locale.
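
To make the difference concrete, here is a small userland illustration of ASCII-only folding versus Unicode-aware folding. It is only a sketch: it assumes PHP 8.2 or later, a UTF-8 encoded source file, and that mbstring is loaded. It does not show what ext/xml does internally.

```php
<?php
// Illustration only: contrasts byte-wise ASCII folding with Unicode-aware folding.
// Assumes PHP 8.2+ and the mbstring extension; this file is UTF-8 encoded.
$name = 'café';

// As of PHP 8.2, strtoupper() only maps the ASCII bytes a-z, regardless of locale,
// so the two bytes of the UTF-8 encoded "é" are left untouched.
$ascii = strtoupper($name);

// mb_strtoupper() applies Unicode case mappings for the given encoding.
$unicode = mb_strtoupper($name, 'UTF-8');

var_dump($ascii === 'CAFé');   // bool(true)
var_dump($unicode === 'CAFÉ'); // bool(true)
```

The second result is what the quoted definition of case-folding asks for, and it does not change with the process locale.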

PHP Version

8.1 / 8.2 depending on your point of view

Operating System

No response

Activity

ndossche (Member, Author) commented on Oct 7, 2023

If mbstring is available, this is trivial to fix, because mbstring exposes a public API for case conversion.
But mbstring, although probably present on most systems, isn't even enabled by default...
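
Until the extension itself is fixed, one possible userland workaround is to switch off the built-in folding and fold element names in the handler. This is a minimal sketch, not the fix discussed for ext/xml: it assumes both the xml and mbstring extensions are loaded, and `fold_name()` and `$seen` are hypothetical names made up for the example.

```php
<?php
// Userland workaround sketch; not the internal fix for ext/xml.
// Assumes the xml and mbstring extensions are loaded and this file is UTF-8.
// fold_name() and $seen are hypothetical names introduced for this example.
function fold_name(string $name): string
{
    // Locale-independent, Unicode-aware uppercase conversion.
    return mb_strtoupper($name, 'UTF-8');
}

$seen = [];
$parser = xml_parser_create('UTF-8');

// Turn off the built-in case folding so names reach the handler unchanged.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    // Start-element handler: record the folded element name.
    function ($parser, string $name, array $attrs) use (&$seen): void {
        $seen[] = fold_name($name);
    },
    // End-element handler: nothing to do in this sketch.
    function ($parser, string $name): void {
    }
);

xml_parse($parser, '<café><naïve/></café>', true);
print_r($seen);
```

Attribute names would need the same treatment inside the start handler if the calling code relies on them being folded.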

          ext/xml case folding broken for non-ASCII characters · Issue #12341 · php/php-src