URL.host returns Punycode instead of Unicode for some URLs #3332

@loic-bellinger

Description

The URL.host property does not always decode IDNA hostnames into Unicode, which contradicts the documented behavior. According to the httpx documentation, the host should always be returned as a string, normalized to lowercase, with IDNA hosts decoded into Unicode.

Steps to reproduce

from urllib.parse import urlparse
from httpx import URL

test_url = "https://www.égalité-femmes-hommes.gouv.fr"

# Expected: "www.égalité-femmes-hommes.gouv.fr"
# Actual:   "www.xn--galit-femmes-hommes-9ybf.gouv.fr"
print(URL(test_url).host)

# Returns "www.égalité-femmes-hommes.gouv.fr". idna.decode() also returns this.
print(urlparse(test_url).hostname)

Expected behavior

The URL.host property should return the Unicode version of the host, in this case: www.égalité-femmes-hommes.gouv.fr.

Actual behavior

The URL.host property returns the Punycode-encoded version of the host: www.xn--galit-femmes-hommes-9ybf.gouv.fr.

Potential fix
It seems the issue arises in this part of the httpx code:

@property
def host(self) -> str: 
    host: str = self._uri_reference.host

    if host.startswith("xn--"):
        host = idna.decode(host)

    return host

The startswith("xn--") check only matches when the first label of the host is Punycode-encoded. Here the host begins with "www.", so the check fails and the encoded label "xn--galit-femmes-hommes-9ybf" is returned undecoded.

Would replacing host.startswith("xn--") with something like if "xn--" in host cover these cases?
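
One way to handle a Punycode label at any position is to decode the host label by label. The sketch below is only an illustration, not httpx's implementation: decode_idna_host is a hypothetical helper, and it uses Python's built-in punycode codec instead of idna.decode(), so it skips the validation that the idna package performs.

```python
def decode_idna_host(host: str) -> str:
    """Decode every Punycode ("xn--") label of a hostname into Unicode.

    Hypothetical helper for illustration; it does no IDNA validation.
    """
    decoded_labels = []
    for label in host.split("."):
        if label.lower().startswith("xn--"):
            # Drop the "xn--" ACE prefix and decode the rest with the
            # standard-library punycode codec.
            label = label[4:].encode("ascii").decode("punycode")
        decoded_labels.append(label)
    return ".".join(decoded_labels)


# The encoded label is the second one, so a startswith() check on the
# whole host would miss it.
print(decode_idna_host("www.xn--mnchen-3ya.de"))  # www.münchen.de
```

Hosts without any "xn--" label pass through unchanged, so the helper can be applied unconditionally.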

Environment

httpx version: 0.27.2
Python version: 3.12.x
OS: Linux/Windows
