Description
The URL.host property does not decode IDNA hostnames into Unicode, which contradicts the documented behavior. According to the httpx documentation, the host is always returned as a string, normalized to lowercase, with IDNA hosts decoded into Unicode.
Steps to reproduce
from urllib.parse import urlparse
from httpx import URL
test_url = "https://www.égalité-femmes-hommes.gouv.fr"
print(URL(test_url).host) # Expected: "www.égalité-femmes-hommes.gouv.fr", but returns: "www.xn--galit-femmes-hommes-9ybf.gouv.fr"
print(urlparse(test_url).hostname) # returns: "www.égalité-femmes-hommes.gouv.fr". idna.decode() also returns this.
Expected behavior
The URL.host property should return the Unicode version of the host, in this case: www.égalité-femmes-hommes.gouv.fr.
Actual behavior
The URL.host property returns the Punycode-encoded version of the host: www.xn--galit-femmes-hommes-9ybf.gouv.fr.
Potential fix
It seems the issue arises in this part of the httpx code:
@property
def host(self) -> str:
    host: str = self._uri_reference.host
    if host.startswith("xn--"):
        host = idna.decode(host)
    return host
The startswith("xn--") check only matches hosts whose first label is Punycode-encoded. In www.xn--galit-femmes-hommes-9ybf.gouv.fr the encoded label is the second one, after "www.", so the check fails and the host is returned undecoded. The check should cover an encoded label at any position in the host.
Replacing host.startswith("xn--") with something like "xn--" in host might handle a broader set of cases?
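A slightly stricter variant of that idea is to test each dot-separated label for the prefix rather than searching the whole string. The sketch below is only an illustration, not the httpx implementation: decode_host is a hypothetical helper name, and it uses Python's built-in "idna" codec (IDNA 2003) as a stand-in for the third-party idna package, so results may differ for some hostnames.

```python
ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label


def decode_host(host: str) -> str:
    """Decode an ASCII host to Unicode if any of its labels is Punycode-encoded.

    Hypothetical helper for illustration. Assumes `host` is already ASCII.
    """
    labels = host.split(".")
    # Check every label, not just the start of the whole host string.
    if any(label.lower().startswith(ACE_PREFIX) for label in labels):
        # The stdlib "idna" codec decodes each label of the host in turn.
        return host.encode("ascii").decode("idna")
    return host


if __name__ == "__main__":
    print(decode_host("www.xn--galit-femmes-hommes-9ybf.gouv.fr"))
    print(decode_host("www.example.org"))
```

Checking per label avoids matching "xn--" in the middle of a label, where it is not an encoding prefix.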
Environment
httpx version: 0.27.2
Python version: 3.12.x
OS: Linux/Windows