encoding autodetection failed with cyrillic characters #2446
gremur asked this question in Potential Issue
How to reproduce the case: returns |
Answered by lovelydinosaur on Nov 18, 2022
You've got an incorrect decoding because the webserver is misconfigured...

$ httpx https://www.tscollection.ru
HTTP/1.1 200 OK
Server: openresty
Date: Fri, 18 Nov 2022 10:58:30 GMT
Content-Type: text/html; charset=DEFAULT_CHARSET # <--- Ooops
Transfer-Encoding: chunked
Connection: keep-alive
Last-Modified: Fri, 18 Nov 2022 10:58:30 GMT
Set-Cookie: PHPSESSID=c38j4eb0fj985usa3ec89dugj0; expires=Fri, 18-Nov-2022 11:58:30 GMT; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding

...The server sends charset=DEFAULT_CHARSET, which is not a real character set name. Our documentation explains how to handle cases like this by enabling charset auto-detection as a fallback: https://www.python-httpx.org/advanced/#character-set-encodings-and-auto-detection

For example, the following works for the site you've provided:

import httpx
import charset_normalizer


def autodetect(content):
    # Guess the encoding from the raw response bytes.
    return charset_normalizer.detect(content).get("encoding")


# Using a client with character-set autodetection enabled.
client = httpx.Client(default_encoding=autodetect)
response = client.get("https://www.tscollection.ru")
print(response.encoding)  # This will either print the charset given in
                          # the Content-Type header, or else the
                          # auto-detected character set.
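
To see why the declared charset is unusable, here is a minimal, standard-library-only sketch that checks whether a Content-Type charset names a codec Python knows about. The helper names `declared_charset` and `is_known_codec` are hypothetical and are not part of httpx, and `windows-1251` is used only as an example of a valid Cyrillic charset name, not as a claim about what this site actually serves.

```python
import codecs
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")


def is_known_codec(name):
    """Return True if Python has a codec registered under this name."""
    if name is None:
        return False
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# The header value from the response above, and a valid one for comparison.
bad = declared_charset("text/html; charset=DEFAULT_CHARSET")
good = declared_charset("text/html; charset=windows-1251")

print(bad, is_known_codec(bad))
print(good, is_known_codec(good))
```

Because `DEFAULT_CHARSET` does not name any codec, the header gives a client nothing to decode with, which is the situation where a `default_encoding` fallback such as the `autodetect` function above takes over.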
Answer selected by
lovelydinosaur