Making charset auto-detection strictly opt-in. #2083
lovelydinosaur asked this question in Ideas
Replies: 1 comment
Looks like this recently landed (in May 2022):
So, I've been thinking for a while about making our charset auto-detection strictly opt-in.
Our charset auto-detection is used for cases where `response.text` is accessed, but no `charset` is present in the response `Content-Type`. Right now we'll fall back to using `charset_normalizer` in that case in order to auto-detect an encoding. That's a bit of a mixed bag. It's a fuzzy approach, and I'm not overly keen on it. We've tried `utf-8` only as the fallback in the past, which …

So, here's an alternative.
Rather than having an `apparent_encoding` on the `Response` class, which is used as the fallback, I'd suggest the following for charset control...

```python
charset_fallback="utf-8"  # Default to utf-8 as the fallback.
charset_errors="replace"  # Default to the lenient "replace" for decoding failures.
```

We'd have those on the `Request()` model, and on the `Client()` model, so for instance...

Or...
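A minimal sketch of the decoding rule these two settings imply, written as a standalone function rather than against httpx, since `charset_fallback` and `charset_errors` are only proposed here. The function name `decode_text` is hypothetical, and it assumes the charset is read from the `Content-Type` header using the standard library's `email.message.Message.get_content_charset()`, with the fallback applied only when the header carries no charset.

```python
from email.message import Message
from typing import Optional


def decode_text(
    content: bytes,
    content_type: Optional[str],
    charset_fallback: str = "utf-8",
    charset_errors: str = "replace",
) -> str:
    """Hypothetical sketch of the proposed charset_fallback / charset_errors rule."""
    charset = None
    if content_type:
        # Let the stdlib parse the header parameters. This returns a lowercased
        # charset such as "iso-8859-1", or None when no charset is given.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
    # An explicit charset always wins; the fallback applies only when it is absent.
    encoding = charset or charset_fallback
    # charset_errors is passed straight through as the codec error handler.
    return content.decode(encoding, errors=charset_errors)


# No charset in the header: utf-8 fallback, the undecodable byte becomes U+FFFD.
print(decode_text(b"caf\xe9", "text/plain"))
# An explicit charset takes priority over the fallback.
print(decode_text(b"caf\xe9", "text/plain; charset=ISO-8859-1"))
```

With `charset_errors="strict"` the first call would raise `UnicodeDecodeError` instead, so the lenient default and the strict behavior are both a single setting away.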
However, we would still like to support auto-detection for the fallback, but make it strictly opt-in.
We'd do that by having `charset_normalizer` as a regular installable codec... Which then allows...
Or...
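The "installable codec" idea can be sketched with the standard library's `codecs.register()`, which makes any registered name a valid argument to `bytes.decode()`. Both the codec name `autodetect` and the detector below are hypothetical: the detector simply tries `utf-8`, then `cp1252`, then `latin-1`, standing in for a real call into `charset_normalizer`. Under that assumption, a setting such as `charset_fallback="autodetect"` could opt in to detection without the client itself depending on a detector.

```python
import codecs
from typing import Optional, Tuple

CODEC_NAME = "autodetect"  # hypothetical codec name, not part of any library


def _detect_and_decode(data, errors: str = "strict") -> Tuple[str, int]:
    # bytes.decode() may hand the codec a memoryview, so normalise to bytes first.
    raw = bytes(data)
    # Stand-in detector: a real codec would call
    # charset_normalizer.from_bytes(raw).best() here instead.
    for candidate in ("utf-8", "cp1252"):
        try:
            return raw.decode(candidate), len(raw)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this last step cannot fail.
    return raw.decode("latin-1", errors), len(raw)


def _refuse_encode(text: str, errors: str = "strict") -> Tuple[bytes, int]:
    # Detection only makes sense when decoding, so encoding is rejected.
    raise UnicodeError(f"{CODEC_NAME} is a decode-only codec")


def _search(name: str) -> Optional[codecs.CodecInfo]:
    # Search functions receive the normalised (lowercased) encoding name.
    if name == CODEC_NAME:
        return codecs.CodecInfo(
            encode=_refuse_encode, decode=_detect_and_decode, name=CODEC_NAME
        )
    return None


codecs.register(_search)

print(b"caf\xc3\xa9".decode(CODEC_NAME))  # valid utf-8
print(b"caf\xe9".decode(CODEC_NAME))      # not utf-8, so decoded as cp1252
```

Keeping detection behind a codec name means nothing is detected unless the user explicitly asks for that name, which is the opt-in property the proposal is after.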
Related: #1018, #1269, #1657, #1791