Processing non-ascii tags and attributes in HTMLParser #141756
Open
Labels: 3.13, 3.14, 3.15, stdlib, type-bug
Bug report
`html.parser.HTMLParser` converts the names of tags and attributes to lower case. But the HTML5 specification only prescribes converting ASCII upper alpha characters to lower case. There are some non-ASCII characters which are converted to ASCII lowercase characters (e.g. "ß" -> "ss", "K" (U+212A) -> "k", "ſ" -> "s"). Names containing them will be parsed differently by `HTMLParser` than by any other parser or browser.
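The difference between Unicode-aware and ASCII-only lowercasing can be illustrated with a minimal sketch. The helper `ascii_lower` and the sample name below are hypothetical and are not taken from `html.parser`; they only show the kind of conversion the specification describes, using the U+212A example from the report.

```python
import string

# Translation table covering only ASCII A-Z -> a-z.
# Hypothetical helper for illustration; not the actual html.parser code.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(name: str) -> str:
    """Lowercase ASCII letters only; leave every other code point untouched."""
    return name.translate(_ASCII_LOWER)


kelvin_name = "\u212Aey"  # starts with KELVIN SIGN (U+212A), not ASCII "K"

# str.lower() applies Unicode case mapping, so the Kelvin sign becomes ASCII "k".
print(kelvin_name.lower() == "key")        # True

# ASCII-only lowercasing keeps the Kelvin sign, so the name stays distinct from "key".
print(ascii_lower(kelvin_name) == "key")   # False

# Ordinary ASCII names are lowercased as usual.
print(ascii_lower("DIV"))                  # div
```

With ASCII-only lowercasing, a name containing U+212A stays distinct from its ASCII look-alike, which is the behavior the report attributes to other parsers and browsers.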