Commit 593fab4

Conform encoding-label matching to Encoding spec
This change makes the parser’s encoding-name matching conform to the current Encoding spec at https://encoding.spec.whatwg.org/#concept-encoding-get, which requires that only leading and trailing whitespace be removed from a string before checking whether it matches any valid encoding name. Without this change, the parser instead implements https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching, which requires deleting “all characters except a-z, A-Z, and 0-9” from a string before checking whether it matches any valid encoding name. That difference causes us to fail a number of html5-tests cases.
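To make the difference between the two matching rules concrete, here is a minimal sketch that is not part of this commit. The class and method names (`LabelMatchingSketch`, `strictKey`, `looseKey`) are hypothetical. `looseKey` implements only the character-deletion rule quoted above and omits any further rules the Unicode report may define.

```java
final class LabelMatchingSketch {

    // ASCII whitespace: tab, line feed, form feed, carriage return, space.
    static boolean isAsciiWhitespace(char c) {
        return c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    // Lowercases only A-Z so the result does not depend on the default locale.
    static char asciiLower(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + 0x20) : c;
    }

    // Encoding-spec style (assumed reading): strip leading and trailing
    // ASCII whitespace, then compare ASCII case-insensitively.
    static String strictKey(String label) {
        int start = 0;
        int end = label.length();
        while (start < end && isAsciiWhitespace(label.charAt(start))) {
            start++;
        }
        while (end > start && isAsciiWhitespace(label.charAt(end - 1))) {
            end--;
        }
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(asciiLower(label.charAt(i)));
        }
        return sb.toString();
    }

    // Loose style as quoted in the commit message: delete every character
    // except a-z, A-Z, and 0-9, then lowercase what remains.
    static String looseKey(String label) {
        StringBuilder sb = new StringBuilder(label.length());
        for (int i = 0; i < label.length(); i++) {
            char c = asciiLower(label.charAt(i));
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strictKey(" UTF-8\n")); // prints "utf-8"
        System.out.println(looseKey(" UTF-8\n"));  // prints "utf8"
        System.out.println(strictKey("u.t.f-8"));  // prints "u.t.f-8"
        System.out.println(looseKey("u.t.f-8"));   // prints "utf8"
    }
}
```

Under the loose rule a string such as `u.t.f-8` collapses to `utf8` and so could match a known name, whereas under the strict rule its punctuation is preserved and it would not match.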
1 parent 3f48926 commit 593fab4

1 file changed, +1 -3 lines changed

src/nu/validator/htmlparser/io/Encoding.java

@@ -254,9 +254,7 @@ public static String toNameKey(String str) {
             if (c >= 'A' && c <= 'Z') {
                 c += 0x20;
             }
-            if (!((c >= '\t' && c <= '\r') || (c >= '\u0020' && c <= '\u002F')
-                    || (c >= '\u003A' && c <= '\u0040')
-                    || (c >= '\u005B' && c <= '\u0060') || (c >= '\u007B' && c <= '\u007E'))) {
+            if (!Arrays.asList('\t','\n','\f','\r','\u0020').contains(c)) {
                 buf[j] = c;
                 j++;
             }
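For readers who want to try the new filter in isolation, the following is a self-contained sketch of how the changed method might behave. Only the three `if`/assignment steps inside the loop come from the diff. The loop scaffolding, the `buf` allocation, the return statement, and the class name `NameKeySketch` are assumptions, since the rest of `toNameKey` is not shown in this commit.

```java
import java.util.Arrays;
import java.util.List;

final class NameKeySketch {

    // The five characters filtered by the new condition in the diff,
    // hoisted into a constant here so the list is built only once.
    private static final List<Character> ASCII_WHITESPACE =
            Arrays.asList('\t', '\n', '\f', '\r', ' ');

    // Assumed shape of the method: visit every char, lowercase A-Z,
    // and copy everything that is not ASCII whitespace into buf.
    static String toNameKey(String str) {
        char[] buf = new char[str.length()];
        int j = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                c += 0x20;
            }
            if (!ASCII_WHITESPACE.contains(c)) {
                buf[j] = c;
                j++;
            }
        }
        return new String(buf, 0, j);
    }

    public static void main(String[] args) {
        System.out.println(toNameKey(" UTF-8 ")); // prints "utf-8"
        System.out.println(toNameKey("utf_8"));   // prints "utf_8"
    }
}
```

With the old condition, punctuation such as `-` and `_` fell inside the excluded ranges and was dropped from the key. With the new condition it is kept. If the loop does visit every character, as assumed here, whitespace is dropped wherever it occurs in the string, not only at the ends.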
