XSS in html.parser library #102555

# Description

The `html.parser` library allows an attacker to bypass an allowlist of HTML tags and attributes that is intended to mitigate XSS. This is possible because the parser does not correctly parse HTML comments in user input.

# Vulnerability

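The vulnerability occurs because `html.parser` does not correctly parse HTML comments in user input. Anything the parser treats as part of a comment is passed to `handle_comment` instead of `handle_starttag`, so tag and attribute checks never see it, even if a browser would render that same markup.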

# Exploitation

In this scenario, a developer parses user-supplied HTML and validates it against an allowlist of tags and attributes to prevent XSS attacks. The PoC shows how a security check of this type can be bypassed because the parser does not parse HTML comments properly.
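To make the scenario concrete, here is a minimal sketch of an allowlist check built on `HTMLParser`. The names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `AllowlistChecker`, and `check` are hypothetical and do not come from the report. The sketch does not reproduce the bypass. It only illustrates why comment handling matters: markup that the parser classifies as a comment never reaches the allowlist check.

```python
from html.parser import HTMLParser

# Hypothetical allowlists, chosen to mirror the PoC ("h1" and "alt").
ALLOWED_TAGS = {"h1"}
ALLOWED_ATTRS = {"alt"}


class AllowlistChecker(HTMLParser):
    """Record every tag or attribute that is not on the allowlist."""

    def __init__(self):
        super().__init__()
        self.violations = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.violations.append(("tag", tag))
        # attrs is a list of (name, value) tuples.
        for name, _value in attrs:
            if name not in ALLOWED_ATTRS:
                self.violations.append(("attr", name))

    def handle_comment(self, data):
        # Comment bodies are collected but never checked against the allowlist.
        self.comments.append(data)


def check(html):
    """Return (violations, comments) found in the given HTML string."""
    checker = AllowlistChecker()
    checker.feed(html)
    checker.close()
    return checker.violations, checker.comments


# Markup outside a comment is checked against the allowlist.
print(check('<h1 alt="x">hi</h1><script>alert(1)</script>'))
# The same markup inside a well-formed comment is only reported as a comment.
print(check('<!-- <script>alert(1)</script> -->'))
```

A check like this is only safe if the parser and the browser agree on where a comment starts and ends. If they disagree, markup that the browser executes can sit in a region the parser reports as a comment.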

## poc.py

```python
from html.parser import HTMLParser
from html.entities import name2codepoint

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Start tag:", tag)
        # Whitelist Tags
        print("Invalid tag:", tag != "h1")
        for name, value in attrs:
            # Whitelist Attr
            print("attr:", (name, value))
            # attrs holds (name, value) tuples, so compare the name only
            print("Invalid attr:", name != "alt")

    def handle_endtag(self, tag):
        print("End tag  :", tag)

    def handle_data(self, data):
        print("Data     :", data)

    def handle_comment(self, data):
        print("Comment  :", data)

    def handle_entityref(self, name):
        c = chr(name2codepoint[name])
        print("Named ent:", c)

    def handle_charref(self, name):
        if name.startswith('x'):
            c = chr(int(name[1:], 16))
        else:
            c = chr(int(name))
        print("Num ent  :", c)

    def handle_decl(self, data):
        print("Decl     :", data)

parser = MyHTMLParser()
parser.feed('
```

### Linked PRs
* gh-117406
* gh-135664
