I have the following input HTML file:
```html
<html><body><div><a hr</div><div><div></div>
<div><a href="/">bar</a></div></div></body></html>
```
Notice the unclosed `<a` tag. This is a minimal repro; in my case, it comes from an accidentally truncated DB value.
If I open it in a browser (Firefox or Chrome) and print its DOM with `document.getElementsByTagName("html")[0].outerHTML`, I get:
```html
<html><head></head><body>
<div id="div0">
<a hr="" <="" div="">
</a><div id="div1"><a hr="" <="" div="">
<div id="div2"></div>
</a><div id="div3"><a hr="" <="" div="">
</a><a href="/">bar</a>
</div>
</div>
</body></html>
```
With scraper, if I parse it with `Html::parse_document` and print it with `doc.root_element().html()`, I get:
```html
<html><head></head><body><div><a hr<="" div=""></a><div><a hr<="" div=""><div></div>
</div>
</div></body></html>
```
Notice that the anchor with the text `bar` is missing!
Running the same input through html5ever's example sinks produces output close to what browsers produce (though still not identical; see servo/html5ever#512).
This suggests that the problem lies in scraper's `TreeSink` implementation.