Leading whitespace treatment in xmlhtml is inconsistent.
Consider the following behavior of parseXML (the same happens with parseHTML), where leading whitespace is always dropped and trailing whitespace is always kept:
> parseXML "x" ""
Right (XmlDocument {docEncoding = UTF8, docType = Nothing, docContent = []})
> parseXML "x" " "
Right (XmlDocument {docEncoding = UTF8, docType = Nothing, docContent = []})
> parseXML "x" "x"
Right (XmlDocument {docEncoding = UTF8, docType = Nothing, docContent = [TextNode "x"]})
> parseXML "x" " x"
Right (XmlDocument {docEncoding = UTF8, docType = Nothing, docContent = [TextNode "x"]})
> parseXML "x" "x "
Right (XmlDocument {docEncoding = UTF8, docType = Nothing, docContent = [TextNode "x "]})
> parseXML "x" " x "
Right (XmlDocument {docEncoding = UTF8, docType = Nothing, docContent = [TextNode "x "]})
See what happens, however, when the “leading whitespace” comes after some element:
> parseHTML "x" "<a/> b "
Right (HtmlDocument {docEncoding = UTF8, docType = Nothing, docContent = [Element {elementTag = "a", elementAttrs = [], elementChildren = []},TextNode " b "]})
These two examples behave differently, and I think the correct behavior is the one from the latter example, since xmlhtml should not be discarding the contents of a text node.
So, my proposal is:
- Keep the behavior of leading whitespace after an element as it is today.
- Keep the behavior of trailing whitespace everywhere as it is today.
- Fix top-level text node parsing so that it doesn't discard leading whitespace.
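The proposal could be pinned down with a small check along these lines. This is only a sketch, assuming the xmlhtml and bytestring packages are available; topLevel and proposed are hypothetical names introduced here for illustration, and the expected values describe the proposed behavior, not what the library returns today.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as B
import           Text.XmlHtml (Node (..), docContent, parseXML)

-- | Parse an input and return only the top-level nodes.
-- (Hypothetical helper, not part of xmlhtml.)
topLevel :: B.ByteString -> Either String [Node]
topLevel = fmap docContent . parseXML "x"

-- | Pairs of (input, top-level nodes expected under the proposal).
-- The first two cases differ from today's behavior, which drops
-- the leading space; the last one already holds.
proposed :: [(B.ByteString, [Node])]
proposed =
  [ (" x",  [TextNode " x"])
  , (" x ", [TextNode " x "])
  , ("x ",  [TextNode "x "])
  ]

main :: IO ()
main = mapM_ check proposed
  where
    -- Report whether the parser output matches the proposed result.
    check (input, want) =
      let got = topLevel input
      in putStrLn $ show input ++ ": " ++
           (if got == Right want
              then "ok"
              else "mismatch, got " ++ show got)
```

The whitespace-only input " " is left out on purpose, since the proposal above does not say whether it should yield an empty document or a single text node.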