Skip to content

gh-149489: Fix ElementTree serialization to HTML#149490

Merged
serhiy-storchaka merged 6 commits into
python:mainfrom
serhiy-storchaka:xml-etree-serialize-html
May 29, 2026
Merged

gh-149489: Fix ElementTree serialization to HTML#149490
serhiy-storchaka merged 6 commits into
python:mainfrom
serhiy-storchaka:xml-etree-serialize-html

Conversation

@serhiy-storchaka
Copy link
Copy Markdown
Member

@serhiy-storchaka serhiy-storchaka commented May 7, 2026

  • The content of comments, processing instructions and elements "xmp", "iframe", "noembed", "noframes", and "plaintext" is no longer escaped.
  • The "plaintext" element no longer have the closing tag.
  • Add support of empty attributes (with value None).

* The content of comments, processing instructions and elements "xmp",
  "iframe", "noembed", "noframes", and "plaintext" is no longer escaped.
* The "plaintext" element no longer have the closing tag.
* Add support of empty attributes (with value None).
@StanFromIreland
Copy link
Copy Markdown
Member

Updating to fix some errors we introduced on the main branch.

@StanFromIreland StanFromIreland added the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label May 7, 2026
@scoder
Copy link
Copy Markdown
Contributor

scoder commented May 8, 2026

I'm not done with the review yet, but I find it risky to silently change output in a point release like 3.1[34].x. If we change this, it's probably still fine for 3.15, but I'd rather see the maintenance releases excluded.

@serhiy-storchaka
Copy link
Copy Markdown
Member Author

Well, the fix for XML was applied to 2.7 and 3.2 and was not backported to 2.6 and 3.1. It introduces a risk of XML/HTML injection if the comment content was not previously sanitized. See #149468.

@scoder
Copy link
Copy Markdown
Contributor

scoder commented May 8, 2026

The content of comments, processing instructions and elements "xmp", "iframe", "noembed", "noframes", and "plaintext" is no longer escaped.
[...] It introduces a risk of XML/HTML injection if the comment content was not previously sanitized.

Well, yes. If we remove the current escaping, then we leave user code unprotected. Definitely not something that users should expect from a point release.

@serhiy-storchaka
Copy link
Copy Markdown
Member Author

Test failures are related to #149571.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It would be good to add some comments to clearly state the intent of the tests (e.g. comments shouldn't be excaped, comments are omitted in text serialization, empty attrs only work in html, etc.).
  2. Is there any reason why the there are 3 similar but different test strings ('<spam> & ham', 'ham & eggs < spam', '<spam>&ham')?

Comment thread Lib/xml/etree/ElementTree.py Outdated
serhiy-storchaka and others added 2 commits May 29, 2026 09:14
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
@serhiy-storchaka serhiy-storchaka merged commit bcd29e4 into python:main May 29, 2026
59 checks passed
@miss-islington-app
Copy link
Copy Markdown

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14, 3.15.
🐍🍒⛏🤖

@serhiy-storchaka serhiy-storchaka deleted the xml-etree-serialize-html branch May 29, 2026 21:04
@serhiy-storchaka serhiy-storchaka removed needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels May 29, 2026
@serhiy-storchaka serhiy-storchaka removed the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label May 29, 2026
@bedevere-app
Copy link
Copy Markdown

bedevere-app Bot commented May 29, 2026

GH-150592 is a backport of this pull request to the 3.15 branch.

@bedevere-app
Copy link
Copy Markdown

bedevere-app Bot commented May 29, 2026

GH-150593 is a backport of this pull request to the 3.14 branch.

@bedevere-app
Copy link
Copy Markdown

bedevere-app Bot commented May 29, 2026

GH-150594 is a backport of this pull request to the 3.13 branch.

@serhiy-storchaka serhiy-storchaka added the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label May 29, 2026
@miss-islington-app
Copy link
Copy Markdown

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.15.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 29, 2026
* The content of comments, processing instructions and elements "xmp",
  "iframe", "noembed", "noframes", and "plaintext" is no longer escaped.
* The "plaintext" element no longer have the closing tag.
* Add support of empty attributes (with value None).
(cherry picked from commit bcd29e4)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@miss-islington-app
Copy link
Copy Markdown

Sorry @serhiy-storchaka, I had trouble completing the backport.
Please retry by removing and re-adding the "needs backport to 3.15" label.
Please backport backport using cherry_picker on the command line.

cherry_picker bcd29e466f55d8b4e3849ed6ada8ce86a46f5072 3.15

@serhiy-storchaka serhiy-storchaka added needs backport to 3.15 pre-release feature fixes, bugs and security fixes and removed needs backport to 3.15 pre-release feature fixes, bugs and security fixes labels May 29, 2026
@miss-islington-app
Copy link
Copy Markdown

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.15.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 29, 2026
* The content of comments, processing instructions and elements "xmp",
  "iframe", "noembed", "noframes", and "plaintext" is no longer escaped.
* The "plaintext" element no longer have the closing tag.
* Add support of empty attributes (with value None).
(cherry picked from commit bcd29e4)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@miss-islington-app
Copy link
Copy Markdown

Sorry @serhiy-storchaka, I had trouble completing the backport.
Please retry by removing and re-adding the "needs backport to 3.15" label.
Please backport backport using cherry_picker on the command line.

cherry_picker bcd29e466f55d8b4e3849ed6ada8ce86a46f5072 3.15

@bedevere-app
Copy link
Copy Markdown

bedevere-app Bot commented May 29, 2026

GH-150595 is a backport of this pull request to the 3.15 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label May 29, 2026
serhiy-storchaka added a commit that referenced this pull request May 29, 2026
…H-150595)

* The content of comments, processing instructions and elements "xmp",
  "iframe", "noembed", "noframes", and "plaintext" is no longer escaped.
* The "plaintext" element no longer have the closing tag.
* Add support of empty attributes (with value None).
(cherry picked from commit bcd29e4)
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request May 29, 2026
…GH-149490) (pythonGH-150595)

* The content of elements "xmp", "iframe", "noembed", "noframes",
  and "plaintext" is no longer escaped.
* The "plaintext" element no longer have the closing tag.

(cherry picked from commit bcd29e4)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request May 29, 2026
…GH-149490)

* The content of elements "xmp", "iframe", "noembed", "noframes",
  and "plaintext" is no longer escaped.
* The "plaintext" element no longer have the closing tag.

(cherry picked from commit bcd29e4)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@bedevere-app
Copy link
Copy Markdown

bedevere-app Bot commented May 29, 2026

GH-150596 is a backport of this pull request to the 3.14 branch.

serhiy-storchaka added a commit that referenced this pull request May 30, 2026
…H-150596)

* The content of elements "xmp", "iframe", "noembed", "noframes",
  and "plaintext" is no longer escaped.
* The "plaintext" element no longer have the closing tag.

(cherry picked from commit bcd29e4)
serhiy-storchaka added a commit that referenced this pull request May 30, 2026
…H-150596) (GH-150609)

* The content of elements "xmp", "iframe", "noembed", "noframes",
  and "plaintext" is no longer escaped.
* The "plaintext" element no longer have the closing tag.
(cherry picked from commit c42e6d3)


(cherry picked from commit bcd29e4)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Comment on lines 930 to +931
if tag is Comment:
write("<!--%s-->" % _escape_cdata(text))
write("<!--%s-->" % text)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit late to the party, but what happens if text contains -->? Shouldn't this be escaped (if there's a way to escape it, I don't think so 🤔) or probably instead be rejected altogether?

write("<!--%s-->" % text)
elif tag is ProcessingInstruction:
write("<?%s?>" % _escape_cdata(text))
write("<?%s?>" % text)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess same question here -- what if text contains ?>?

@scoder
Copy link
Copy Markdown
Contributor

scoder commented Jun 1, 2026

what happens if text contains --> ?

Yes, that was my concern in #149490 (comment) as well. Note that this part of the changes was not backported and will only appear in 3.15+.

The HTML5 spec is explicit about what is not allowed in comments:
https://html.spec.whatwg.org/#comments

It's not clear about what is allowed in processing instructions, but that probably just means that ?> should be rejected and nothing else. PIs are rather relaxed by design.

I think it's worth rejecting non well-formed text. lxml does it in the API when creating a Comment / ProcessingInstruction element, but for xml.etree, it might be more reasonable to reject it in the serialiser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants