
Invalid ID attribute with php8.4's \Dom\HTMLDocument #18316

Closed
edent opened this issue Apr 12, 2025 · 2 comments

edent commented Apr 12, 2025

Description

The following code:

<?php
$html = '<p id="example ">';
$dom = \Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR, "UTF-8");
echo $dom->saveHTML();

Resulted in this output:

<html><head></head><body><p id="example "></p></body></html>

But I expected this output instead:

<html><head></head><body><p id="example"></p></body></html>

As per https://html.spec.whatwg.org/multipage/dom.html#global-attributes:the-id-attribute-2

When specified on HTML elements, the id attribute value must be unique amongst all the IDs in the element's tree and must contain at least one character. The value must not contain any ASCII whitespace.

Further detail at https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Global_attributes/id#syntax

I'd suggest trimming whitespace from the IDs. It may also be sensible to do that on the other attributes. Of course, the behaviour does depend on how closely you want to follow the input's mistakes. If the intention is to closely replicate the input (whether their original code was valid or not) then please close this bug.
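For anyone who wants this normalisation in their own code rather than in the parser, here is a minimal sketch of trimming the IDs after parsing. The helper name `trimIdAttributes` is hypothetical and not part of PHP; it only uses `querySelectorAll`, `getAttribute` and `setAttribute` from the PHP 8.4 `\Dom` API. It assumes that trimming never produces an empty or duplicate ID, which a real document may not guarantee, and it does not touch whitespace inside the value.

```php
<?php
// Hypothetical helper (not part of PHP): strips leading and trailing
// ASCII whitespace from every id attribute in an already-parsed document.
function trimIdAttributes(\Dom\HTMLDocument $dom): void
{
    foreach ($dom->querySelectorAll('[id]') as $element) {
        // ASCII whitespace as defined by the HTML spec: tab, LF, FF, CR, space.
        $trimmed = trim($element->getAttribute('id'), "\t\n\f\r ");
        $element->setAttribute('id', $trimmed);
    }
}

$dom = \Dom\HTMLDocument::createFromString('<p id="example ">', LIBXML_NOERROR, 'UTF-8');
trimIdAttributes($dom);
echo $dom->saveHtml();
```

With the input from this report, the serialised output should then match the "expected" output above.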

PHP Version

PHP 8.4.6 (cli) (built: Apr 11 2025 02:19:14) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.4.6, Copyright (c) Zend Technologies
with Zend OPcache v8.4.6, Copyright (c), by Zend Technologies

Operating System

Pop!_OS 22.04 LTS

nielsdos (Member) commented

The docs you cited describe how developers should author HTML documents, not how those documents should be parsed.
The parser spec does not contain a rule to strip the whitespace, and diverging from it would be dangerous.
Browsers also don't do this:

dom=(new DOMParser).parseFromString('<p id="example ">', 'text/html')
dom.querySelector('p') // <p id="example ">
dom.querySelector('p').id // "example "
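As a rough PHP counterpart to the browser check above, the following sketch reads the parsed value back through the `id` property and `getAttribute`. It assumes PHP 8.4 with the `\Dom` classes available and reuses the input from the report.

```php
<?php
// Parse the same fragment as in the report and inspect the id as stored in the DOM.
$dom = \Dom\HTMLDocument::createFromString('<p id="example ">', LIBXML_NOERROR, 'UTF-8');
$p = $dom->querySelector('p');

var_dump($p->id);                 // string(8) "example "
var_dump($p->getAttribute('id')); // string(8) "example "
```

The trailing space survives parsing, so any lookup by ID has to include it.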

nielsdos closed this as not planned on Apr 13, 2025

edent commented Apr 13, 2025

Thank you - that's a very clear explanation.
