Labels
- [Feature] HTML API: An API for updating HTML attributes in markup
- [Type] Iteration: Scoped iteration of an effort from a tracking issue or overview issue ideally for a major release.
Description
🔝Block API | HTML issues | Refactors | ↑ Broader Roadmap | ← Plans for 6.7
HTML rule changes.
Trac tickets.
- Core-63694: Improvements to HTML handling in Core.
- Core-63738: Improvements to the HTML API itself.
Core improvements in handling HTML
- HTML API: Refactor `wp_kses_hair()` wordpress-develop#9248
- HTML API: Reliably parse HTML in `wp_html_split()`. wordpress-develop#9270
- HTML API: Reliably parse HTML in `wp_strip_all_tags()`. wordpress-develop#9271
  - This work probably won’t make it in due to deeper coupling between the use of `wp_strip_all_tags()` and CSS content, which is not HTML and actually will be corrupted if treated as such.
- HTML API: Reliably parse HTML in `get_url_in_content()` wordpress-develop#9272
Improvements to the HTML API
- HTML API: Update customizable <select> parsing wordpress-develop#9298
- Add private annotation to the WP_HTML_Doctype_Info class wordpress-develop#9301
Performance
- Convert bookmark names to numeric indices in a linear array to avoid the overhead of string hashing.
- HTML API: Reduce skip_script_data length checks wordpress-develop#9230
- HTML API: Simplify META tag encoding processing wordpress-develop#9231
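The bookmark item above could be approached along these lines. This is a minimal standalone sketch, not the HTML API's implementation: the class `Bookmark_Slots_Sketch` and its methods are hypothetical names invented for illustration. The idea it shows is to resolve a bookmark name to an integer slot once, then have hot paths read and write a plain list by that integer, so the name is not hashed on every access.

```php
<?php
// Hypothetical sketch; the class and method names are not part of the HTML API.
final class Bookmark_Slots_Sketch {
	/** @var array<string,int> Name to slot, consulted only when resolving a name. */
	private array $slot_for_name = array();

	/** @var array<int,array{int,int}> Slot to [ start, length ], kept as a sequential list. */
	private array $spans = array();

	/** Resolves a name to its slot once; callers keep the returned integer. */
	public function slot( string $name ): int {
		if ( ! isset( $this->slot_for_name[ $name ] ) ) {
			$this->slot_for_name[ $name ] = count( $this->spans );
			$this->spans[]                = array( 0, 0 );
		}
		return $this->slot_for_name[ $name ];
	}

	/** Updates a bookmark by slot; no string lookup happens here. */
	public function set( int $slot, int $start, int $length ): void {
		$this->spans[ $slot ] = array( $start, $length );
	}

	/** Reads a bookmark by slot as [ start, length ]. */
	public function get( int $slot ): array {
		return $this->spans[ $slot ];
	}
}

$bookmarks = new Bookmark_Slots_Sketch();
$first     = $bookmarks->slot( 'first-paragraph' );
$second    = $bookmarks->slot( 'last-image' );
$bookmarks->set( $first, 120, 3 );
$bookmarks->set( $second, 480, 26 );

// Resolving the same name again returns the same slot.
$again = $bookmarks->slot( 'first-paragraph' );
```

Whether this pays off in practice would need profiling; bookmarks created internally by the processor could skip the name lookup entirely by using fixed integer slots.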
Feature-set
- Potentially some movement towards `inner_html` functionality.
Block Scanner
- Blocks: Introduce `WP_Block_Scanner` for efficiently parsing blocks. wordpress-develop#9105
  - Based on the HTML API design, a memory-efficient method for scanning through a document and inferring block structure.
Lingering work from 6.7
⚠️ The tasks in this section likely won’t make it into 6.9 due to the pausing of Core work in early 2025. They could still make it, but other priorities may take precedence as the roadmap is revisited.
- Speed speed speed. Make the HTML Processor 10x faster.
  - Can we defer parsing and deduplicating attribute names while parsing tags and only start doing that when reading attributes?
    - Potentially around a 3% speed improvement in scanning tokens with the Tag Processor when not interacting with attributes.
  - Remove all `if` statements that don't execute anything (they have a comment as their body).
  - If 6.7 includes full support for all HTML tags, measure the impact of reordering the `case` statements in each insertion mode. Test against 100s of 1000s of websites based on web popularity.
  - Profile the parsing of 100s of 1000s of websites and see if anything surprising pops up in the results.
  - Replace `'#text' === $token_type` with `::STATE_TEXT_NODE === $this->parser_state`
  - Eagerly set token name, type in `step()` where all nodes are real. Reference these values instead of calling `->get_token_name()` etc…
  - Remove `after_element_push()` since these are all instigated from within the HTML Processor, unlike pop with `pop_until()` (unless we made `pop_until()` return a generator and we could `foreach ( $state->pop_until( 'TAG' ) as $popped )`)
  - Flagification
    - Replace as many repetitive `if` checks as possible with flags that are set on events, as is done with `has_p_in_button_scope`.
    - Indicate once in `next_token()` if a text node is only whitespace.
- Following the change to push/pop, immediately pop elements off of the stack of elements as instructed in the parsing rules, vs. letting `step()` perform the check and pop.
In some initial explorations I've found a 16% to 40% speed improvement with some of these ideas. That's not good enough, but it's a start.
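The generator idea for `pop_until()` mentioned in the list above could look roughly like the following. This is a standalone sketch under stated assumptions, not the HTML Processor's code: the class `Open_Elements_Sketch` and its `push()` and `depth()` methods are hypothetical, and only the `pop_until()` name and the `foreach` shape come from the notes above.

```php
<?php
// Hypothetical stand-in for the stack of open elements; not WordPress code.
final class Open_Elements_Sketch {
	/** @var string[] Tag names, bottom of the stack first. */
	private array $stack = array();

	public function push( string $tag_name ): void {
		$this->stack[] = $tag_name;
	}

	/**
	 * Pops elements until the named tag has been popped, yielding each
	 * popped element so the caller can react to it directly.
	 */
	public function pop_until( string $tag_name ): Generator {
		while ( count( $this->stack ) > 0 ) {
			$popped = array_pop( $this->stack );
			yield $popped;

			if ( $popped === $tag_name ) {
				return;
			}
		}
	}

	public function depth(): int {
		return count( $this->stack );
	}
}

$state = new Open_Elements_Sketch();
foreach ( array( 'HTML', 'BODY', 'DIV', 'P', 'EM' ) as $tag_name ) {
	$state->push( $tag_name );
}

$seen = array();
foreach ( $state->pop_until( 'DIV' ) as $popped ) {
	// The caller handles each pop here instead of relying on a callback.
	$seen[] = $popped;
}
```

One consequence of this shape is that the pops happen lazily: a caller that breaks out of the loop early leaves the remaining elements on the stack, so any real implementation would have to decide whether that is acceptable.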
Lingering support edge-cases.
- HTML API: Handle content after BODY, HTML where possible wordpress-develop#7312
- HTML API: Implement active format reconstruction wordpress-develop#6982 and HTML API: Active format reconstruction with noah's ark dmsnell/wordpress-develop#19
- HTML API: Improve implementation of adoption agency algorithm wordpress-develop#6983
- Audit all `seek()` calls in the HTML Processor to ensure reliability. (Core-????) May be covered by HTML API: Ensure that full processor can seek to earlier bookmarks.
  - Ensure that internal state isn't messed up:
    - breadcrumbs
    - FORM pointer
    - HEAD pointer
    - virtual nodes
    - list of active formatting elements
    - "strip last newline at" pointers
New Features and Interfaces
- Introduce safe composable HTML templating.
- Inner and outer HTML support. (Core-????)
Blocks
- Continue developing block attribute sourcing.
- Iterate on the Server Directive Processor.
- Iterate on the Block Bindings processor.
Status
🦵 Punted to 6.9