HTML API: Plans for 6.9 #63037

@dmsnell

HTML rule changes.

Trac tickets.

  • Core-63694: Improvements to HTML handling in Core.
  • Core-63738: Improvements to the HTML API itself.

Core improvements in handling HTML

Improvements to the HTML API

Performance

Feature-set

  • Potentially some movement towards inner_html functionality.

Block Scanner

Lingering work from 6.7

⚠️ The tasks in this section likely won’t make it into 6.9 because Core work was paused in early 2025. They could still make it, but other priorities may take precedence as the roadmap is revisited.

  • Speed speed speed. Make the HTML Processor 10x faster.

    • Can we defer parsing and deduplicating attribute names while parsing tags and only start doing that when reading attributes?
      • Potentially around a 3% speed improvement in scanning tokens with the Tag Processor when not interacting with attributes.
    • Remove all if statements that don't execute anything (they have a comment as their body).
    • If 6.7 includes full support for all HTML tags, measure the impact of reordering the case statements in each insertion mode. Test against hundreds of thousands of websites based on web popularity.
    • Profile the parsing of hundreds of thousands of websites and see if anything surprising pops up in the results.
    • Replace '#text' === $token_type with ::STATE_TEXT_NODE === $this->parser_state.
    • Eagerly set the token name and type in step(), where all nodes are real. Reference these values instead of calling ->get_token_name() etc…
    • Remove after_element_push(), since pushes are all instigated from within the HTML Processor, unlike pops via pop_until() (unless we made pop_until() return a generator, so that we could foreach ( $state->pop_until( 'TAG' ) as $popped )).
    • Flagification
      • Replace as many repetitive if checks as possible with flags that are set on events, as is done with has_p_in_button_scope.
      • Indicate once in next_token() if a text node is only whitespace.
  • Following the change to push/pop, immediately pop elements off of the stack of elements as instructed in the parsing rules, vs. letting step() perform the check and pop.
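The first idea in the list, deferring attribute parsing until attributes are actually read, could look roughly like the sketch below. This is only an illustration under stated assumptions: `Sketch_Tag_Token`, its `$parse_count` counter, and the simplified regular expression (double-quoted values only) are hypothetical and are not how the Tag Processor is implemented.

```php
<?php
// Hypothetical token class for illustration; not part of the HTML API.
final class Sketch_Tag_Token {
	/** @var string Raw, unparsed attribute text captured while scanning the tag. */
	private $raw_attributes;

	/** @var array<string,string>|null Parsed attributes, or null until first read. */
	private $attributes = null;

	/** @var int How many times the attribute text was parsed (for demonstration). */
	public $parse_count = 0;

	public function __construct( string $raw_attributes ) {
		// Scanning only records the span; no names are parsed or deduplicated here.
		$this->raw_attributes = $raw_attributes;
	}

	public function get_attribute( string $name ): ?string {
		if ( null === $this->attributes ) {
			// Pay the parsing cost on the first read only.
			$this->attributes = $this->parse_attributes();
		}

		return $this->attributes[ strtolower( $name ) ] ?? null;
	}

	private function parse_attributes(): array {
		++$this->parse_count;
		$attributes = array();

		// Simplified assumption: only name="value" pairs with double quotes.
		preg_match_all( '/([^\s=]+)\s*=\s*"([^"]*)"/', $this->raw_attributes, $matches, PREG_SET_ORDER );
		foreach ( $matches as $match ) {
			$attribute_name = strtolower( $match[1] );

			// Deduplicate: the first occurrence of a name wins, as in HTML.
			if ( ! array_key_exists( $attribute_name, $attributes ) ) {
				$attributes[ $attribute_name ] = $match[2];
			}
		}

		return $attributes;
	}
}

$token       = new Sketch_Tag_Token( 'src="a.png" SRC="b.png" alt="A"' );
$before_read = $token->parse_count;
$src         = $token->get_attribute( 'src' );
$alt         = $token->get_attribute( 'alt' );
```

Code that only scans tokens never triggers `parse_attributes()`, which is where the potential saving would come from. The first attribute read parses once and later reads reuse the result.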

In some initial explorations I've found a 16% to 40% speed improvement with some of these ideas. That's not good enough, but it's a start.
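As a rough illustration of one idea from the list above, a pop_until() that returns a generator might look like the following. It is a minimal sketch: `Sketch_Open_Elements` and its `depth()` helper are hypothetical stand-ins, not the Core stack of open elements, and the real method may behave differently.

```php
<?php
// Hypothetical stand-in for a stack of open elements; not the Core class.
final class Sketch_Open_Elements {
	/** @var string[] Tag names of open elements, innermost last. */
	private $stack = array();

	public function push( string $tag_name ): void {
		$this->stack[] = $tag_name;
	}

	/**
	 * Pops elements until the named tag has been popped, yielding each one.
	 */
	public function pop_until( string $tag_name ): Generator {
		while ( count( $this->stack ) > 0 ) {
			$popped = array_pop( $this->stack );
			yield $popped;

			if ( $tag_name === $popped ) {
				return;
			}
		}
	}

	public function depth(): int {
		return count( $this->stack );
	}
}

$state = new Sketch_Open_Elements();
foreach ( array( 'HTML', 'BODY', 'DIV', 'P', 'EM' ) as $tag_name ) {
	$state->push( $tag_name );
}

$popped_tags = array();
foreach ( $state->pop_until( 'DIV' ) as $popped ) {
	// The caller reacts to each popped element here, in place of a pop callback.
	$popped_tags[] = $popped;
}
```

One design caveat: PHP generators are lazy, so nothing is popped unless the caller iterates the return value. A call site that ignores the result would silently leave the stack unchanged.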

Lingering support edge-cases.

New Features and Interfaces

Blocks

  • Continue developing block attribute sourcing.
  • Iterate on the Server Directive Processor.
  • Iterate on the Block Bindings processor.

Metadata

    Labels

    [Feature] HTML API: An API for updating HTML attributes in markup
    [Type] Iteration: Scoped iteration of an effort from a tracking issue or overview issue, ideally for a major release.

    Projects

    Status

    🦵 Punted to 6.9
