Handle decoding of input in html5ever
#590
base: main
Companion PR for servo/html5ever#591 Testing: Covered by WPT Part of #6414, #24898, preparation for servo/html5ever#590 --------- Signed-off-by: Simon Wülker <[email protected]>
Signed-off-by: Simon Wülker <[email protected]>
I am running into architectural issues with this approach: the prefetch tokenizer needs to run on the same input as the parser, but it shouldn't have to decode the incoming bytes independently. Since the prefetch tokenizer lives in servo, that is going to be difficult when we decode input in html5ever.
These changes are an attempt to allow users of `html5ever` to respect the encodings specified with `<meta charset="...">` tags in a spec-compliant way.

The major change is that the input stream (https://html.spec.whatwg.org/#input-stream) now lives in the html5ever crates (if the decoding wrapper around the tokenizer is used). As a result, the new API surface exposes a "pull" interface instead of the existing "push" interface.
The entry point to the new API is a `DecodingParser`, which wraps either an HTML or an XML parser.

After providing some amount of byte input to a `DecodingParser`, the user can call `DecodingParser::parse`, which returns an iterator over `ParserAction`s. A parser action is either a `<script>` tag that needs to be executed or a new encoding that the document should be re-parsed with. The caller can drive the parser by repeatedly advancing this iterator.

The old API is fully preserved, without breaking changes (that I'm aware of).
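To make the pull-style flow described above concrete, here is a rough, self-contained sketch. It is not the code from this PR: only the names `DecodingParser`, `DecodingParser::parse` and `ParserAction` come from the description, while the variants `RunScript` and `ChangeEncoding`, the `feed` method, the `certain` flag, the `Actions` iterator and the `drive` helper are hypothetical. The toy "parser" does no real decoding or tree building; it only scans the raw bytes for `<meta charset="...">` and `<script>` markers so that the control flow can be shown.

```rust
// Toy model of the pull interface; every name except DecodingParser,
// parse and ParserAction is an assumption made for illustration.
const META: &[u8] = b"<meta charset=\"";
const SCRIPT_OPEN: &[u8] = b"<script>";
const SCRIPT_CLOSE: &[u8] = b"</script>";

#[derive(Debug, Clone, PartialEq)]
enum ParserAction {
    /// A script the caller must execute before parsing continues.
    RunScript(String),
    /// The document should be re-parsed with this encoding.
    ChangeEncoding(String),
}

struct DecodingParser {
    encoding: String,
    certain: bool, // once true, later <meta charset> tags are ignored
    pending: Vec<u8>,
    cursor: usize,
}

/// Iterator returned by `parse`; it advances the parser lazily.
struct Actions<'a> {
    parser: &'a mut DecodingParser,
}

fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

impl DecodingParser {
    fn new(encoding: &str, certain: bool) -> Self {
        Self { encoding: encoding.to_string(), certain, pending: Vec::new(), cursor: 0 }
    }

    /// Hand more raw bytes to the parser (hypothetical method name).
    fn feed(&mut self, bytes: &[u8]) {
        self.pending.extend_from_slice(bytes);
    }

    fn parse(&mut self) -> Actions<'_> {
        Actions { parser: self }
    }
}

impl Iterator for Actions<'_> {
    type Item = ParserAction;

    fn next(&mut self) -> Option<ParserAction> {
        loop {
            let p = &mut *self.parser;
            let rest = &p.pending[p.cursor..];
            let meta = find(rest, META);
            let script = find(rest, SCRIPT_OPEN);
            // Handle whichever marker comes first in the unread input.
            let is_meta = match (meta, script) {
                (Some(m), Some(s)) => m < s,
                (Some(_), None) => true,
                (None, Some(_)) => false,
                (None, None) => return None,
            };
            if is_meta {
                let start = meta? + META.len();
                // An unterminated tag means "wait for more input".
                let len = find(&rest[start..], b"\"")?;
                let label = String::from_utf8_lossy(&rest[start..start + len]).into_owned();
                p.cursor += start + len + 1;
                if !p.certain {
                    p.certain = true;
                    if !label.eq_ignore_ascii_case(&p.encoding) {
                        return Some(ParserAction::ChangeEncoding(label));
                    }
                }
            } else {
                let start = script? + SCRIPT_OPEN.len();
                let len = find(&rest[start..], SCRIPT_CLOSE)?;
                let body = String::from_utf8_lossy(&rest[start..start + len]).into_owned();
                p.cursor += start + len + SCRIPT_CLOSE.len();
                return Some(ParserAction::RunScript(body));
            }
        }
    }
}

/// Caller-side loop: pull actions, restart when the encoding changes.
fn drive(bytes: &[u8]) -> (String, Vec<String>) {
    let (mut encoding, mut certain) = (String::from("utf-8"), false);
    loop {
        let mut parser = DecodingParser::new(&encoding, certain);
        parser.feed(bytes);
        let mut scripts = Vec::new();
        let mut restart = None;
        for action in parser.parse() {
            match action {
                ParserAction::RunScript(src) => scripts.push(src),
                ParserAction::ChangeEncoding(label) => {
                    restart = Some(label);
                    break;
                }
            }
        }
        match restart {
            Some(label) => {
                encoding = label;
                certain = true;
            }
            None => return (encoding, scripts),
        }
    }
}
```

In this sketch the caller owns the restart decision: when the iterator yields an encoding change, `drive` discards the current pass and re-parses the same bytes with the new label, marking it as certain so that later `<meta>` tags cannot trigger another restart. Whether the actual API restarts internally or leaves that to the embedder is not stated in the description, so treat that part as a guess.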
This is a draft because the design is not final and this needs a companion servo PR to verify the correctness of these changes. Initial feedback is welcome.
Depends on #591.