Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Blob builtin implementation #169

Merged
merged 42 commits into from
Dec 5, 2024

Conversation

andreiltd
Copy link
Contributor

@andreiltd andreiltd commented Oct 23, 2024

This patch adds support for the Blob API.

The current status of platform tests is 205 216 passes out of 222 tests. Notable failures include:

  • Constructor with DOM objects,
  • BYOB stream support

@andreiltd andreiltd marked this pull request as draft October 23, 2024 14:34
Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To update the WPT coverage, you need to create a JSON file in https://github.com/bytecodealliance/StarlingMonkey/tree/main/tests/wpt-harness/expectations for the corresponding tests you add, corresponding to the same test name but as .any.js.json instead of .any.js.

These files can be auto-generated with the --update-expectations WPT flag in https://github.com/bytecodealliance/StarlingMonkey/blob/main/tests/wpt-harness/run-wpt.mjs#L79.

WPT flags can be set when doing a manual WPT run via WPT_FLAGS="--update-expectations" which is covered at the end of https://github.com/bytecodealliance/StarlingMonkey?tab=readme-ov-file#web-platform-tests.

tests/wpt-harness/tests.json Outdated Show resolved Hide resolved
If the file name defined in tests.json cannot be found, the fetch
function doesn’t throw an error. Instead, it writes the error into the
response. Without checking the response status, we proceed to evaluate
the script source, which in this case is:

`Error: {"error": {"code": 404, "message": "404"}}`

At this point, eval throws an Invalid Syntax error without indicating
that the file wasn't fetched successfully.
@andreiltd andreiltd marked this pull request as ready for review November 4, 2024 15:18
builtins/web/blob.cpp Outdated Show resolved Hide resolved
@andreiltd
Copy link
Contributor Author

Is ReadableStream something we support already or does it depend on: #126 ?

@tschneidereit
Copy link
Member

Is ReadableStream something we support already or does it depend on: #126 ?

It absolutely is, yes. We have some gaps on byte streams specifically, but Blob#stream() should be fully supportable.

You probably need to make use of NativeStreamSource, which we're currently using for body streams.

builtins/web/blob.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Outdated Show resolved Hide resolved
@guybedford
Copy link
Contributor

Note that #126 is not a blocker here - rather #126 is tracking additional readable stream features to the ones we already support.

@andreiltd
Copy link
Contributor Author

Note that #126 is not a blocker here - rather #126 is tracking additional readable stream features to the ones we already support.

Right, I saw #126 and jumped to a conclusion. I’ll add the streaming API :)

@andreiltd
Copy link
Contributor Author

andreiltd commented Nov 5, 2024

Is ReadableStream something we support already or does it depend on: #126 ?

It absolutely is, yes. We have some gaps on byte streams specifically, but Blob#stream() should be fully supportable.

You probably need to make use of NativeStreamSource, which we're currently using for body streams.

Hey @tschneidereit I looked at the body stream implementation, but I'm not sure how to adapt it for a Blob-readable stream. AFAICT, the NativeStreamSource requires a pulling function. In the existing code, this function initiates an async task to pull data. I assume I would need something similar.

The part I'm uncertain about is that the body async task uses a pollable handle to drive the future execution. What would be the equivalent in the case of a Blob?

@guybedford
Copy link
Contributor

@andreiltd you can follow the same process as request-response.cpp here in defining a pull algorithm function, but then the algorithm itself can just copy the buffers synchronously down to backpressure. None of these types touch the AsyncTask model system, the only promise comes from the potential backpressure which should be fine to handle in exactly the same way as for request. That is, there is no BodyFutureTask, just a body_source_pull_algorithm, and it should be a fairly degenerate case of this given there is no waiting needed on the incoming buffer.

@andreiltd
Copy link
Contributor Author

Thanks @guybedford . I considered using a task because this is what spec mentions as well: https://w3c.github.io/FileAPI/#blob-get-stream but I probably just misunderstood what needs to be done here :) Thanks for clarification!

@guybedford
Copy link
Contributor

Ahh I see, right we have no concept of immediate tasks to add to the task list which do not depend on WASI polls currently.

One option might be to use EnqueueJob to interleave with the Spidermonkey naive task system.

Alternatively we do have the concept of tasks having a deadline(). It might make sense to support a new ImmediateTask subclass of AsyncTask which treats it's handle as always invalid and its deadline as the current time, and then to update the WASI task ::select implementation to resolve such immediate tasks before the main task queue.

Either sounds to me like it would get us semantic correctness in draining the microtasks first before draining tasks, and Fastly's ::select implementation already does this exactly as well.

Perhaps @tschneidereit may have more thoughts on this as well.

@andreiltd
Copy link
Contributor Author

andreiltd commented Nov 26, 2024

The suggested code seems to work. Thanks!

But it looks like there is something wrong with the streaming for large blobs (>10M). I've added the test that creates a blob and read it in chunks. There is also an interval task running concurrently. This all seems to be working except if I increase the size of the blob to 10M or 100M I get the size mismatch between the amount of bytes read from the stream and actual blob size. The stream size is always larger. I will have to debug this :)

https://github.com/bytecodealliance/StarlingMonkey/actions/runs/12035659289/job/33555162194?pr=169#step:6:30

@andreiltd
Copy link
Contributor Author

andreiltd commented Nov 27, 2024

But it looks like there is something wrong with the streaming for large blobs (>10M).

I think I’ve got to the root of the problem: I’m storing references to GC-managed objects in the blob slot. This slot is a map that allows lookup of the reader state using the stream pointer.:

  using ReadersMap = js::HashMap<JSObject*, BlobReader, js::PointerHasher<JSObject*>, js::MallocAllocPolicy>;

This long running test triggers a GC run which invalidates the stored pointers and as a result it restarts the reading of the stream. I guess if I want to fix that I need to change the references to Heap<JSObject *> and implement trace on the Blob. This mean I probably need to define something like TraceableBuiltinImpl so that I can connect trace method to the class ops.

EDIT: I would imagine something like my last commit should allow tracing: 0c349f0

Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great, thank you so much for the excellent work here!

I left a bunch of comments, most of which should be addressed, but it's very close!

host-apis/wasi-0.2.0/host_api.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Show resolved Hide resolved
builtins/web/blob.h Show resolved Hide resolved
Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job tracking down that issue!

This is very, very close, with just a few more changes needed. Some of those are because I gave unclear feedback earlier—my apologies for that!

builtins/web/blob.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Outdated Show resolved Hide resolved
builtins/web/streams/native-stream-source.h Show resolved Hide resolved
builtins/web/streams/native-stream-source.h Outdated Show resolved Hide resolved
@andreiltd andreiltd mentioned this pull request Dec 4, 2024
@guybedford
Copy link
Contributor

@andreiltd have all the review comments here been addressed now?

@andreiltd
Copy link
Contributor Author

@andreiltd have all the review comments here been addressed now?

I think so, yes.

Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work, thank you so much! I really appreciate the attention to detail, and the patience in working through the feedback <3

builtins/web/blob.cpp Outdated Show resolved Hide resolved
@andreiltd
Copy link
Contributor Author

Thank you @tschneidereit and @guybedford for all the in-depth feedback! I'm confident that the scars from my first PR have taught me enough to be much more productive with future ones :)

@guybedford guybedford merged commit 0749b09 into bytecodealliance:main Dec 5, 2024
5 checks passed
@guybedford
Copy link
Contributor

Congrats on a fantastic first PR here 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants