Add Blob builtin implementation #169

andreiltd · 2024-10-23T14:34:36Z

This patch adds support for the Blob API.

The current status of platform tests is ~~205~~ 216 passes out of 222 tests. Notable failures include:

- Constructor with DOM objects
- BYOB stream support

guybedford

To update the WPT coverage, you need to create a JSON file in https://github.com/bytecodealliance/StarlingMonkey/tree/main/tests/wpt-harness/expectations for each test you add, using the same name as the test but ending in .any.js.json instead of .any.js.

These files can be auto-generated with the --update-expectations WPT flag in https://github.com/bytecodealliance/StarlingMonkey/blob/main/tests/wpt-harness/run-wpt.mjs#L79.

WPT flags can be set when doing a manual WPT run via WPT_FLAGS="--update-expectations" which is covered at the end of https://github.com/bytecodealliance/StarlingMonkey?tab=readme-ov-file#web-platform-tests.

tests/wpt-harness/tests.json

If the file name defined in tests.json cannot be found, the fetch function doesn’t throw an error. Instead, it writes the error into the response. Because the response status isn't checked, we proceed to evaluate the script source, which in this case is: `Error: {"error": {"code": 404, "message": "404"}}`. At this point, eval throws an Invalid Syntax error without indicating that the file wasn't fetched successfully.
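The missing check described above can be sketched in a few lines. This is a standalone C++ model, not the harness code (which is JavaScript): `FetchResult` and `script_source` are hypothetical names, and the only assumption is that the response carries an HTTP status alongside its body.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for what the harness gets back from its fetch call.
struct FetchResult {
  int status;        // HTTP status code
  std::string body;  // script source on success, an error payload otherwise
};

// Returns the script source only for a 2xx response. Any other status raises
// an error that names the file, so the error payload is never handed to eval.
std::string script_source(const FetchResult &res, const std::string &path) {
  assert(!path.empty());
  if (res.status < 200 || res.status >= 300) {
    throw std::runtime_error("failed to fetch " + path + ": HTTP " +
                             std::to_string(res.status));
  }
  return res.body;
}
```

With a check like this, a wrong file name in tests.json would surface as a fetch failure rather than as a syntax error from evaluating the error payload.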

builtins/web/blob.cpp

andreiltd · 2024-11-04T16:13:05Z

Is ReadableStream something we support already or does it depend on: #126 ?

tschneidereit · 2024-11-04T16:17:02Z

> Is ReadableStream something we support already or does it depend on: #126 ?

It absolutely is, yes. We have some gaps on byte streams specifically, but Blob#stream() should be fully supportable.

You probably need to make use of NativeStreamSource, which we're currently using for body streams.

builtins/web/blob.cpp

guybedford · 2024-11-04T19:44:22Z

Note that #126 is not a blocker here - rather #126 is tracking additional readable stream features to the ones we already support.

andreiltd · 2024-11-04T20:14:10Z

> Note that #126 is not a blocker here - rather #126 is tracking additional readable stream features to the ones we already support.

Right, I saw #126 and jumped to a conclusion. I’ll add the streaming API :)

andreiltd · 2024-11-05T15:47:15Z

> > Is ReadableStream something we support already or does it depend on: #126 ?
>
> It absolutely is, yes. We have some gaps on byte streams specifically, but Blob#stream() should be fully supportable.
>
> You probably need to make use of NativeStreamSource, which we're currently using for body streams.

Hey @tschneidereit I looked at the body stream implementation, but I'm not sure how to adapt it for a Blob-readable stream. AFAICT, the NativeStreamSource requires a pulling function. In the existing code, this function initiates an async task to pull data. I assume I would need something similar.

The part I'm uncertain about is that the body async task uses a pollable handle to drive the future execution. What would be the equivalent in the case of a Blob?

guybedford · 2024-11-06T22:17:46Z

@andreiltd you can follow the same process as request-response.cpp here in defining a pull algorithm function, but then the algorithm itself can just copy the buffers synchronously, subject to backpressure. None of these types touch the AsyncTask model; the only promise comes from the potential backpressure, which should be fine to handle in exactly the same way as for request. That is, there is no BodyFutureTask, just a body_source_pull_algorithm, and it should be a fairly degenerate case of this given there is no waiting needed on the incoming buffer.
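A minimal sketch of such a synchronous pull, written as standalone C++ rather than against the real NativeStreamSource or SpiderMonkey stream APIs. `Controller`, `BlobReader`, `pull`, `high_water_mark` and `chunk_size` are all hypothetical names, and backpressure is modelled as a simple bound on the number of queued chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Chunk = std::vector<std::uint8_t>;

// Hypothetical stand-in for a stream controller with a bounded queue.
struct Controller {
  std::size_t high_water_mark = 1;  // queued chunks allowed before backpressure
  std::vector<Chunk> queue;
  bool closed = false;

  bool wants_more() const { return !closed && queue.size() < high_water_mark; }
  void enqueue(Chunk chunk) { queue.push_back(std::move(chunk)); }
  void close() { closed = true; }
};

// Per-stream read position into the blob's bytes.
struct BlobReader {
  const Chunk *data = nullptr;
  std::size_t position = 0;
};

// Copies chunks out of the blob until the controller signals backpressure or
// the data is exhausted. Nothing here waits: the bytes are already in memory.
void pull(BlobReader &reader, Controller &ctrl, std::size_t chunk_size) {
  assert(reader.data != nullptr && chunk_size > 0);
  while (ctrl.wants_more()) {
    std::size_t remaining = reader.data->size() - reader.position;
    if (remaining == 0) {
      ctrl.close();
      return;
    }
    std::size_t n = std::min(chunk_size, remaining);
    auto first = reader.data->begin() + static_cast<std::ptrdiff_t>(reader.position);
    ctrl.enqueue(Chunk(first, first + static_cast<std::ptrdiff_t>(n)));
    reader.position += n;
  }
}
```

The consumer draining the queue is what would trigger the next `pull` call; the reader's `position` is the only state that has to survive between calls.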

andreiltd · 2024-11-06T22:35:09Z

Thanks @guybedford. I considered using a task because this is what the spec mentions as well: https://w3c.github.io/FileAPI/#blob-get-stream but I probably just misunderstood what needs to be done here :) Thanks for the clarification!

guybedford · 2024-11-06T22:49:34Z

Ahh I see, right we have no concept of immediate tasks to add to the task list which do not depend on WASI polls currently.

One option might be to use EnqueueJob to interleave with the SpiderMonkey native task system.

Alternatively we do have the concept of tasks having a deadline(). It might make sense to support a new ImmediateTask subclass of AsyncTask which treats its handle as always invalid and its deadline as the current time, and then to update the WASI task ::select implementation to resolve such immediate tasks before the main task queue.

Either sounds to me like it would get us semantic correctness in draining the microtasks first before draining tasks, and Fastly's ::select implementation already does this exactly as well.

Perhaps @tschneidereit may have more thoughts on this as well.
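The ImmediateTask idea from this comment can be illustrated with a toy scheduler. This is only a sketch under simplifying assumptions: `Task`, `select_next` and `run_all` are hypothetical names, an immediate task is reduced to a boolean flag, and the blocking host poll that a real select would perform for ordinary tasks is replaced by taking the first task in the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical task record. `immediate` models a task whose handle is always
// invalid and whose deadline is "now", so it never needs a host poll.
struct Task {
  int id;
  bool immediate;
};

// Picks the index of the next task to run: the first immediate task if there
// is one, otherwise the first queued task (standing in for a poll result).
std::size_t select_next(const std::vector<Task> &tasks) {
  assert(!tasks.empty());
  for (std::size_t i = 0; i < tasks.size(); ++i) {
    if (tasks[i].immediate) return i;
  }
  return 0;
}

// Drains the queue and returns the task ids in the order they were run.
std::vector<int> run_all(std::vector<Task> tasks) {
  std::vector<int> order;
  while (!tasks.empty()) {
    std::size_t i = select_next(tasks);
    order.push_back(tasks[i].id);
    tasks.erase(tasks.begin() + static_cast<std::ptrdiff_t>(i));
  }
  return order;
}
```

For a queue holding tasks 1 to 4 where 2 and 4 are immediate, `run_all` yields the order 2, 4, 1, 3: immediate tasks run first and keep their relative order.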

andreiltd · 2024-11-26T16:57:54Z

The suggested code seems to work. Thanks!

But it looks like there is something wrong with the streaming for large blobs (>10M). I've added a test that creates a blob and reads it in chunks. There is also an interval task running concurrently. This all seems to be working, except that if I increase the size of the blob to 10M or 100M I get a size mismatch between the number of bytes read from the stream and the actual blob size. The stream size is always larger. I will have to debug this :)

https://github.com/bytecodealliance/StarlingMonkey/actions/runs/12035659289/job/33555162194?pr=169#step:6:30

andreiltd · 2024-11-27T15:13:36Z

> But it looks like there is something wrong with the streaming for large blobs (>10M).

I think I’ve got to the root of the problem: I’m storing references to GC-managed objects in the blob slot. This slot is a map that allows lookup of the reader state using the stream pointer:

  using ReadersMap = js::HashMap<JSObject*, BlobReader, js::PointerHasher<JSObject*>, js::MallocAllocPolicy>;

This long-running test triggers a GC run, which invalidates the stored pointers, and as a result the reading of the stream restarts. I guess if I want to fix that I need to change the references to Heap<JSObject *> and implement trace on the Blob. This means I probably need to define something like TraceableBuiltinImpl so that I can connect the trace method to the class ops.

EDIT: I would imagine something like my last commit should allow tracing: 0c349f0
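The failure mode can be reproduced in miniature without a garbage collector. In the sketch below, which uses only the standard library, two distinct `Stream` objects stand in for the same stream before and after a moving GC relocates it. `Stream`, `ReaderState` and `read_chunk` are hypothetical names, and nothing here uses SpiderMonkey APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>

struct Stream {
  int id;
};

struct ReaderState {
  std::size_t position = 0;  // bytes already handed out for this stream
};

// Keyed by raw address, like a map of untraced object pointers.
using ReadersMap = std::unordered_map<const Stream *, ReaderState>;

// Returns how many bytes this call reads. An unknown key silently gets a
// fresh state, which is exactly what restarts the read after a "move".
std::size_t read_chunk(ReadersMap &readers, const Stream *stream,
                       std::size_t blob_size, std::size_t chunk) {
  ReaderState &state = readers[stream];
  assert(state.position <= blob_size);
  std::size_t n = std::min(chunk, blob_size - state.position);
  state.position += n;
  return n;
}
```

Reading 6 bytes of a 10-byte blob through the old address and then continuing through the new address delivers 6 + 6 + 4 = 16 bytes, which matches the symptom of the stream reporting more bytes than the blob holds. Keeping the keys traced, so the collector can update them when objects move, removes the stale lookup.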

tschneidereit

This is really great, thank you so much for the excellent work here!

I left a bunch of comments, most of which should be addressed, but it's very close!

host-apis/wasi-0.2.0/host_api.cpp

builtins/web/blob.cpp

builtins/web/blob.h

tschneidereit

Great job tracking down that issue!

This is very, very close, with just a few more changes needed. Some of those are because I gave unclear feedback earlier—my apologies for that!

builtins/web/blob.cpp

builtins/web/streams/native-stream-source.h

guybedford · 2024-12-05T00:12:34Z

@andreiltd have all the review comments here been addressed now?

andreiltd · 2024-12-05T08:33:42Z

> @andreiltd have all the review comments here been addressed now?

I think so, yes.

tschneidereit

Fantastic work, thank you so much! I really appreciate the attention to detail, and the patience in working through the feedback <3

builtins/web/blob.cpp

andreiltd · 2024-12-05T11:39:21Z

Thank you @tschneidereit and @guybedford for all the in-depth feedback! I'm confident that the scars from my first PR have taught me enough to be much more productive with future ones :)

guybedford · 2024-12-05T20:04:55Z

Congrats on a fantastic first PR here 🎉

Add Blob builtin implementation

faa8e6e

andreiltd marked this pull request as draft October 23, 2024 14:34

guybedford reviewed Oct 23, 2024

tests/wpt-harness/tests.json (outdated)

andreiltd added 8 commits October 24, 2024 12:12

Add blob test expectation files

44a4036

Pass more tests.

8270c64

Pass more tests.

1024ec5

Fix slice tests

48df322

Fix default options tests

90d68d4

Fix options evaluation order tests

c64f6a1

Fix calling ToString on blob parts

873c9ef

andreiltd marked this pull request as ready for review November 4, 2024 15:18

andreiltd commented Nov 4, 2024

builtins/web/blob.cpp (outdated)

Simplify appending string types

09c8a17

andreiltd commented Nov 4, 2024

builtins/web/blob.cpp

tschneidereit reviewed Nov 4, 2024

builtins/web/blob.cpp (outdated)

tschneidereit reviewed Nov 4, 2024

builtins/web/blob.cpp (outdated)

Do not overwrite active exceptions in constructor

715d9e0

Use UTF-16 endcoder from rust-encoding for text method

5bc873f

Fix release build

9a7798a

andreiltd added 2 commits November 7, 2024 17:00

Hack Blob#stream support.

ce08991

Merge remote-tracking branch 'upstream/main' into blob-builtin

3b054c5

Apply code review suggestions

5a56037

andreiltd added 2 commits November 26, 2024 18:07

Catch readAll exception in the blob test

504fde6

Print assertion error

a0cf051

andreiltd added 5 commits November 27, 2024 20:01

Add TraceableBuiltinImpl class

0c349f0

Use GCHashTable for storing the readers

417ff38

Tweak a blob test

7e3be60

Print diff content on test failure

52bd455

Wait for interval to finish in the blob test.

1358e94

tschneidereit requested changes Nov 28, 2024

andreiltd added 6 commits November 28, 2024 17:39

Address code review comments.

924edb9

Store ReadableDefaultStream in NativeStream slot

745f80c

Initialize underlying source slots before creating a default stream

518816a

Check if readable is object in transform-stream

e414aa0

Fix slot getters

16aa50d

Add comment explaining order of initialization in native stream source

d755a52

tschneidereit requested changes Dec 2, 2024

andreiltd added 4 commits December 2, 2024 16:27

Make slot getters to use JSObject instead of HandleObject

bd82b0e

Runt clang-format

8a18cc5

Enable stream assert for testing

2d48a32

Remove default_stream method from NativeStreamSource

4f6c47b

andreiltd mentioned this pull request Dec 4, 2024

Add File interface #181

Open

tschneidereit approved these changes Dec 5, 2024

builtins/web/blob.cpp (outdated)

Simplify blob_size method

ca68060

guybedford merged commit 0749b09 into bytecodealliance:main Dec 5, 2024
5 checks passed
