After working on a JavaScript-based component model runtime and a port of the full Go standard library to wasip3, I would like to propose making stream and future cancellation semantics guaranteed synchronous, and I would also like to raise some concerns about subtask cancellation guarantees.
For context, the JavaScript runtime optionally targets non-JSPI browsers, meaning none of the component model intrinsics can block, and the Go guest runs successfully in that mode. To my surprise, the Rust witgen guest did not, because it relied on blocking stream and task cancellations. Non-JSPI mode should become irrelevant when all browsers support JSPI later this year thanks to interop, but I still wonder if the blocking semantics should change, because I think waiting-on-implicit-cancel is confusing behavior. I also asked about these semantics on this Zulip thread.
Synchronous cancellations for streams and futures
My examples here are the Rust bindings, but I think this applies to all languages that want to support cancelling reads or writes. With the component model, if you want to read from or write to a stream, you pass a buffer to the host and get back a handle to the operation, which in Rust becomes a future. For safety, the bindings must invoke synchronous cancellation when cancelling (dropping) the future. Since dropping the future happens implicitly, the component might pause at surprising times. Today, the wasmtime implementation and wcjs both cancel quickly, but nothing guarantees it. What if the spec enforced it, and made hosts responsible for ensuring quick cancellation?
The tension, which Alex mentioned in #617, comes from the combination of readiness-based IO (Rust) and completion-based IO (component model). But I do not think that tension is required in practice, because hosts today that use epoll or similar are using readiness-based IO, and so they can implement synchronous cancellation at (close to) no cost, codifying the guarantee that the Rust guest already relies on today.
Stepping back a bit, I think almost all languages today have bindings that fit nicely on readiness-based IO, because that is what operating systems offer, and bindings for completion-based IO (like those in e.g. tokio-uring) expose a different API to handle the tension between the language and the underlying API semantics. I think the component model should offer an API that is close to what languages expect today. If, in the future, hosts adopt completion-based IO and the cancellation overhead or semantics become a problem for wasm hosts, then the component model could introduce asynchronously cancellable operations. But that complexity does not seem necessary today.
A counterargument is, perhaps, that completion-based IO is the future, and that relying in the guest on fast synchronous cancel makes some assumptions about host behavior. But those assumptions are accurate today, and I would prefer them to be guaranteed by the spec.
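To make the implicit cancel concrete, here is a minimal sketch of the pattern, not the actual generated bindings. `MockHost`, `PendingRead`, and `cancel_read_blocking` are hypothetical names invented for illustration; the real intrinsics and binding types differ. The point is only that the cancel call lives in `Drop`, so it runs wherever the future happens to go out of scope.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical stand-in for the host side of a stream (not a real binding API).
struct MockHost {
    /// True while the host still holds the guest's buffer for a pending read.
    holds_buffer: Cell<bool>,
    /// Log of host calls, so the order of events can be inspected.
    log: RefCell<Vec<&'static str>>,
}

impl MockHost {
    fn new() -> Self {
        MockHost { holds_buffer: Cell::new(false), log: RefCell::new(Vec::new()) }
    }

    fn start_read(&self) {
        self.holds_buffer.set(true);
        self.log.borrow_mut().push("read-start");
    }

    /// Synchronous cancel: once this returns, the host no longer touches the buffer.
    /// A real host could take arbitrarily long here unless the spec bounds it.
    fn cancel_read_blocking(&self) {
        self.holds_buffer.set(false);
        self.log.borrow_mut().push("cancel-blocking");
    }
}

/// Hypothetical pending read: borrows the buffer for as long as the host may write to it.
struct PendingRead<'a> {
    host: &'a MockHost,
    _buf: &'a mut [u8],
}

impl<'a> PendingRead<'a> {
    fn start(host: &'a MockHost, buf: &'a mut [u8]) -> Self {
        host.start_read();
        PendingRead { host, _buf: buf }
    }
}

impl Drop for PendingRead<'_> {
    fn drop(&mut self) {
        // Implicit: nothing at the call site shows that this may block.
        self.host.cancel_read_blocking();
    }
}

fn drop_cancels_before_buffer_reuse() -> Vec<&'static str> {
    let host = MockHost::new();
    let mut buf = [0u8; 8];
    {
        let _read = PendingRead::start(&host, &mut buf);
        // `_read` is dropped at the end of this scope, which runs the cancel.
    }
    // The borrow has ended and the host has let go, so the buffer is usable again.
    assert!(!host.holds_buffer.get());
    buf[0] = 1;
    host.log.borrow_mut().push("buffer-reused");
    let log = host.log.borrow().clone();
    log
}
```

Calling `drop_cancels_before_buffer_reuse()` shows the cancel running between the start of the read and the reuse of the buffer, even though no cancel call appears in the calling code.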
Concretely, I would propose dropping the async option from the stream and future cancel operations, and guaranteeing that after cancel returns the buffers can be used by the guest. I think that code-wise nothing changes in wasmtime -- it just follows the blocking code path -- but any future host bindings that would require (long) blocking cancellation must instead use some buffering or copying.
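As a rough illustration of the "buffering or copying" option, the sketch below shows a host-side operation that only ever lets the backend write into a host-owned staging buffer, and copies into the guest buffer on completion. `StagedRead` and its methods are hypothetical, and a real host would be considerably more involved. Under that assumption, cancel never has to wait for the backend, because the backend never holds the guest's memory.

```rust
/// Hypothetical host-side read that owns a staging buffer (illustrative only).
struct StagedRead {
    /// Host-owned memory; a completion-based backend would write here.
    staging: Vec<u8>,
    cancelled: bool,
}

impl StagedRead {
    fn new(capacity: usize) -> Self {
        StagedRead { staging: Vec::with_capacity(capacity), cancelled: false }
    }

    /// Simulates the backend completing, possibly after the guest cancelled.
    fn backend_complete(&mut self, data: &[u8]) {
        self.staging.extend_from_slice(data);
    }

    /// Returns immediately: the guest buffer was never handed to the backend.
    fn cancel(&mut self) {
        self.cancelled = true;
    }

    /// Copies staged bytes into the guest buffer unless the read was cancelled.
    /// Returns the number of bytes delivered to the guest.
    fn finish(&mut self, guest_buf: &mut [u8]) -> usize {
        if self.cancelled {
            return 0;
        }
        let n = self.staging.len().min(guest_buf.len());
        guest_buf[..n].copy_from_slice(&self.staging[..n]);
        n
    }
}

fn cancelled_read() -> ([u8; 4], usize) {
    let mut guest_buf = [0u8; 4];
    let mut op = StagedRead::new(4);
    op.cancel();
    // A late completion lands in host memory only.
    op.backend_complete(&[9, 9, 9, 9]);
    let n = op.finish(&mut guest_buf);
    (guest_buf, n)
}

fn completed_read() -> ([u8; 4], usize) {
    let mut guest_buf = [0u8; 4];
    let mut op = StagedRead::new(4);
    op.backend_complete(&[1, 2, 3, 4]);
    let n = op.finish(&mut guest_buf);
    (guest_buf, n)
}
```

The cost is one extra copy on the completion path, which is the trade-off the proposal would push onto hosts that cannot cancel quickly.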
Synchronous cancellation for sub-tasks
All the same concerns from above also apply to cancelling subtasks. I think there is a similar tension between what languages model today and what the component model exposes. If you lend out a handle to some subtask, you must wait for that subtask to return before you can, say, drop that handle. But there is no way to enforce an explicit wait on the future in e.g. the Rust bindings, and so instead the drop must wait implicitly. I think this will cause issues in, for example, any code that races two futures where one does not handle cancellation gracefully. The wasmtime wasip3 host bindings behave nicely and do support cancelling network connects or file reads, so waiting for cancellation on drop works out fine, but once again there are no guarantees.
I think there are fewer concrete patterns in languages or APIs today that I can map this to. Cancellation of long-running operations in most languages requires some kind of external interrupt (an AbortController in JavaScript, a context.Context in Go, a CancellationToken in Rust with tokio), and you have to explicitly wait for those operations to complete. Nothing forces you to wait (a goroutine will keep running unless it is explicitly tracked in a WaitGroup; Rust and JavaScript require an explicit await). In JavaScript, forgetting the promise means it keeps going, while in Rust dropping the future cancels the task, and cancellation safety is a real issue. In many programs, explicitly waiting on tasks has given me issues because it is easy to forget to check a cancellation token in all paths... but at least it was explicit that I would have to wait.
None of this really applies to the operations exposed by wasip3, because they are safely cancellable and will cancel quickly on all implementations. The problem also does not surface in the language bindings in the same way, because cancellation there mostly concerns operations such as connecting a socket, which already come with a timeout in most languages.
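For comparison, the explicit pattern described above can be sketched with only the Rust standard library. This is a simplified analogue that uses a thread and an `AtomicBool` in place of an async task and a real `CancellationToken`; the function names are made up for illustration. What matters is that both the cancel signal and the wait are visible at the call site.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Worker loop that must remember to check the token on every path.
/// Returns how many iterations ran before it observed the cancellation.
fn run_until_cancelled(token: Arc<AtomicBool>) -> u32 {
    let mut iterations = 0;
    while !token.load(Ordering::Acquire) {
        iterations += 1;
        thread::sleep(Duration::from_millis(1));
    }
    iterations
}

/// Signals cancellation, then explicitly waits for the worker to finish.
/// Returns true if the worker exited cleanly.
fn cancel_and_wait() -> bool {
    let token = Arc::new(AtomicBool::new(false));
    let worker = {
        let token = Arc::clone(&token);
        thread::spawn(move || run_until_cancelled(token))
    };

    // Explicit cancel signal.
    token.store(true, Ordering::Release);

    // Explicit wait: the caller chooses to block here. Skipping this line
    // would let the worker keep running in the background.
    worker.join().is_ok()
}
```

If the worker forgot to check the token, `join` would hang, which is the failure mode mentioned above, but the blocking point would at least be spelled out in the code.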
I am not quite sure what I would advocate here. I think implicit and unexpected blocking is going to cause problems, but for safety I am also convinced the caller must wait if you have these borrow semantics. The one thing I could think of is then getting rid of borrowing in some way or another. As long as you do not borrow anything from the calling stack, you can have the operation keep going in the background. You could instead just give ownership of the handle; or make a copy; or make a revocable handle that you can cancel instantaneously. None of those seem like perfect or small changes.
Perhaps revocable handles are what I would find the cleanest way out, mostly because they map to what I have used in Go, where you can close a socket or file and any pending reads fail after that. But it would need some host binding representation, and it would introduce an API difference between an owned resource and a borrowed resource. The copying alternative is similar to today's wasip3 TCP read and write stream semantics, which keep a socket alive even if the socket is dropped. It would change the surprising behavior on cancellation from blocking to a task that keeps going with the resource you shared.
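A revocable handle could look roughly like the sketch below. This is a guest-side toy model with hypothetical names (`Owner`, `Revocable`, `ReadError`), not a proposal for the actual ABI representation. The owner can revoke at any time without waiting, and any later use of the lent handle fails, much like reads on a closed socket.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, PartialEq)]
enum ReadError {
    /// The owner took the resource back.
    Revoked,
}

/// The owning side: holds the resource in a shared slot it can empty at will.
struct Owner {
    slot: Arc<Mutex<Option<Vec<u8>>>>,
}

/// The lent side: can use the resource only while the slot is still filled.
struct Revocable {
    slot: Arc<Mutex<Option<Vec<u8>>>>,
}

impl Owner {
    fn new(data: Vec<u8>) -> Self {
        Owner { slot: Arc::new(Mutex::new(Some(data))) }
    }

    /// Lends the resource without giving up the ability to take it back.
    fn lend(&self) -> Revocable {
        Revocable { slot: Arc::clone(&self.slot) }
    }

    /// Revokes immediately; does not wait for the borrower to finish.
    fn revoke(&self) -> Option<Vec<u8>> {
        self.slot.lock().unwrap().take()
    }
}

impl Revocable {
    /// Returns a copy of the data, or an error once the handle was revoked.
    fn read(&self) -> Result<Vec<u8>, ReadError> {
        self.slot.lock().unwrap().clone().ok_or(ReadError::Revoked)
    }
}

fn revoke_fails_later_reads() -> (Result<Vec<u8>, ReadError>, Result<Vec<u8>, ReadError>) {
    let owner = Owner::new(vec![1, 2, 3]);
    let lent = owner.lend();
    let before = lent.read();
    owner.revoke();
    let after = lent.read();
    (before, after)
}
```

The borrower now has to handle a `Revoked` error on every use, which is the API difference between owned and borrowed resources mentioned above.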
Curious to hear what people think. Thanks for reading.