[0.3.0-draft]: Move away from streams as an API? #65
Comments
One other thing I can mention: during my time working on Rust we discovered that there's no way to portably write bytes to stdout/stderr across platforms. Fundamentally, on Windows at the time it was UCS-2, which is not what most source languages expect (they just write bytes). In Rust that means that writing to stdout on Windows implicitly converts all bytes to […]

For WASI we may want to consider changing stdout/stderr/stdin from being byte-based to being string-based, which would also necessitate a change from […]
Note that Node.js has functions for async writes to stdout/stderr. Also, on Unix it's common for commands to write binary to stdout, e.g. writing a raw gzip file to stdout: `jq < my/json-file.json | gzip > pretty.json.gz`
I'd have to double-check, but my hunch is that the async read/write in Node.js is a "lie" in the sense that it goes out to a thread pool and blocks there. If things haven't changed from ~10 years ago (which I realize is a little far-fetched), Node is built on libuv, and Rust also used to use libuv. I remember libuv treating stdio specially: not making the file descriptors nonblocking and instead farming reads/writes out to a thread pool.

Also I understand that commands are often piped to one another; that's done and works on Windows too! The problem is writing to the console on Windows. If things haven't changed since I last looked, there's no way to write binary data to a console on Windows; it's required to be UCS-2 (aka a sequence of 16-bit integers, aka UTF-16, often re-encoded from UTF-8). Note that this is different from processes piping output between each other, which is what enables piping binary data between processes.
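To make the console constraint concrete, here is a minimal sketch, assuming a console that accepts only UTF-16 code units. The helper names are hypothetical, and the lossy replacement of invalid UTF-8 is chosen purely for illustration; it is not a claim about what any standard library actually does on Windows.

```rust
/// Re-encode bytes for a console that only accepts UTF-16 code units.
/// Invalid UTF-8 sequences are replaced with U+FFFD, so arbitrary binary
/// data does not survive the trip (assumption: lossy conversion).
fn bytes_to_console_units(bytes: &[u8]) -> Vec<u16> {
    String::from_utf8_lossy(bytes).encode_utf16().collect()
}

/// Decode the code units back into UTF-8 bytes, as a reader of the
/// console contents would see them.
fn console_units_to_bytes(units: &[u16]) -> Vec<u8> {
    String::from_utf16_lossy(units).into_bytes()
}
```

Valid UTF-8 text round-trips unchanged, while something like a gzip header does not, which is why piping between processes (plain bytes) and writing to a console behave differently.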
I agree with the spirit of this change! Using streams for stdin/out/err is much more complicated than the reality, which is that programs need to print bytes to the terminal so humans understand what happened. Unix pipes should be subsumed by other modes of component composition: we no longer want to use stdin/out/err to communicate between components, just from components to humans.
This may end up being too radical, but I think it's worth considering if there's some point in the design space that could accommodate the concern that "stdin/out/err are just for communicating with humans" and also that "those humans may be using a tty", so colors and other control codes are expected to work. WASI 0.2 does expose isatty (in a roundabout manner that never got fleshed out). Can we get an interface as trivial as […]
I talked some more about this yesterday with @dicej, @yoshuawuyts, and @vados-cosmonic, and wanted to write down some thoughts from that. Overall they helped me clarify what I'm trying to do here, which is two-fold:
The first point is relatively self-explanatory: a "simple" function is simpler than streams. The second point is more nuanced. In the component model, async is not part of a function's type, meaning there are no "colors" in component model async. Thus it's possible to call any function asynchronously just as well as synchronously, regardless of annotations in WIT. Given that, I don't think there's any loss in functionality over what we have today with WASIp2. The main change is the default expectation of what these APIs are doing. For me this would be a signal that the default expectation should be the use of synchronous APIs for writing and asynchronous for reading. Scenarios which require async, though, are still possible! For example, one particular library could virtualize stdio with asynchronous interfaces, and if a language then imported everything as asynchronous, everything would cooperate and work well together.
No, I think you're right, I don't think it's worth going down a string-based route.
The proposed changes make sense to me too for the upcoming 0.3.0 release. Longer term (in some later 0.3.x release leading up to 1.0-rc), I was thinking we might want to do the following and, since Alex mentioned thinking about 1.0 above, I'd be curious what folks thought. We could introduce a new interface:

```wit
interface main {
  main: async func(args: list<string>, stdin: stream<u8>) -> stream<u8, _, error-code>;
}
```

and it would be the linker's injected start-thunk's job to set up a standard (synchronous) C/C++ […]

The point of this change would be that any language could create Unix-style pipelines as easily as in bash. E.g., in JS, something like this could work:

```js
import { main as f1 } from './f1.wasm';
import { main as f2 } from './f2.wasm';
import { main as f3 } from './f3.wasm';
// fetch() returns a promise, so await it before reading `.body`
let result = f1(args1, f2(args2, f3(args3, (await fetch('data.json')).body)));
```

which I think would make […] With […]

stderr is missing from the above because, for it, I agree with Alex's comment about it being a simple synchronous function. Separately, I think there are a number of good reasons (perf and DX) to add a […]

WDYT?
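The nested-call composition in the JS snippet can be sketched in Rust as well. This is only an illustration under loud assumptions: `ByteStream` is a hypothetical stand-in for `stream<u8>` built on a plain synchronous iterator, and `upper`, `strip`, and `pipeline` are invented stages, not real components or generated bindings.

```rust
/// Hypothetical stand-in for `stream<u8>`: a boxed, lazy byte iterator.
type ByteStream = Box<dyn Iterator<Item = u8>>;

/// Stage that upper-cases ASCII bytes (plays the role of one component's `main`).
fn upper(_args: &[String], stdin: ByteStream) -> ByteStream {
    Box::new(stdin.map(|b| b.to_ascii_uppercase()))
}

/// Stage that drops every byte equal to the first byte of its first argument.
fn strip(args: &[String], stdin: ByteStream) -> ByteStream {
    let unwanted = args.first().and_then(|a| a.bytes().next());
    Box::new(stdin.filter(move |b| Some(*b) != unwanted))
}

/// Compose the stages the same way `f1(args1, f2(args2, input))` nests calls.
fn pipeline(input: Vec<u8>) -> Vec<u8> {
    let source: ByteStream = Box::new(input.into_iter());
    let strip_args = vec!["-".to_string()];
    upper(&[], strip(&strip_args, source)).collect()
}
```

Each stage consumes the previous stage's output lazily, which is roughly the property a shell pipeline provides; real component streams would additionally be asynchronous and carry an error type.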
I was just discussing […]
I have no stake in this game and either solution will probably work fine, but since you were openly soliciting feedback on Zulip, here it goes: 😄

I'm not sure I fully understand the need to change the interface. The CM is "colorblind", so regardless of the WIT syntax it can be called synchronously and asynchronously, and implementations are going to need to deal with that anyway, right?
I see your point, but the same used to apply to file I/O, until io_uring came along. As a matter of fact, the first (and only) example on https://man7.org/linux/man-pages/man7/io_uring.7.html is about asynchronously piping stdin to stdout. I think part of the problem is that the usage of stdin/out/err is too overloaded.
I like Luke's idea of having a dedicated logging mechanism. I think that also plays into your point regarding:
With that in place, I'd say:
And in some environments, the logs are redirected to stdout/err.
Aside from Node.JS, .NET's
I haven't double-checked the libuv source code, but the current Node.js docs mention:
Which seems to imply that writes may truly be asynchronous nowadays (depending on conditions).
Thanks that's all definitely quite valuable!
You're right, but there's also, in my opinion, a lot less ceremony around "call the function with a list" vs working with streams. I'm still working through bindings generation in Rust, for example, but bindings for […]

The color-blindness means that if both the producer and consumer are "power users" then everything works out, and if one isn't a power user it still works, just not as efficiently.
In my mind this sort of support is mostly related to guest languages where the logging builtins are always available but are conditionally not used in CLIs when the stream-based […]

For today the "stdio is just free functions" idea is more-or-less the future stand-in for "logging builtins", where we'd one day remove the WASIp3 stdio interfaces and replace them with logging builtins. In that sense just-a-free-function better matches the possible future state of builtins.
Many […] I'm really showing my lack of expertise here... Also, man, that matrix of possibilities in libuv is quite something, I had no idea!
I'm in favor of simplifying the interface by moving away from streams. For reference, this is what it looks like to write to STDERR in Rust using
That said, however, I think we still should mark all stdio as
Personally, in Wasmtime at least, I want to make stdio blocking-by-default and not async. In my experience stdio is a critical debugging utility and adding infrastructure/complications to that risks masking bugs or making things more difficult. For example, if Rust were to implement async stdio but C were to implement sync stdio, it might be possible to see prints in reverse order depending on exactly how the code looks. This is added on top of the fact that […]

To clarify though, guests/hosts can, as you point out, still do whatever they want. I expect tokio to import async versions of stdio, and guests using sync can use sync regardless of how the API is tagged. In that sense this truly is a question of purely defaults, and that's where I would personally prefer to default on the side of simplicity -- the API doesn't have […]

In terms of simplicity and hello world, what we're working with is:

```wat
(module
  ;; sync import of `write: func(bytes: list<u8>) -> result;`
  (import "cm32p2|wasi:cli/[email protected]" "write" (func (param i32 i32) (result i32))) ;; standard mangling
  (import "wasi:cli/[email protected]" "write" (func (param i32 i32) (result i32))) ;; non-standard mangling
  ;; sync import of `write: async func(bytes: list<u8>) -> result;`
  (import "cm32p2|wasi:cli/[email protected]" "[async]write" (func (param i32 i32) (result i32))) ;; standard (?) mangling
  (import "wasi:cli/[email protected]" "[async]write" (func (param i32 i32) (result i32))) ;; non-standard mangling
)
```

I realize it's a little ridiculous to optimize for hello world, but in some sense I also see this as important. It's surprising that you import an […]
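The "prints in reverse order" concern mentioned above can be simulated without any runtime. The sketch below is an assumption-laden toy, not how Wasmtime or any guest toolchain schedules writes: `BlockingWriter` and `DeferredWriter` are hypothetical types, with the deferred one standing in for an async writer whose bytes only land when its queued work is driven to completion.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared stand-in for the terminal: bytes appear in the order they arrive.
type Terminal = Rc<RefCell<Vec<u8>>>;

/// Writer whose bytes reach the terminal before `write` returns.
struct BlockingWriter {
    terminal: Terminal,
}

impl BlockingWriter {
    fn write(&self, bytes: &[u8]) {
        self.terminal.borrow_mut().extend_from_slice(bytes);
    }
}

/// Writer that only queues bytes; they reach the terminal when `flush` runs.
struct DeferredWriter {
    terminal: Terminal,
    pending: Vec<u8>,
}

impl DeferredWriter {
    fn write(&mut self, bytes: &[u8]) {
        self.pending.extend_from_slice(bytes);
    }

    fn flush(&mut self) {
        self.terminal.borrow_mut().extend(self.pending.drain(..));
    }
}

/// Issue "first" through the deferred writer, then "second" through the
/// blocking writer, and return what the terminal ends up showing.
fn interleave() -> String {
    let terminal: Terminal = Rc::new(RefCell::new(Vec::new()));
    let mut deferred = DeferredWriter { terminal: terminal.clone(), pending: Vec::new() };
    let blocking = BlockingWriter { terminal: terminal.clone() };
    deferred.write(b"first ");
    blocking.write(b"second ");
    deferred.flush();
    let bytes = terminal.borrow().to_vec();
    String::from_utf8(bytes).expect("only ASCII was written")
}
```

Even though "first" was issued earlier in program order, the terminal shows it after "second", which is the kind of confusion that blocking-by-default stdio avoids when debugging.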
I'd expect the Rust standard library (most standard libraries, in fact) to assume sync stdio. Async stdio would be for "power users".
If stdio is blocking by default (i.e. async I/O is non-standard), it seems more likely that Tokio would not support stdio on […]

I'm trying to point out that an async stdio default in runtimes would provide the most compatibility, whereas a sync stdio default in runtimes would limit it and prevent guest libraries from depending on it, thus limiting the potential set of "lift and shift" scenarios "componentizing" existing applications. It appears that the right way to signal the async-by-default stdio is adding the […]

Alternatively, since both hosts and guests are free to do whatever they want anyway, how about we document the fact that stdio is expected to be implemented async, and so guest code can depend on the fact that stdio is async, if it wants to, without adding […]
I agree that async stdio in Wasmtime would be the most flexible, but it's also the most complicated to implement. My hunch is that 90%+ of users of stdio use sync, so all the work done to make it async would be generally lost. Today it's all sync, so this would be fresh work for wasmtime-wasi. The async nature means that if two components are composed together and they happen to both use async it'll all work out, but for the host Wasmtime will be synchronous. The one possible thing I can think of is that Wasmtime multiplexes "the native host stdio" and "embedder-supplied stdio", where the latter could be async-by-default and the former is sync-by-default.

Effectively what I want to reach for is the opposite of what you're thinking: I want to document these functions as sync and recommend everyone uses the sync versions. The async versions are always there for scenarios like Tokio, and the fact that the Rust guest thinks it's doing async stdio with Tokio when it's actually blocking is something we can either configure in Wasmtime at some point or do something else about.
I'd like to propose a possible radical restructuring of stdin/stdout/stderr and how they're modeled for 0.3.0. Specifically something like this for 0.3.0:
Specifically this would do away with streams entirely and instead focus on just bytes. The `stdin` interface is tagged as `async`, indicating that it will block waiting for input and bindings generators might want to do future-y things, but they can of course opt out of that as well. For `write` and stdout/stderr they're not tagged as `async` and semantically they block the program while the write is happening.

This is naturally very different from both 0.2.0 and the current 0.3.0-draft, so I'll try to motivate why I'd propose this design instead:

[…] `set-stdout`/`set-stderr` overwrite each other? #64 as well for 0.3.0 and I also feel like it's asking a lot to have so much extra runtime support code just to write to stdout/stderr in language standard libraries. Overall I've always had the feeling that stdout/stderr management is causing a lot of headaches and integration questions when at the end of the day everyone typically wants something much simpler that's along the lines of "please just print this".

IIRC @pchickey and I basically concluded during 0.2.0 that we'd just come back to this at some point before 1.0.0 and rethink stdio. Personally I think now's as good a time as any as we transition from 0.2.0 to 0.3.0. There are of course downsides to the above, such as "piping" or redirection being less obvious than before, but so far I'm not aware of any guest language which would support that. This could hypothetically be added in the future but I think it'd be best to start with a simple `write` and `read` function if we can.
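As a rough picture of how small a byte-oriented `read`/`write` surface could be from a guest's perspective, here is a hedged Rust sketch. Everything in it is an assumption made for illustration: the trait names, the `max` parameter on `read`, the `WriteError` type, and the in-memory implementations are hypothetical and are not the proposed WIT or any generated bindings. `read` is modeled synchronously here for simplicity.

```rust
/// Hypothetical error type standing in for the `result` of `write`.
#[derive(Debug, PartialEq)]
struct WriteError;

/// Hypothetical rendering of a blocking, byte-oriented `write`.
trait ByteSink {
    /// Returns once all of `bytes` has been accepted.
    fn write(&mut self, bytes: &[u8]) -> Result<(), WriteError>;
}

/// Hypothetical rendering of `read`; the `max` parameter is an assumption.
trait ByteSource {
    /// Returns up to `max` bytes; an empty vector signals end of input.
    fn read(&mut self, max: usize) -> Vec<u8>;
}

/// In-memory source used only to exercise the traits.
struct MemorySource {
    data: Vec<u8>,
    pos: usize,
}

impl ByteSource for MemorySource {
    fn read(&mut self, max: usize) -> Vec<u8> {
        let end = (self.pos + max).min(self.data.len());
        let chunk = self.data[self.pos..end].to_vec();
        self.pos = end;
        chunk
    }
}

/// In-memory sink used only to exercise the traits.
struct MemorySink {
    data: Vec<u8>,
}

impl ByteSink for MemorySink {
    fn write(&mut self, bytes: &[u8]) -> Result<(), WriteError> {
        self.data.extend_from_slice(bytes);
        Ok(())
    }
}

/// Copy everything from `source` to `sink` in small chunks, returning the byte count.
fn copy_all(source: &mut dyn ByteSource, sink: &mut dyn ByteSink) -> Result<usize, WriteError> {
    let mut total = 0;
    loop {
        let chunk = source.read(4);
        if chunk.is_empty() {
            return Ok(total);
        }
        sink.write(&chunk)?;
        total += chunk.len();
    }
}
```

A `cat`-style loop needs nothing beyond these two calls, which is the "please just print this" level of ceremony the proposal is aiming for.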