
[0.3.0-draft]: Move away from streams as an API? #65


alexcrichton opened this issue Mar 6, 2025 · 13 comments

@alexcrichton
Contributor

I'd like to propose a possibly radical restructuring of stdin/stdout/stderr and how they're modeled for 0.3.0, specifically something like this:

interface stdin {
  read: async func(amount: u32) -> result<list<u8>>;
}

interface stdout {
  write: func(data: list<u8>) -> result;
}

interface stderr {
  write: func(data: list<u8>) -> result;
}

Specifically, this would do away with streams entirely and instead focus on plain bytes. The stdin read function is tagged as async, indicating that it will block waiting for input and that bindings generators might want to do future-y things, though they can of course opt out of that as well. The write functions of stdout/stderr are not tagged as async, and semantically they block the program while the write is happening.
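To make the shape of the proposal concrete, here is a minimal Rust sketch of guest code written against it. The Stdout and Stdin traits are hypothetical hand-written mirrors of the WIT above (a blocking write taking bytes, and a read returning up to amount bytes); they are not generated bindings, and the in-memory types merely stand in for a host.

```rust
use std::collections::VecDeque;

/// Hypothetical mirror of `write: func(data: list<u8>) -> result;`.
/// The call returns once all bytes are written; there is no stream to manage.
trait Stdout {
    fn write(&mut self, data: &[u8]) -> Result<(), ()>;
}

/// Hypothetical mirror of `read: async func(amount: u32) -> result<list<u8>>;`,
/// shown as a blocking call; bindings could equally expose it as a future.
trait Stdin {
    fn read(&mut self, amount: u32) -> Result<Vec<u8>, ()>;
}

/// In-memory stand-in for a host-provided stdout.
struct MemStdout(Vec<u8>);

impl Stdout for MemStdout {
    fn write(&mut self, data: &[u8]) -> Result<(), ()> {
        self.0.extend_from_slice(data);
        Ok(())
    }
}

/// In-memory stand-in for a host-provided stdin.
struct MemStdin(VecDeque<u8>);

impl Stdin for MemStdin {
    fn read(&mut self, amount: u32) -> Result<Vec<u8>, ()> {
        // Return at most `amount` bytes, fewer if less input is available.
        let n = (amount as usize).min(self.0.len());
        Ok(self.0.drain(..n).collect())
    }
}

/// "Hello world"-sized program: read once, write back.
fn echo(input: &mut impl Stdin, output: &mut impl Stdout) -> Result<(), ()> {
    let bytes = input.read(1024)?;
    output.write(b"you said: ")?;
    output.write(&bytes)
}

fn main() {
    let mut input = MemStdin(b"hi\n".iter().copied().collect());
    let mut output = MemStdout(Vec::new());
    echo(&mut input, &mut output).unwrap();
    print!("{}", String::from_utf8_lossy(&output.0));
}
```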

This is naturally very different from both 0.2.0 and the current 0.3.0-draft, so I'll try to motivate why I'd propose this design instead:

  • At the OS level stdout/stderr are not async. AFAIK it's just not possible on Windows, and on Unix no one does it, because turning your end nonblocking also turns it nonblocking for every other process sharing that file description, which is basically never expected nor what you want. That means that, as an "abstraction over what platforms provide", pretending output is async is already a bit of a lie.
  • Printing to stdout/stderr is quite common but also low-level. Every language has some facility for printing to the screen and to stdout/stderr. No language I'm aware of exposes this as an async interface or as a stream. I feel that what I'm proposing better matches source languages, where "just print the stuff before I keep going" is what's desired from these interfaces most of the time.
  • Historically, implementing 0.2.0 raised a lot of tricky questions, and I foresee the same with the 0.3.0 interfaces. With 0.2.0 we went back and forth about what to do about stdout/stderr and how to implement them in Wasmtime. In the end we skipped the async part of streams entirely, and the native implementations just block and are "always ready". I'll note that stdin is a bit special and I don't think we can get away from that, so this is mostly about 0.3.0. I also see #64 ("[0.3.0-draft] Can calls to set-stdout/set-stderr overwrite each other?") coming up for 0.3.0, and I feel like it's asking a lot to have so much extra runtime support code in language standard libraries just to write to stdout/stderr. Overall I've always had the feeling that stdout/stderr management causes a lot of headaches and integration questions when, at the end of the day, everyone typically wants something much simpler along the lines of "please just print this".

IIRC @pchickey and I basically concluded during 0.2.0 that we'd come back to this at some point before 1.0.0 and rethink stdio. Personally I think now's as good a time as any, as we transition from 0.2.0 to 0.3.0. There are of course downsides to the above, such as piping and redirection being less obvious than before, but so far I'm not aware of any guest language that would support that. This could hypothetically be added in the future, but I think it'd be best to start with a simple write and read function if we can.

@alexcrichton
Contributor Author

One other thing I can mention: from my time working on Rust, we discovered back then that there's no way to portably write bytes to stdout/stderr across platforms. Fundamentally, the Windows console at the time used UCS-2, which is not what most source languages expect (they just write bytes). In Rust that means that writing to stdout on Windows implicitly converts all bytes to &str (i.e. validates them as UTF-8), then re-encodes them as UTF-16, and then does the write. Rust, however, still exposes the "just write bytes" primitive from the standard library.

For WASI we may want to consider changing stdout/stderr/stdin from being byte-based to being string-based, which would also necessitate a change from stream<u8>. I'm not certain this is the right thing to do, though, since so much of the world assumes stdio is byte-based; it doesn't seem unreasonable to have component runtimes on Windows handle the difference while no one else has to.
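The conversion described above can be sketched in a few lines of Rust: bytes bound for a wide-character console must first be valid UTF-8 and are then re-encoded as UTF-16 code units. The helper name is hypothetical, and the real standard library logic also deals with details this sketch ignores, such as a UTF-8 sequence split across two writes.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: turn a byte buffer into UTF-16 code units suitable
/// for a wide-character console API. Non-UTF-8 input is rejected, since
/// there is no faithful way to show arbitrary bytes on such a console.
fn bytes_to_console_units(bytes: &[u8]) -> Result<Vec<u16>, Utf8Error> {
    let text = std::str::from_utf8(bytes)?; // bytes -> &str (UTF-8 validation)
    Ok(text.encode_utf16().collect()) // &str -> UTF-16 code units
}

fn main() {
    // "é" is 2 bytes in UTF-8 but a single UTF-16 code unit.
    let units = bytes_to_console_units("héllo".as_bytes()).unwrap();
    println!("{} UTF-16 units", units.len());

    // Raw binary (e.g. gzip output) is not valid UTF-8 and is rejected.
    assert!(bytes_to_console_units(&[0x1f, 0x8b, 0xff]).is_err());
}
```

Piping the same bytes to another process involves no such conversion, which is why the problem is specific to console output.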

@programmerjake

Note that Node.js has functions for async writes to stdout/stderr.

Also, on Unix it's common for commands to write binary data to stdout, e.g. writing a raw gzip file to stdout:

jq < my/json-file.json | gzip > pretty.json.gz

@alexcrichton
Contributor Author

I'd have to double-check, but my hunch is that the async read/write in Node.js is a "lie" in the sense that it goes out to a thread pool and blocks there. If things haven't changed from ~10 years ago (which I realize is a little far-fetched), Node is built on libuv, and Rust also used to use libuv. I remember libuv specifically special-casing stdio: not turning file descriptors nonblocking and instead farming reads/writes out to a thread pool.

Also, I understand that commands are often piped to one another; that's done and works on Windows too! The problem is writing to the console on Windows. If things haven't changed since I last looked, there's no way to write binary data to a console on Windows; it's required to be UCS-2 (a sequence of 16-bit integers, i.e. UTF-16, often re-encoded from UTF-8). Note that this is different from processes piping output to each other, which is what enables piping binary data between processes.

@pchickey
Contributor

pchickey commented Mar 11, 2025

I agree with the spirit of this change! Using streams for stdin/out/err is much more complicated than the reality, which is that programs need to print bytes to the terminal so humans understand what happened. Unix pipes should be subsumed by other modes of component composition: we no longer want to use stdin/out/err to communicate between components, just from components to humans.

> For WASI we may want to consider changing stdout/stderr/stdin from being byte-based to being string-based, which would also necessitate a change from stream<u8>.

This may end up being too radical, but I think it's worth considering whether there's some point in the design space that could accommodate the concern that "stdin/out/err are just for communicating with humans" as well as the fact that "those humans may be using a tty", so colors and other control codes are expected to work. WASI 0.2 does expose isatty (in a roundabout manner that never got fleshed out). Can we get an interface as trivial as fn write(data: string) -> result; for the case where the program knows it just won't be using tty functionality? I don't think wasi-libc could target that, but maybe others could.
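To illustrate the tension described here, a Rust sketch of a program that emits ANSI color codes only when stdout is a terminal. std::io::IsTerminal (stable since Rust 1.70) plays the role of WASI's isatty query; the status_line helper is hypothetical.

```rust
use std::io::{self, IsTerminal, Write};

/// Hypothetical helper: format a status message, adding ANSI color codes
/// only when the output is a terminal that will interpret them.
fn status_line(msg: &str, is_tty: bool) -> String {
    if is_tty {
        format!("\x1b[32m{msg}\x1b[0m\n") // green text, then reset
    } else {
        format!("{msg}\n") // plain text for pipes and files
    }
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let line = status_line("build ok", stdout.is_terminal());
    stdout.lock().write_all(line.as_bytes())
}
```

Note that the escape codes are ordinary characters in the string, so a string-typed write would not by itself rule out terminal control sequences; the program still needs the isatty answer to decide whether to send them.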

@alexcrichton
Contributor Author

I talked some more about this yesterday with @dicej, @yoshuawuyts, and @vados-cosmonic, and wanted to write down some thoughts from that. Overall they helped me clarify what I'm trying to do here, which is two-fold:

  1. Simplify "hello world" and small applications. In these situations the full power of streams is not necessary, both today with WASIp2 and tomorrow with WASIp3. It's kind of nice to be able to get started quickly with "just write the bytes" style of API.
  2. Change the defaults that languages expect to synchronous bindings, not asynchronous bindings.

The first point is relatively self-explanatory, since a "simple" function is simpler than streams, but the second point is more nuanced. In component-model async, async is not part of a function's type, meaning there are no function "colors". Thus it's possible to call any function asynchronously just as well as synchronously, which means that regardless of the annotations in WIT, these functions can still be used in either a synchronous or an asynchronous manner.

Given that, I don't actually think there's any loss in functionality compared to what we have today with WASIp2. The main change is the default expectation of what these APIs are doing. For me this would signal that the default expectation should be synchronous APIs for writing and asynchronous ones for reading. Scenarios that require async are still possible, though! For example, a library could virtualize stdio with asynchronous interfaces, and if a language then imported everything as asynchronous, everything would cooperate and work well together.
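Because async is not part of the function's type, one import can be consumed either way. A Rust sketch of that guest-side choice, assuming a hypothetical blocking host function host_write: one caller uses it directly, another wraps it in an async fn and drives it with a tiny hand-written executor (a no-op waker that polls until ready). This models only the calling convention choice, not real canonical ABI lowering.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Mutex;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

static OUTPUT: Mutex<Vec<u8>> = Mutex::new(Vec::new());

/// Hypothetical host import: `write: func(data: list<u8>) -> result;`
fn host_write(data: &[u8]) -> Result<(), ()> {
    OUTPUT.lock().unwrap().extend_from_slice(data);
    Ok(())
}

/// An async wrapper over the very same import; the import itself is unchanged.
async fn write_async(data: &[u8]) -> Result<(), ()> {
    host_write(data)
}

/// A waker that does nothing when woken.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Minimal executor: poll repeatedly. Adequate here because `write_async`
/// never returns `Pending`; a real executor would sleep until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    host_write(b"sync ").unwrap(); // synchronous caller
    block_on(write_async(b"async")).unwrap(); // asynchronous caller
    println!("{}", String::from_utf8_lossy(&OUTPUT.lock().unwrap()));
}
```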


> For WASI we may want to consider changing stdout/stderr/stdin from being byte-based to being string-based, which would also necessitate a change from stream<u8>.
>
> This may end up being too radical ...

No, I think you're right; I don't think it's worth going down a string-based route.

@lukewagner
Member

The proposed changes make sense to me too for the upcoming 0.3.0 release.

Longer term (in some later 0.3.x release leading up to 1.0-rc), I was thinking we might want to do the following and, since Alex mentioned thinking about 1.0 above, I'd be curious what folks thought.

We could introduce a new wasi:cli/main interface:

interface main {
  main: async func(args: list<string>, stdin: stream<u8>) -> stream<u8, _, error-code>;
}

and it would be the linker's injected start-thunk's job to set up a standard (synchronous) C/C++ main() call, just like it does today, but replacing the calls to wasi:cli/{stdin,stdout} with the stdin parameter of main and, for stdout, calling stream.new (and immediately task.returning it).
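A rough Rust model of the start-thunk idea, assuming a hypothetical cli_main export that receives args and the stdin bytes and returns the stdout bytes. Component-model stream<u8> values are approximated by std::io::Read and Vec<u8>, so this only shows the wiring: an ordinary synchronous legacy_main reads from and writes to whatever the thunk hands it, instead of importing stdin/stdout.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// An ordinary, synchronous program body written against Read/Write, the way
/// a C `main()` uses stdin/stdout. Here: upper-case each input line.
fn legacy_main(args: &[String], stdin: &mut dyn BufRead, stdout: &mut dyn Write) -> io::Result<()> {
    let prefix = args.first().map(String::as_str).unwrap_or("");
    for line in stdin.lines() {
        writeln!(stdout, "{prefix}{}", line?.to_uppercase())?;
    }
    Ok(())
}

/// Hypothetical thunk for `main: async func(args, stdin: stream<u8>) -> stream<u8, ...>`:
/// it wires the `stdin` parameter in and returns whatever the program wrote
/// as the output "stream".
fn cli_main(args: Vec<String>, stdin: impl Read) -> io::Result<Vec<u8>> {
    let mut input = BufReader::new(stdin);
    let mut output = Vec::new();
    legacy_main(&args, &mut input, &mut output)?;
    Ok(output)
}

fn main() -> io::Result<()> {
    let out = cli_main(vec!["> ".to_string()], "hello\nworld\n".as_bytes())?;
    io::stdout().write_all(&out)
}
```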

The point of this change would be that any language could create Unix-style pipelines as easily as in bash. E.g., in JS, something like this could work:

import { main as f1 } from './f1.wasm';
import { main as f2 } from './f2.wasm';
import { main as f3 } from './f3.wasm';

let result = f1(args1, f2(args2, f3(args3, (await fetch('data.json')).body)));

which I think would make wasi:cli components more-generally useful, usable and composable.
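The JS pipeline nests calls so that each stage's output becomes the next stage's input. The same shape as a synchronous Rust sketch, with hypothetical grep/sort/wc_l stages standing in for f1/f2/f3 and Vec<u8> standing in for stream<u8>. Unlike real streams, each stage here runs to completion before the next one starts.

```rust
/// Hypothetical stage signature: (args, stdin bytes) -> stdout bytes.
type Stage = fn(&[&str], Vec<u8>) -> Vec<u8>;

/// Like `grep PATTERN`: keep lines containing args[0].
fn grep(args: &[&str], input: Vec<u8>) -> Vec<u8> {
    let pat = args.first().copied().unwrap_or("");
    let text = String::from_utf8_lossy(&input);
    text.lines()
        .filter(|l| l.contains(pat))
        .flat_map(|l| format!("{l}\n").into_bytes())
        .collect()
}

/// Like `sort`: sort lines lexicographically.
fn sort(_args: &[&str], input: Vec<u8>) -> Vec<u8> {
    let text = String::from_utf8_lossy(&input);
    let mut lines: Vec<&str> = text.lines().collect();
    lines.sort_unstable();
    lines.iter().flat_map(|l| format!("{l}\n").into_bytes()).collect()
}

/// Like `wc -l`: count newline-terminated lines.
fn wc_l(_args: &[&str], input: Vec<u8>) -> Vec<u8> {
    let n = input.iter().filter(|&&b| b == b'\n').count();
    format!("{n}\n").into_bytes()
}

/// Shell order: the first stage receives the original input.
fn pipeline(stages: &[(Stage, &[&str])], input: Vec<u8>) -> Vec<u8> {
    stages.iter().fold(input, |data, &(stage, args)| stage(args, data))
}

fn main() {
    let data = b"wasi cli\nhello\nwasi io\n".to_vec();
    // Nested form, mirroring f1(args1, f2(args2, f3(args3, data))).
    let nested = wc_l(&[], sort(&[], grep(&["wasi"], data.clone())));
    // Equivalent left-to-right form, like `grep wasi | sort | wc -l`.
    let stages: [(Stage, &[&str]); 3] = [(grep, &["wasi"]), (sort, &[]), (wc_l, &[])];
    let folded = pipeline(&stages, data);
    assert_eq!(nested, folded);
    print!("{}", String::from_utf8_lossy(&nested));
}
```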

With wasi:cli/main, I think we could @deprecate (keeping around throughout 0.3.x) wasi:cli/run, wasi:cli/stdin and wasi:cli/stdout.

stderr is missing from the above because, for it, I agree with Alex's comment about it being a simple synchronous function. Separately, I think there are a number of good reasons (perf and DX) to add a canon console.log built-in that is omnipresent and not considered part of the official I/O of the component (such that console.log can be non-deterministically ignored, as is standard in production environments today, saving the runtime cost of the WIT-level copy in such cases), and this would @deprecate wasi:cli/stderr too.
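A Rust sketch of why an ignorable logging built-in can save work: when logging is switched off, the call returns before copying the message anywhere. ConsoleLog and its enabled flag are hypothetical and only model the "may be non-deterministically ignored" property, not any actual canonical built-in.

```rust
/// Hypothetical model of an ignorable `console.log`-style built-in.
struct ConsoleLog {
    enabled: bool,
    sink: Vec<String>,   // where the host would put the messages
    bytes_copied: usize, // how much data crossed the "boundary"
}

impl ConsoleLog {
    fn new(enabled: bool) -> Self {
        ConsoleLog { enabled, sink: Vec::new(), bytes_copied: 0 }
    }

    /// Unlike a write to stdout, dropping the message is allowed, so a
    /// disabled logger skips the copy entirely.
    fn log(&mut self, msg: &str) {
        if !self.enabled {
            return;
        }
        self.bytes_copied += msg.len();
        self.sink.push(msg.to_owned());
    }
}

fn main() {
    let mut dev = ConsoleLog::new(true);
    let mut prod = ConsoleLog::new(false);
    for logger in [&mut dev, &mut prod] {
        logger.log("request handled");
    }
    println!("dev copied {} bytes, prod copied {}", dev.bytes_copied, prod.bytes_copied);
}
```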

WDYT?

@programmerjake

programmerjake commented Mar 12, 2025

I was just discussing SIGPIPE issues on another project, so this sounds good to me, assuming there's good handling for what to do when the read end of stdout is closed. You'd want to be able to run cleanup (e.g. finishing database transactions) but still exit the component ASAP, in particular aborting any infinite write-output loops.
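A Rust sketch of the behavior asked for here: an otherwise infinite output loop that stops as soon as the reader goes away, runs cleanup once, and returns. The closed read end is simulated by a hypothetical ClosingPipe writer that accepts a fixed number of bytes and then fails with ErrorKind::BrokenPipe, which is how Rust programs (which ignore SIGPIPE by default) observe EPIPE.

```rust
use std::io::{self, ErrorKind, Write};

/// Hypothetical stand-in for stdout whose reader closes after `capacity` bytes.
struct ClosingPipe {
    capacity: usize,
    written: Vec<u8>,
}

impl Write for ClosingPipe {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let room = self.capacity - self.written.len();
        if room == 0 {
            return Err(io::Error::new(ErrorKind::BrokenPipe, "reader closed"));
        }
        let n = buf.len().min(room);
        self.written.extend_from_slice(&buf[..n]);
        Ok(n)
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes lines forever; returns how many lines were fully written when the
/// pipe closed. `cleanup` runs exactly once before returning.
fn yes_loop(out: &mut impl Write, mut cleanup: impl FnMut()) -> io::Result<u64> {
    let mut lines = 0u64;
    loop {
        match writeln!(out, "line {lines}") {
            Ok(()) => lines += 1,
            Err(e) if e.kind() == ErrorKind::BrokenPipe => {
                cleanup(); // e.g. finish a database transaction
                return Ok(lines);
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    let mut pipe = ClosingPipe { capacity: 64, written: Vec::new() };
    let n = yes_loop(&mut pipe, || eprintln!("cleanup ran")).unwrap();
    eprintln!("reader closed after {n} complete lines");
}
```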

@badeend

badeend commented Mar 13, 2025

I have no stake in this game and either solution will probably work fine, but since you were openly soliciting feedback on Zulip, here it goes: 😄


I'm not sure I fully understand the need to change the interface. The CM is "colorblind" so regardless of the WIT syntax, it can be called synchronously and asynchronously, and implementations are going to need to deal with that anyway, right?

> At the OS level stdout/stderr are not async (...) pretending output is async is a bit of a lie.

I see your point, but the same used to apply to file I/O until io_uring came along. As a matter of fact, the first (and only) example on https://man7.org/linux/man-pages/man7/io_uring.7.html is about asynchronously piping stdin to stdout.


I think part of the problem is that the usage of stdin/out/err is too overloaded.

  • For the common case it's just a text-based output/UI mechanism, limited only by the rate at which the terminal window is able to render characters on screen. (CPU-bound)
  • In other scenarios they're proper binary inputs/outputs and can be piped into anything, including files & sockets. (IO-bound)

I like Luke's idea of having a dedicated logging mechanism. I think that also plays into your point regarding:

> Simplify "hello world" and small applications. In these situations the full power of streams is not necessary (...)

With that in place, I'd say:

  • Stdin/out: asynchronous binary stream<u8>
  • Logging: synchronous & string-based

And in some environments, the logs are redirected to stdout/err.


> No language I'm aware of exposes this as an async interface or as a stream.

Aside from Node.js, .NET's Console.In/Out/Error properties are TextReaders/TextWriters wrapping a stream, and they expose *Async versions of their operations. Example: await Console.Out.WriteAsync("Hello!");. Probably using a thread pool behind the scenes, though :)

> I remember libuv specifically treating stdio and not turning file descriptors nonblocking and farming reads/writes out to a thread pool.

I haven't double-checked with the libuv source code, but the current Node.js docs mention:

Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:

  • Files: synchronous on Windows and POSIX
  • TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
  • Pipes (and sockets): synchronous on Windows, asynchronous on POSIX

Which seems to imply that writes may truly be asynchronous nowadays (depending on conditions).

@alexcrichton
Contributor Author

Thanks, that's all definitely quite valuable!

> it can be called synchronously and asynchronously, and implementations are going to need to deal with that anyway, right?

You're right, but in my opinion there's also a lot less ceremony around "call the function with a list" than around working with streams. I'm still working through bindings generation in Rust, for example, but bindings for stream<u8> are going to be quite different from the bindings for WASIp2 streams today, for a variety of reasons. Personally, given something as "conceptually simple" as stdio, I'd prefer to do something simple by default and keep the power-user use case of async stdio more hidden away.

The colorblindness means that if both the producer and consumer are "power users", everything works out, and if one isn't a power user it still works, just not as efficiently.

> With that in place, I'd say:
>
>   • Stdin/out: asynchronous binary stream<u8>
>   • Logging: synchronous & string-based
>
> And in some environments, the logs are redirected to stdout/err.

In my mind this sort of support is mostly related to guest languages where the logging builtins are always available but are conditionally not used in CLIs when the stream-based main signature is used instead. That would mean that the Rust-level std::io::Stdin for example would be the multiplexing point for either stream<u8> or call-the-builtin.
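A sketch of such a multiplexing point, assuming a hypothetical StdoutBackend enum inside a guest's standard library: when the stream-based main signature is in use, writes go into the returned stream<u8> (modeled as a Vec<u8>); otherwise they go to a logging built-in (modeled as a closure). Callers only ever see std::io::Write.

```rust
use std::io::{self, Write};

/// Hypothetical backends a guest's stdout handle could dispatch to.
enum StdoutBackend {
    /// Stream-based `main`: bytes become the component's output stream.
    Stream(Vec<u8>),
    /// No stream available: hand text to a logging built-in.
    Builtin(Box<dyn FnMut(&str)>),
}

impl Write for StdoutBackend {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self {
            StdoutBackend::Stream(bytes) => bytes.extend_from_slice(buf),
            // A string-based logger needs text, so invalid UTF-8 is replaced.
            // A real implementation would also buffer UTF-8 sequences that
            // are split across two writes; this sketch does not.
            StdoutBackend::Builtin(log) => log(&String::from_utf8_lossy(buf)),
        }
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Guest code that neither knows nor cares which backend is active.
fn greet(out: &mut impl Write) -> io::Result<()> {
    writeln!(out, "hello from the guest")
}

fn main() -> io::Result<()> {
    let mut stream = StdoutBackend::Stream(Vec::new());
    greet(&mut stream)?;
    let mut builtin = StdoutBackend::Builtin(Box::new(|s: &str| eprint!("{s}")));
    greet(&mut builtin)?;
    if let StdoutBackend::Stream(bytes) = stream {
        io::stdout().write_all(&bytes)?;
    }
    Ok(())
}
```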

For today, the "stdio is just free functions" idea is more or less a stand-in for future "logging built-ins": one day we'd remove the WASIp3 stdio interfaces and replace them with logging built-ins. In that sense, just-a-free-function better matches the possible future state of built-ins.

> Aside from Node.JS, .NET's Console.In/Out/Error properties are TextWriters/Readers wrapping a stream and expose *Async versions of their operations

Man, I'm really showing my lack of expertise here... Also, that matrix of possibilities in libuv is quite something; I had no idea!

@rvolosatovs
Contributor

rvolosatovs commented Mar 19, 2025

I'm in favor of simplifying the interface by moving away from streams.

For reference, this is what it looks like to write to STDERR in Rust using wasi:[email protected] APIs directly today: https://github.com/rvolosatovs/sqlx/blob/e545ffb655472a26b057e67c177396403e009255/examples/postgres/todos/src/lib.rs#L108-L110

write: func(data: list<u8>) -> result would certainly be much nicer to use.
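For contrast, a Rust sketch of the ceremony a WASIp2-style output-stream involves. MockOutputStream is a hypothetical in-memory mock whose methods are named after the WIT ones (check-write, write, blocking-flush) but which is not the real binding: the caller must ask how much it may write, chunk accordingly, and flush. The proposed write collapses this into a single call.

```rust
/// Hypothetical mock of a WASIp2-style output stream: at most `budget` bytes
/// may be pending before a flush makes room again. `budget` must be > 0.
struct MockOutputStream {
    budget: u64,
    pending: Vec<u8>,
    flushed: Vec<u8>,
}

impl MockOutputStream {
    fn check_write(&self) -> u64 {
        self.budget - self.pending.len() as u64
    }
    fn write(&mut self, contents: &[u8]) -> Result<(), String> {
        if contents.len() as u64 > self.check_write() {
            return Err("write exceeds permitted length".into());
        }
        self.pending.extend_from_slice(contents);
        Ok(())
    }
    fn blocking_flush(&mut self) -> Result<(), String> {
        self.flushed.append(&mut self.pending);
        Ok(())
    }
}

/// Stream style: query permits, write a chunk, flush when out of room, repeat.
fn write_all_streamed(stream: &mut MockOutputStream, mut data: &[u8]) -> Result<(), String> {
    while !data.is_empty() {
        let permit = stream.check_write() as usize;
        if permit == 0 {
            stream.blocking_flush()?; // a real guest would wait on the stream's pollable here
            continue;
        }
        let n = permit.min(data.len());
        stream.write(&data[..n])?;
        data = &data[n..];
    }
    stream.blocking_flush()
}

/// Proposed style: one call; the host handles the rest (mocked as a Vec).
fn write_simple(out: &mut Vec<u8>, data: &[u8]) -> Result<(), String> {
    out.extend_from_slice(data);
    Ok(())
}

fn main() {
    let mut s = MockOutputStream { budget: 4, pending: Vec::new(), flushed: Vec::new() };
    write_all_streamed(&mut s, b"hello, stderr\n").unwrap();
    let mut simple = Vec::new();
    write_simple(&mut simple, b"hello, stderr\n").unwrap();
    assert_eq!(s.flushed, simple);
}
```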

That said, however, I think we still should mark all stdio as async, for the following reasons:

  • Implementing stdio as async in the host is fairly trivial, even if using a thread pool. While it's likely not really asynchronous (although it could be, as @badeend pointed out), it is asynchronous from the perspective of the guest: the guest can still do more work in between the call to write and awaiting the result:
      fut := stdout.write(b"hello")
      call_some_func()  # guest should still be able to call this function, since the host can run the write on a separate thread
      fut.await
  • Guests that don't care about async stdio can still generate the bindings as "sync"
  • Guests that do care about async stdio can use it directly. For example, in Rust, tokio::io::Stdout and tokio::io::Stderr could be truly asynchronous on the wasm32-wasip3 target by using these APIs directly. If these APIs were not marked as async in WIT, then third-party libraries (like tokio) would not be able to depend on them being async by default.
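A Rust sketch of the first bullet's point: even if the host implements the write by blocking on another thread, the guest can overlap its own work with the write. Here std::thread::spawn plays the host's thread pool and the JoinHandle plays the future; spawn_write and call_some_func are hypothetical names echoing the pseudocode.

```rust
use std::io::{self, Write};
use std::thread::{self, JoinHandle};

/// Hypothetical "async" write: the blocking write runs on another thread and
/// the returned handle is awaited (joined) later.
fn spawn_write<W: Write + Send + 'static>(mut out: W, data: Vec<u8>) -> JoinHandle<io::Result<W>> {
    thread::spawn(move || {
        out.write_all(&data)?;
        out.flush()?;
        Ok(out)
    })
}

/// Stand-in for guest work done while the write is in flight.
fn call_some_func() -> u64 {
    (1..=10).sum()
}

fn main() -> io::Result<()> {
    let fut = spawn_write(io::stdout(), b"hello\n".to_vec()); // start the write
    let work = call_some_func(); // guest keeps running meanwhile
    fut.join().expect("writer thread panicked")?; // corresponds to `fut.await`
    println!("did {work} units of work while writing");
    Ok(())
}
```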

@alexcrichton
Contributor Author

Personally, in Wasmtime at least, I want to make stdio blocking-by-default and not async. In my experience stdio is a critical debugging utility, and adding infrastructure/complications to it risks masking bugs or making things more difficult. For example, if Rust were to implement async stdio but C were to implement sync stdio, it might be possible to see prints in reverse order depending on exactly how the code looks. This comes on top of the fact that printf and println! are both synchronous, not asynchronous.
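A deterministic Rust sketch of the ordering hazard: if one language's stdio queues writes that are performed later (as an async implementation might) while another writes synchronously, the terminal can show them out of order. The Terminal type and its two write paths are hypothetical models, not Wasmtime behavior.

```rust
use std::collections::VecDeque;

/// Hypothetical model of one terminal shared by two components.
#[derive(Default)]
struct Terminal {
    screen: Vec<String>,           // what the user actually sees, in order
    async_queue: VecDeque<String>, // writes accepted but not yet performed
}

impl Terminal {
    /// Sync path (e.g. C `printf` over a blocking write): appears immediately.
    fn write_sync(&mut self, msg: &str) {
        self.screen.push(msg.to_owned());
    }
    /// Async path: the write is only queued; it lands when the queue is drained.
    fn write_async(&mut self, msg: &str) {
        self.async_queue.push_back(msg.to_owned());
    }
    /// The point at which pending async writes are finally carried out.
    fn drain_async(&mut self) {
        while let Some(msg) = self.async_queue.pop_front() {
            self.screen.push(msg);
        }
    }
}

fn main() {
    let mut term = Terminal::default();
    term.write_async("1: Rust component says hi"); // issued first...
    term.write_sync("2: C component says hi"); // ...issued second
    term.drain_async();
    for line in &term.screen {
        println!("{line}");
    }
}
```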

To clarify, though, guests/hosts can, as you point out, still do whatever they want. I expect tokio to import async versions of stdio, and guests using sync can use sync regardless of how the API is tagged. In that sense this truly is purely a question of defaults, and that's where I would personally prefer to err on the side of simplicity: the API isn't async, and the host implementation probably doesn't even start off as async but is instead just a blocking write.

In terms of simplicity and hello world, what we're working with is:

(module
  ;; sync import of `write: func(bytes: list<u8>) -> result;`
  (import "cm32p2|wasi:cli/[email protected]" "write" (func (param i32 i32) (result i32))) ;; standard mangling
  (import "wasi:cli/[email protected]" "write" (func (param i32 i32) (result i32))) ;; non-standard mangling

  ;; sync import of `write: async func(bytes: list<u8>) -> result;`
  (import "cm32p2|wasi:cli/[email protected]" "[async]write" (func (param i32 i32) (result i32))) ;; standard (?) mangling
  (import "wasi:cli/[email protected]" "[async]write" (func (param i32 i32) (result i32))) ;; non-standard mangling
)

I realize it's a little ridiculous to optimize for hello world, but in some sense I also see this as important. It's surprising that you import an [async] name and get sync behavior. This detail is normally hidden from folks so they don't have to worry about it, but hello-world writers will run into it.

@rvolosatovs
Contributor

> For example if Rust were to implement async stdio but C were to implement sync stdio, it might be possible to see prints in reverse order depending exactly how the code looks. This is added on top of the fact that printf and println! are both synchronous, not asynchronous.

I'd expect the Rust standard library, and in fact most standard libraries, to assume sync stdio. Async stdio would be for "power users", with async being a hint for Wasm runtimes rather than for guests.

> I expect tokio to import async versions of stdio

If stdio is blocking by default (i.e. async stdio is non-standard), it seems more likely that Tokio would not support stdio on wasm32-wasip3, as it does not today, and would instead likely wait for thread support to land and use a thread pool.

I'm trying to point out that an async stdio default in runtimes would provide the most compatibility, whereas a sync stdio default in runtimes would limit it and prevent guest libraries from depending on it, thus limiting the potential set of "lift and shift" scenarios for "componentizing" existing applications. It appears that the right way to signal async-by-default stdio is adding the async keyword to the function definitions.

Alternatively, since both hosts and guests are free to do whatever they want anyway, how about we document that stdio is expected to be implemented as async, so that guest code can depend on stdio being async if it wants to, without adding the async keyword in the WIT, so the signature stays write: func(bytes: list<u8>) -> result;?

@alexcrichton
Contributor Author

I agree that async stdio in Wasmtime would be the most flexible, but it's also the most complicated to implement. My hunch is that 90%+ of stdio users use sync anyway, so all the work done to make it async would generally be lost. Today it's all sync, so this would be fresh work for wasmtime-wasi. The async nature means that if two components are composed together and both happen to use async, it'll all work out, but on the host side Wasmtime will be synchronous. The one possibility I can think of is that Wasmtime multiplexes "the native host stdio" and "embedder-supplied stdio", where the latter could be async-by-default and the former sync-by-default.

Effectively what I want to reach for is the opposite of what you're thinking, I want to document these functions as sync and recommend everyone uses the sync versions. The async versions are always there for scenarios like Tokio, and the fact that the Rust guest thinks it's doing async stdio with Tokio when it's actually blocking is something we can either configure in Wasmtime at some point or do something else about.


6 participants