Experience using wasm-bindgen from C #178
Comments
Thing I'm not sure about: how is the custom section generated from Rust in the first place? My not-having-looked at it understanding is that we annotate the functions with an attribute to tell the LLVM backend to stash info in a custom section, but my understanding of the LLVM backend is that it doesn't do anything sophisticated here and just passes that data along, meaning the Rust macros are generating the JSON. I have no idea how we'd do that from C/C++ without the backend itself (or some other offline tool) handling that. The general difficulty on the frontend is that wasm_bindgen uses macros and traits to generate interface code, which are more powerful constructs than C/C++ has. EDIT: yeah there it is wasm-bindgen/crates/backend/src/codegen.rs Lines 93 to 98 in 9bab720
|
Hello and this sounds exciting! I'd love to help out where I can with C/C++ support! In general the output is very tailored for Rust's procedural macros today in the sense that it's the first one and it's all I've worked with so far. I'm totally open to overhaul anything necessary, even if it means adding parallel codepaths in the codegen parts of the wasm-bindgen cli tool. In that sense it may make sense to attempt to shoehorn C/C++ to Rust in the same way but long-term I suspect we'll either want to evolve the current strategy to accommodate C/C++ better (and change Rust at the same time) or just have two binary formats. For your more specific questions...
That's awesome! The internals of wasm-bindgen are definitely not the best documented right now :(
Totally. Right now there's a rustc-originating restriction that every argument to a normal Rust function corresponds to precisely one argument in the shim that the wasm-bindgen macro generates. This isn't practical for all types, though. Types like strings need two arguments, a pointer and a length, in Rust. The Now Rust strings are obviously quite different than C/C++ strings. I think we'd probably want to add a new type to bindgen (in the descriptor functions), CString or something like that. A C string would expect just a pointer and would figure out the length at runtime in JS. C/C++ usage of wasm-bindgen would then use CString predominantly while Rust's usage would be renamed to something like RustString. Does that make sense?
Totally. This is one of the trickiest parts for me with wasm-bindgen currently. I personally have a good idea of what I want the I'd imagine that C/C++'s support would probably be pretty different. I don't have a preconceived notion, though, for how C/C++ might work, so it may be best perhaps to work backwards from the desired end state (annotation-wise or whatnot) for how this might be implemented in C/C++. Does that make sense? This sort of goes with the above "C/C++ probably won't look at all like Rust" or just in general how we'll need to change the format to accommodate C/C++.
Currently the ABI of exported wasm functions consists exclusively of u32/f32/f64. While it looks like you're writing bare structs and such, under the hood it's all translating to u32/f32/f64. Basically the
Definitely! And no need to worry about solidification, I think I'm on like rewrite number 5 for wasm-bindgen in terms of internals, so it still changes a lot as-is :)
Ah yes as I believe you've found it's "procedural macros" in Rust which basically means "arbitrary code running at compile time over a list of tokens". I don't believe C/C++ has such capabilities so we'll definitely need to accommodate a sort of more "append friendly" format for something like C/C++ (which Rust can of course switch to as well) |
I know! I'm not going to be focusing deeply on this for another month or two though, this was more of an exploratory prescreen to get a better sense of the lay of the land, so hold on to the excitement for a bit. But now is an excellent time to hash things out on Github, so:
I feel we'd want to go the other direction here, and have just Strings that are a (pointer, length) pair, then Rust can use (str.ptr, str.len), C For injecting the WASM_BINDGEN
void console_log(const char* str);

and translating that to

extern void __wbg_console_log_impl(int len, const char* str);
void console_log(const char* str) {
    __wbg_console_log_impl(strlen(str), str);
}

So, generate an intermediate function that proxies out to the imported function with any lowering, annotation, or conversion applied. In theory this should get inlined everywhere. Not sure how much magic we'll need on the C side vs. how much this is doable with a macro directly.
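A minimal sketch of that proxy pattern, written so it can be compiled and run natively. The import `__wbg_console_log_impl` is replaced by a local stub here (an assumption, since the real one would be supplied by the generated JS glue), and `last_len` / `last_ptr` are hypothetical names that exist only so the call can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical observation points; not part of wasm-bindgen. */
int last_len;
const char *last_ptr;

/* Local stand-in for the imported function. In a real wasm build this
   would be an extern declaration resolved at instantiation time. */
void __wbg_console_log_impl(int len, const char *str) {
    last_len = len;
    last_ptr = str;
}

/* User-facing wrapper: lowers a NUL-terminated C string to (len, ptr)
   before crossing the boundary. */
void console_log(const char *str) {
    assert(str != NULL);
    __wbg_console_log_impl((int)strlen(str), str);
}
```

Calling `console_log("hello")` forwards a length of 5 plus the original pointer, so the callee never has to scan for the terminator itself.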
I believe that to be the next step. What do ergonomic bindings look like from C/C++, and what can we do to that to get that targeting wasm-bindgen?
Talking through this more locally, this is possibly a near-ideal end-state for wasm-bindgen to wind up in. Consider calling from a C module into a Rust module, how should that look? The sanest (easiest at least) ABI is to pass everything via opaque handles and expose methods that live in the native module to do any manipulation. That way disparate languages don't have to agree on an ABI other than targeting wasm-bindgen, and they should work seamlessly with each other.
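One way the opaque-handle ABI could look from the C side is sketched below. Everything here (`wbg_handle`, the `counter_*` functions, the fixed-size table) is hypothetical and only illustrates the idea that the other module sees an integer and calls exported functions to manipulate the object behind it.

```c
#include <stdint.h>

#define MAX_HANDLES 16

/* The only type that crosses the module boundary: a plain integer. */
typedef uint32_t wbg_handle;

struct counter { int value; };

static struct counter slots[MAX_HANDLES];
static int used[MAX_HANDLES];

/* Allocates a counter and returns a non-zero handle, or 0 if the table is full. */
wbg_handle counter_new(int start) {
    for (uint32_t i = 0; i < MAX_HANDLES; i++) {
        if (!used[i]) {
            used[i] = 1;
            slots[i].value = start;
            return i + 1;
        }
    }
    return 0;
}

/* Increments the counter and returns the new value, or -1 for an invalid handle. */
int counter_increment(wbg_handle h) {
    if (h == 0 || h > MAX_HANDLES || !used[h - 1]) return -1;
    return ++slots[h - 1].value;
}

/* Releases the slot; later use of the handle is rejected. */
void counter_free(wbg_handle h) {
    if (h != 0 && h <= MAX_HANDLES) used[h - 1] = 0;
}
```

Because the caller never sees `struct counter`, the two sides do not need to agree on struct layout, only on the integer handle and the exported function signatures.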
I did notice that __wbindgen_describe was added a few days before I started trying this yeah.
Yep. Probably won't be able to generate JSON from C macros unless they're very regular. |
I'd also be down for that. I've got work to do on the Rust side to figure out how to get both arguments into the function signature, but that is certainly eminently doable.
The reason for the one-to-one mapping with arguments has to do with sort of deep Rust issues like variadic generics and whatnot. I actually originally had wasm-bindgen doing exactly what you're thinking, passing both pointer/length via function arguments. Once I started to add support for closures, though, the special-casing for strings quickly became untenable. That being said I'm pretty sure we can get around this. Right now with the Rust ABI whenever structs are passed by value they're passed via their components so that's a way to fix the issue. In fact, I think I'll go do that as soon as I'm done writing this comment.
An excellent question to be asking! This is one I've thought about with wasm-bindgen as well over time. The conclusion that I've reached is that long-term we'll probably want to keep wasm-bindgen, even as features like host bindings make their way into the wasm spec. The basic rationale for this is that wasm-bindgen ends up doing quite a bit more than what host bindings and other native wasm features will enable. The
Those are just some examples, there's likely more! My main point here is that purely relying on native wasm features (aka host bindings, Now I of course like to think that wasm-bindgen would work great for you, but if it falls short please let me know! I'd be quite excited to bring on more helpers/maintainers or otherwise just work towards more integration with more languages.
I'd imagine very possible! The main purpose of the custom section is to be an index of all imported/exported functionality, but other than that "master list" everything could be shoved into also cc @lukewagner, you're likely to be pretty interested in this discussion as well! (C/C++ interop and wasm-bindgen) |
Bah this is actually more difficult than I thought. It makes total sense to pass a string as two values but returning a string is more interesting. The ABI there is currently "pass a pointer as the first argument and I'll fill it in", but that's a little unfortunate and not easy for JS to call as well. Perhaps not impossible though! I may just delay that precise change for a bit :) |
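For reference, the "pass a pointer as the first argument and I'll fill it in" return convention looks roughly like this when written out by hand in C. The names `struct slice` and `greeting` are hypothetical, and a native pointer stands in for the u32 address a wasm build would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A (pointer, length) pair describing a string that is not NUL-delimited. */
struct slice {
    const char *ptr;
    uint32_t len;
};

/* Instead of returning the struct by value, the callee writes into
   storage that the caller provides as the first argument. */
void greeting(struct slice *out) {
    static const char text[] = "hi there";
    assert(out != NULL);
    out->ptr = text;
    out->len = (uint32_t)(sizeof text - 1); /* exclude the trailing NUL */
}
```

The awkward part for a JS caller is visible here: it has to reserve space in linear memory for `out` before the call and read both fields back afterwards.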
Ah, "we" in that context meant C/C++, as in, is it easier/more beneficial for C to target wasm-bindgen, or to target host bindings directly and polyfill appropriately in the interim.
My understanding of the host bindings proposal is that that conversion would be provided by the JS engine at the wasm boundary https://github.com/WebAssembly/host-bindings/blob/master/proposals/host-bindings/Overview.md#import-argument-binding-types . The current design also has a pointer+length pair, as two arguments.
Not today, but I believe in the fullness of time that should be possible. Nitpicking aside, wasm-bindgen does have all that and more today, and lets us experiment more freely by decoupling the C/Rust language interface from whatever the browser can understand.
All the function descriptions are exported already, and exports must have external names. So we could iterate over all exports whose externally-visible name starts with a "_wbg_describe" prefix, and that should handle the discoverability aspect.
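The discovery step described above amounts to a prefix filter over export names. A small sketch, assuming the export names have already been read out of the module into an array (that parsing is not shown) and using the prefix exactly as quoted in the comment, which may not match the real one:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prefix as written in the comment above; treat it as an assumption. */
#define DESCRIBE_PREFIX "_wbg_describe"

/* Returns non-zero when the export name starts with the descriptor prefix. */
int is_describe_export(const char *name) {
    assert(name != NULL);
    return strncmp(name, DESCRIBE_PREFIX, strlen(DESCRIBE_PREFIX)) == 0;
}

/* Counts how many of the n export names are descriptor functions. */
size_t count_describe_exports(const char *const *names, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_describe_export(names[i])) count++;
    }
    return count;
}
```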
Interesting, from the host-bindings proposal on returning a string:
So, maybe that? Somehow?
Talked to @lukewagner briefly about this this morning actually. Also @sunfish who is also probably interested. |
Oh sorry, and by "we" I meant the WebAssembly working group in Rust :) FWIW wasm-bindgen definitely started as a polyfill for upcoming features (like es modules and a... variant (at this point?) of host bindings), and I think it's just grown more functionality over time like exporting classes/structs, importing classes/structs with Rust-like interfaces, closures, etc. Plus there's nice TypeScript and semi-idiomatic JS bindings to call into the Rust code to boot!
Definitely yeah, seems quite viable!
Hm perhaps yeah! That's sort of what happens today anyway when JS passes a string to Rust (JS has to call the Rust allocator to put it somewhere). Sounds like I in general should read the host bindings in more detail! |
Could also use some kind of global red zone for return values (although that puts a max size on return values, since a frame can't just allocate enough room for arbitrary returns like it can with return via pointer). |
@fitzgen oh for sure yeah, the problem though is that I don't think that we've really got a well defined set of semantics for the wasm ABI. Basically what I'm thinking of doing is generating Rust functions that look like:

#[repr(C)]
pub struct Slice {
    pub ptr: u32,
    pub len: u32,
}

#[no_mangle]
pub fn __wbg_export(a: Slice) {
    // ...
}

And in theory with a "well defined ABI" we'd concretely know that
Or in other words the wasm function takes two parameters, the ptr/len pair. That way we have a guarantee that it's lowered "correctly" and JS also knows how to call it (pass two arguments). I think that this is probably simple enough to rely on, I'm not actually sure how we'd change that ABI even with more wasm ABI things to be stable in the future. If we were to return it, however --

#[no_mangle]
pub fn __wbg_export2() -> Slice {
    // ...
}

then today we generate That may mean that if we were to implement this today it'd break down the road when we get new wasm features. Although maybe that's ok? We may still be a long way off from a feature like this and in the meantime it may be good to avoid using the global argument stack for these extra payloads. |
I had some discussion with @lukewagner on IRC today a bit about the __wbindgen_describe functions. The current rationale for __wbindgen_describe functions is purely a limitation of how procedural macros in Rust work. Procedural macros only look at token streams (aka syntax) and don't have information like name resolution available to them. In the long term I think we very much want to keep everything in custom sections, but also long term this support is likely to move into LLVM itself if it ends up being more standardized. In the case that wasm-bindgen descriptor information moves into LLVM (or, for example, LLVM is generating the custom section) then rustc would probably be the one generating the information rather than a procedural macro. In that situation it's very easy to switch everything back to custom sections rather than having these descriptor functions. With that in mind the descriptor functions are sort of a polyfill to what the future custom section may look like. I'm totally down to structure the In that sense I wouldn't necessarily rely on __wbindgen_describe functions sticking around. I think they're basically just a holdover to a better solution! |
Great discussion and lots of great points!
So first: agreed with passing these by scalar components being desirable. For return, I think we should design under the assumption that multi-return will be coming soonish and then we just come up with a hack between then and now. E.g., a fixed set of globals, used as "return registers" (I assume there is a small upper bound here given a fixed vocabulary of lowered data types).
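The interim "return registers" hack could be sketched as follows. The globals `wbg_ret0` and `wbg_ret1` and the function `get_message` are hypothetical names, and `uintptr_t` is used so the sketch also works on a 64-bit host (on wasm32 both values would fit in a u32).

```c
#include <stdint.h>

/* A fixed set of globals standing in for multi-value return.
   In a wasm build these would need to be readable by the caller. */
uintptr_t wbg_ret0; /* first component, here the string address */
uintptr_t wbg_ret1; /* second component, here the string length */

static const char message[] = "pong";

/* Logically returns a (ptr, len) pair, but does so by writing
   each scalar component into its own "return register". */
void get_message(void) {
    wbg_ret0 = (uintptr_t)message;
    wbg_ret1 = sizeof message - 1; /* exclude the trailing NUL */
}
```

The caller reads both globals immediately after the call. Once multi-return is available, the same two scalars could become real return values without changing how the data is lowered.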
Assuming you're talking about statically linking C and Rust code together, iiuc, the problem to be solved here is making sure the .o files produced by separately-compiled Rust and C are correctly statically linked by
I view
This seems like a brilliant workaround for the limitations of a pure-macro approach without compiler support, but I'd suggest not doubling down on it in C and instead, for the toolchain convention going with a declarative custom section and using a pre-pass as a Rust toolchain detail that evaluates these |
I think that only works with exportable mutable globals? In the meantime using a global argument stack like what's there today is probably fine as a hack.
I meant dynamically linking them together. Picture a world where we have wasm modules as ES6 modules, if I have C code that imports a LLD should be able to link together any two .o files that agree on an ABI, but here we can build wasm modules that agree to target wasm-bindgen, and that's possibly much easier to target from a variety of languages. Just thinking out loud and being overly optimistic with this line of thought though.
Thinking about this more, the descriptor functions are probably the wrong thing yeah. Really what I'm asking for is if we simplify the custom section format to be more writeable from less-smart frontends. In my ideal world, the LLVM backend doesn't need to know that wasm-bindgen exists at all, purely from a separation-of-concerns standpoint. Ideally in C we can use macros to generate a payload to an |
Oops right, I forgot about that.
Ah, yes indeed it is an explicit goal that, with wasm-bindgen, you don't really know the implementation language of the other side of your imports/exports. Going one small step further, if modules' exported interfaces are specified with a .d.ts or .webidl file, then the importing module's wasm-bindgen declarations can be generated.
Right, and LLVM probably doesn't even have enough source-language type info. Similar to how it sounds like Rust needs rustc integration to avoid the |
@jgravelle-google FWIW, an alternative to a clang plugin would be to take advantage of the fact that With that, the user-written incantation:

WASM_BINDGEN_DECLARE_FUNC(print, void, std::string);

could be macro-expanded to:

__attribute__((wasm_bindgen (wbg::describe<void (std::string)>::bytes)))
void print(std::string); |
That's roughly the ergonomics I had in mind, minor tweaky nitpicks would be to do something like:
which takes a parenthesized third argument to serve as the list of arguments, and swaps the function name with return type, to make it look a little more like standard C function declarations. I can wrap my head around how to handle this with variadic C++ constexpr templates, but I'm totally lost how we'd do this with standard C. For a first pass we can start with requiring This also makes me wonder where this
That is extremely useful to know, thank you :) |
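For plain C, where constexpr templates are not available, a much weaker version of the declaration macro is still possible with preprocessor stringification. The sketch below uses a hypothetical `WBG_DECLARE_FUNC(ret, name, args)` macro with the return type first and a parenthesized argument list, as suggested above. It only records the signature as text in a hypothetical `wbg_desc_<name>` constant; it does not emit a custom section or the real wasm-bindgen descriptor format.

```c
/* Declares `ret name args` and stores the spelled-out signature as a
   string constant. The parentheses around args keep the commas inside
   a single macro argument. */
#define WBG_DECLARE_FUNC(ret, name, args)                        \
    static const char wbg_desc_##name[] = #ret " " #name #args;  \
    ret name args

/* Expands to a descriptor string plus an ordinary prototype. */
WBG_DECLARE_FUNC(int, add, (int a, int b));

/* Ordinary definition matching the generated prototype. */
int add(int a, int b) {
    return a + b;
}
```

The descriptor here is just the text `int add(int a, int b)`, so a separate offline tool would still have to parse it, which is in line with the earlier remark that C macros can only produce very regular output.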
Your/EM_JS's version looks great too and starting with C++ (14, I think, b/c relaxed constexpr) seems fine.
While the format is evolving, that makes sense to me at least. Maybe there could even be a basic C++ unit test to ensure that |
FWIW I'd be totally down for housing a header file here and making sure it stays up to date with the Rust side of things (to the best of my ability) |
I've opened #184 to have strings passed as two arguments instead of the pointer as an argument and the length through the global stack. |
So after yesterday's CG meeting, I decided to see how far I could push C++ SFINAE towards implementing wasm-bindgen for C++, and it turns out you can get fairly far: https://gist.github.com/fitzgen/3dee8e4f8cd3c91ddc700d7a47424782 Some analysis of what is missing from this proof of concept is at the bottom of the file. |
I'm going to close this since it's been quiet for quite some time now, and I think that the story nowadays is likely that https://github.com/webassembly/interface-types is the answer to "what wasm-bindgen would look like if it were standardized". |
Hello,
I did some experiments recently to see how well wasm-bindgen works for C code. I'm curious about host bindings support for C/C++, and was wondering about whether it makes sense to target wasm-bindgen for handling the polyfilling/bundling/etc part. Answer seems to be... maybe?
My approach
Results
Technically this worked. I only tried passing strings and exporting structs so far. Some issues:
Passing &strs is nonobvious. I had to modify the C callsite to store strlen in __wbindgen_global_argument_ptr. Digging deeper this looks to be handled by the Into/FromWasmAbi traits, which I'm not sure how we'll implement in C. C++ can probably do some shenanigans with templates or class conversions here.
__wbindgen_global_argument_ptr is just kinda magic as far as I can tell. As is the name of the custom section, I'm well aware that the wasm-bindgen ABI is unstable :) . But I figure it's useful to bring this up now before the designs solidify with too many Rust-specific assumptions in place.