Change Wasm's cdylib etc. to be a "reactor". #108097

Open
wants to merge 1 commit into master

sunfishcode
Member

Use `--entry _initialize` on `cdylib` and similar library crate types on Wasm
targets.

This changes how static constructor functions have been handled, at least in
some configurations. The output module will export an `_initialize` function
that calls the static constructors, and the host must call it before
accessing any other export. Rust doesn't officially have static constructors,
but C++ does, C has fairly common extensions that do (such as
`__attribute__((constructor))`), and there are Rust crates that provide
static-constructor functionality.

Some Wasm embeddings automatically call the `_initialize` function if
present because it is part of the [WASI Preview1 application ABI], though
not all do.

This does not implement support for dynamic linking. The format and behavior
of `cdylib` and similar outputs may change in future compiler versions,
especially as Wasm gains a stable dynamic linking format.

What this does do is make `cdylib` work more like what many people have
been assuming `cdylib` does on Wasm: it produces a Wasm module without a
`main` function, which you can instantiate and call exports on, and which now
also supports static constructors through an `_initialize` function that you
or your engine must call first.

[WASI Preview1 application ABI]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md
@rustbot
Collaborator

rustbot commented Feb 15, 2023

r? @compiler-errors

(rustbot has picked a reviewer for you, use r? to override)

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 15, 2023
@rustbot
Collaborator

rustbot commented Feb 15, 2023

These commits modify compiler targets.
(See the Target Tier Policy.)

@sunfishcode
Member Author

I have mixed feelings about this. On one hand, always having an _initialize function means we can always handle static constructors in a consistent way. But this means that everyone needs an _initialize function, whether they have static constructors or not, and ideally many people won't have any static constructors.

An alternative is this patch to wasm-ld which auto-wraps all reactor exports with code to check whether static constructors have been called and automatically call them the first time any export is called. That would obviate the _initialize function. On the other hand, it would mean that every export call in a program with static constructors would have this runtime check wrapped around it. But on the other other hand, maybe that's an extra incentive to write programs without static constructors.
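A minimal Rust sketch of what such an auto-wrapped export does, written out by hand for illustration. This is not the wasm-ld patch; `run_static_ctors`, `CTOR_RUNS`, `CTORS_DONE` and `foo_impl` are hypothetical names, and `std::sync::Once` stands in for whatever flag the linker would actually emit:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

// Hypothetical stand-in for `__wasm_call_ctors`; counts how often it runs.
pub static CTOR_RUNS: AtomicUsize = AtomicUsize::new(0);

fn run_static_ctors() {
    CTOR_RUNS.fetch_add(1, Ordering::SeqCst);
}

// One-time guard: checking it is the per-call runtime cost.
static CTORS_DONE: Once = Once::new();

fn ensure_ctors() {
    CTORS_DONE.call_once(run_static_ctors);
}

// The user's original export body (hypothetical).
fn foo_impl() -> u32 {
    42
}

// What an auto-wrapped export amounts to: check, then forward.
#[unsafe(no_mangle)]
pub extern "C" fn foo() -> u32 {
    ensure_ctors();
    foo_impl()
}
```

Every call to `foo` pays for the `Once` check; the constructors themselves run only on the first call.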

But on the other other other hand, the Wasm component model may add a second initialization phase, which would be specifically for supporting constructors, which suggests that we should stick with an _initialize function for now, because once we have all the pieces here we can arrange for the component model to call it at the right time.

@workingjubilee
Member

In the past we have said that Rust does not have "life before main", and that while those crates certainly exist, they essentially rely on using features of the platform beyond our control, or using purely compile-time trickery. I am concerned that this could be seen as "welp, life-before-main exists when we remove all the factors we can control for in the language, so add life-before-main to the language." That seems like a badly-reasoned precedent.

While we could shrug and say "platform!" again, this seems to be a much more deliberate choice we are making about how we handle the code we emit. I am disinclined to write it off, especially because the mere presence of wasm32-unknown-unknown as a target has had severe impacts on our ability to reason about our floating point semantics because of choices by the WebAssembly working groups that do not align with the IEEE 754-2008 rules.

I am also concerned at the "zero-cost abstraction" level: making every Rust program pay for so-called "static" constructors because they hypothetically exist does not seem like a great status quo.

@workingjubilee
Member

Mind, I am aware that the current wasm/wasi model requires something to happen here as a matter of the "process" ABI. I am just concerned this might be the wrong thing.

People are trying to depend in significant ways on the current wasm/wasi semantics, even when those are deliberately underspecified. To some extent, this is being added to satisfy those programmers who want to depend on something. With this much of the semantics up in the air, I don't feel like we can rely on such a second initialization phase being formally added. It seems at significant risk of letting people down a year out from now if people decide certain things were bad ideas.

@sunfishcode
Member Author

> choices by the WebAssembly working groups that do not align with the IEEE754-2008 rules

I know this isn't the topic of the PR itself, but I've tried to follow the relevant threads about this and have not previously seen this conclusion. Do you have links to more information?

workingjubilee added the O-wasm Target: WASM (WebAssembly), http://webassembly.org/ label Feb 20, 2023
@workingjubilee
Member

workingjubilee commented Feb 24, 2023

There's been a bunch of discussion around the wasm targets and how these probably should be considered even more of a "sneak preview" than they are, and how they will likely be changed. A lot. Soon. I'm wondering whether this PR handles #108381. If so, then I think that, while my concerns remain, they should probably be tabled, as they will likely be addressed in the future and/or we will have more room to discuss alternatives very soon.

@compiler-errors
Member

r? compiler

rustbot assigned cjgillot and unassigned compiler-errors Mar 1, 2023
@cjgillot
Contributor

cjgillot commented Mar 7, 2023

I have no idea what this PR is about.
r? compiler

rustbot assigned wesleywiser and unassigned cjgillot Mar 7, 2023
@pnkfelix
Member

pnkfelix commented Apr 6, 2023

@sunfishcode I'd like to help see this get to some resolution, but my time is very limited this month. I'm not sure if we're going to get someone from T-compiler who can dedicate time to this in the nearer term.

It seems clear that there are issues here that are up for debate. Does this deserve a design meeting? or, if not that, then a dedicated zulip stream to discuss the matter? (Maybe there already is such a zulip stream?)

@sunfishcode
Member Author

My interpretation of #108097 (comment), and perhaps also of the reluctance of anyone to comment on this, is that Rust doesn't want to be the one telling users that they need to make sure to call _initialize.

Consequently, I've started WebAssembly/tool-conventions#203 to seek to establish this convention at the WebAssembly level, and have now posted an agenda item for next week's CG meeting in WebAssembly/meetings#1253.

I don't know if the _initialize convention will reach consensus, but if it doesn't, I hope to at least get suggestions on alternative approaches to pursue.

@apiraino
Contributor

I'll label this as S-blocked to try to signal that it depends on work outside of T-compiler (iiuc the above comment)

@rustbot label +S-blocked -S-waiting-on-review

rustbot added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 20, 2023
@bors
Collaborator

bors commented Jul 1, 2024

☔ The latest upstream changes (presumably #127216) made this pull request unmergeable. Please resolve the merge conflicts.

Dylan-DPC added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-blocked Status: Blocked on something else such as an RFC or other implementation work. labels Jan 27, 2025
@Dylan-DPC
Member

@sunfishcode any updates on this? I assume it's no longer blocked, since the PR it was blocked on is now merged, but I'm not sure whether there's more work it's blocked on. (If it's not blocked, you can resolve the conflicts.) Thanks

@sunfishcode
Member Author

There were no objections in the CG meeting, and yes, the PR WebAssembly/tool-conventions#203 is now merged, so BasicModuleABI.md is now a tool convention saying that if a module exports "_initialize" and not "_start", then "_initialize" is to be called before other exports. And this convention is now supported by at least some tools, such as wasm-tools component.
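The convention described above can be sketched as host-side logic in Rust. The function names `must_call_initialize` and `first_call_sequence` are hypothetical, and export names are modeled as plain strings rather than through any particular engine's API:

```rust
/// Per the convention: a module that exports `_initialize` but not `_start`
/// must have `_initialize` called before any other export.
pub fn must_call_initialize(exports: &[&str]) -> bool {
    exports.contains(&"_initialize") && !exports.contains(&"_start")
}

/// The export calls a host would make to invoke `target` for the first time
/// on a freshly instantiated module.
pub fn first_call_sequence<'a>(exports: &[&'a str], target: &'a str) -> Vec<&'a str> {
    let mut calls = Vec::new();
    if must_call_initialize(exports) {
        calls.push("_initialize");
    }
    calls.push(target);
    calls
}
```

For the module produced by this PR, which exports `foo` and `_initialize`, the first call to `foo` would be preceded by a call to `_initialize`; a module without `_initialize` needs no extra call.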

The main open questions here are: First, is something like this still needed in practice? And second, would it break people's existing setups, and if so what should we do about it?

@alex-semenyuk
Member

@sunfishcode
Perhaps we can tag people who can help address these questions.

Also, do you want to make changes now? If not, it may be better to change this back to blocked.

@apiraino
Contributor

cc @alexcrichton @burakemir @hoodmane @juntyr

Maybe someone from the WASM ping group can chime in. Thanks!

@juntyr
Contributor

juntyr commented Apr 28, 2025

I have a crate that provides a WASI-specific logger, which connects logging inside WASM with logging on the host. For initializing the logger, I faced the problem of WASM not supporting ctors. While I could have exported an _initialize function myself, I would have risked clashing with, e.g., linked C++ code that might contain a compiler-generated _initialize. If Rust itself supported ctors, I would expect them to just work inside WASM. But since it doesn't, I was OK with manually wrapping all of my exports to initialize the logger before it is first needed.

My main qualm in this situation was not with Rust not having ctors, but with log not allowing me to specify the default logger as a static (something that rust-lang/rfcs#3635 or rust-lang/rfcs#3645 would solve). So even in my ideal world, I wouldn't use a ctor in Rust but define a static that, if needed, runs initialization code when its trait implementation is first called and then replaces the boot-logger with a newly initialized one.
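A rough sketch of that "static that initializes on first use" idea, assuming a hypothetical `Logger` trait rather than the `log` crate's actual `Log` trait, and delegating to an inner logger instead of literally replacing the boot logger. `HostLogger`, `BootLogger` and `LOGGER` are illustrative names; `log` returns the formatted line instead of writing it, only so the example is easy to check:

```rust
use std::sync::OnceLock;

/// Hypothetical minimal logging trait (not the `log` crate's API).
pub trait Logger: Sync {
    fn log(&self, msg: &str) -> String;
}

/// The "real" logger, which needs runtime setup (e.g. talking to the host).
pub struct HostLogger {
    prefix: String,
}

impl Logger for HostLogger {
    fn log(&self, msg: &str) -> String {
        format!("{}{}", self.prefix, msg)
    }
}

/// A boot logger that can live in a plain `static`: no constructor or
/// `_initialize` needed, because setup happens on first use.
pub struct BootLogger {
    inner: OnceLock<HostLogger>,
}

impl Logger for BootLogger {
    fn log(&self, msg: &str) -> String {
        // The first call runs the initialization; later calls reuse it.
        let inner = self.inner.get_or_init(|| HostLogger {
            prefix: String::from("[host] "),
        });
        inner.log(msg)
    }
}

pub static LOGGER: BootLogger = BootLogger { inner: OnceLock::new() };
```

Because `OnceLock::new` is a `const fn`, the static needs no runtime constructor; the cost is one check per log call.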

TLDR: ctors are not part of Rust (and I'm ok with that). Therefore, I don't expect hacked-together ctors to work inside WASM. Thus, I don't really want to pay the cost of having an _initialize function if I don't need one.

@bjorn3
Member

bjorn3 commented Apr 28, 2025

wasm-ld supports .init_array.
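For context, a static constructor in Rust is typically expressed by placing a function pointer in the `.init_array` section, roughly what ctor-style crates do on ELF targets. A minimal sketch, assuming an ELF target such as Linux (Mach-O and other formats use different section names); `my_ctor`, `MY_CTOR` and `CTOR_RAN` are hypothetical names. On Wasm, wasm-ld gathers such entries into `__wasm_call_ctors`, which is what `_initialize` would call:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Set by the constructor below; read later to confirm it ran.
pub static CTOR_RAN: AtomicBool = AtomicBool::new(false);

extern "C" fn my_ctor() {
    CTOR_RAN.store(true, Ordering::SeqCst);
}

// A function pointer in `.init_array` is run by the platform before `main`
// on ELF targets. `#[used]` keeps the otherwise-unreferenced static alive.
#[used]
#[unsafe(link_section = ".init_array")]
static MY_CTOR: extern "C" fn() = my_ctor;
```

This is the kind of code for which a reactor-style module needs `_initialize`: nothing in Rust itself calls `my_ctor`.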

@alexcrichton
Member

I realize that this PR is two years old, and I also realize that what I'm going to be asking below is a big ask, especially relative to the size of this PR. In that sense I'm mostly curious to explore an extension to this PR which may or may not need to happen before this lands. In any case...

One of the main consequences of this, I think, is that _initialize is going to be exported by default for modules within Rust-based components, for example on the wasm32-wasip2 target. As @sunfishcode pointed out, the tooling has support for this and the component will, indeed, invoke _initialize before any other export is called. That part I'm not worried about; what I am worried about is that there's no optimization to turn this off.

To me the majority of components/code won't be using this feature and won't need _initialize, meaning that the function is effectively dead code. To compare before/after this PR, this Rust source:

#[unsafe(no_mangle)]
pub extern "C" fn foo() {}

currently generates this module on wasm32-wasip1 with the cdylib crate type:

(module $foo.wasm
  ;; ...
  (export "foo" (func $foo.command_export))
  (func $foo (;0;) (type 0)
    return
  )
  (func $dummy (;1;) (type 0))
  (func $__wasm_call_dtors (;2;) (type 0)
    call $dummy
    call $dummy
  )
  (func $foo.command_export (;3;) (type 0)
    call $foo
    call $__wasm_call_dtors
  )
  ;; ...
)

This is broken if __wasm_call_dtors actually does something because each invocation would run destructors which is probably going to result in surprising behavior. I believe this is something that @sunfishcode wants to fix, and I agree this should be fixed! Of note though is that this module has no other exports, it's just the foo function.
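To make that failure mode concrete, here is a small Rust simulation of the wrapper shown above, which calls the export and then `__wasm_call_dtors` on every invocation. The global buffer, `static_ctor`, `static_dtor` and `foo_command_export` are hypothetical stand-ins, and the constructor is run once up front rather than by the wrapper:

```rust
use std::sync::Mutex;

// Hypothetical global state that a destructor tears down.
static BUFFER: Mutex<Option<Vec<String>>> = Mutex::new(None);

// Stand-in static constructor: sets the state up.
fn static_ctor() {
    *BUFFER.lock().unwrap() = Some(Vec::new());
}

// Stand-in for what `__wasm_call_dtors` would run: tears the state down.
fn static_dtor() {
    *BUFFER.lock().unwrap() = None;
}

// The user's export; returns whether the global state was still usable.
fn foo() -> bool {
    match BUFFER.lock().unwrap().as_mut() {
        Some(buf) => {
            buf.push(String::from("foo called"));
            true
        }
        None => false,
    }
}

// Mirrors `$foo.command_export`: call the export, then run destructors.
pub fn foo_command_export() -> bool {
    let ok = foo();
    static_dtor();
    ok
}
```

After `static_ctor()`, the first call to `foo_command_export` succeeds, but the second finds the state already destroyed by the previous call's destructors.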

With this PR the generated wasm looks like:

(module $foo.wasm
  ;; ...
  (export "foo" (func $foo))
  (export "_initialize" (func $_initialize))
  (func $__wasm_call_ctors (;0;) (type 0))
  (func $foo (;1;) (type 0)
    return
  )
  (func $_initialize (;2;) (type 0)
    block ;; label = @1
      global.get $GOT.data.internal.__memory_base
      i32.const 1048576
      i32.add
      i32.load
      i32.eqz
      br_if 0 (;@1;)
      unreachable
    end
    global.get $GOT.data.internal.__memory_base
    i32.const 1048576
    i32.add
    i32.const 1
    i32.store
    call $__wasm_call_ctors
  )
  ;; ...
)

Here, as expected, _initialize is exported as well. This has some extra code to figure out it doesn't actually need to do anything, but that's not the end of the world.
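For readers less used to WAT, the `_initialize` body above corresponds roughly to the following Rust. This is a hand translation for illustration only; `INITIALIZED`, `wasm_call_ctors` and `initialize` are hypothetical names, and a panic stands in for the `unreachable` trap:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Counterpart of the i32 flag at `__memory_base + 1048576`.
static INITIALIZED: AtomicBool = AtomicBool::new(false);

// Stand-in for `__wasm_call_ctors`, which is empty in this module.
fn wasm_call_ctors() {}

pub fn initialize() {
    // `i32.load` / `i32.eqz` / `br_if`: a zero flag skips the trap;
    // a nonzero flag falls through to `unreachable` (second call traps).
    if INITIALIZED.load(Ordering::SeqCst) {
        panic!("_initialize called more than once");
    }
    // `i32.const 1` / `i32.store`: record that initialization happened.
    INITIALIZED.store(true, Ordering::SeqCst);
    wasm_call_ctors();
}
```

So the extra code is a once-only guard plus a call into an empty constructor list.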

What I'm specifically worried about is the component-level cost involved for supporting _initialize. The process of creating a component means that if _initialize is present a new core module must be instantiated to actually run _initialize. This is more costly relative to today where no extra core module is needed.


With that as background, my thinking is that we have before/after states of:

  • Before this change ctors/dtors are most definitely broken, but they're (I think) rarely used. Also before this change a rust-based component does not need the extra instance in a component to run _initialize as it doesn't exist.
  • After this change dtors would work (they wouldn't be run) and static ctors also work (they actually get run). All rust-based components, however, would generate an extra instance in components to run _initialize which ends up doing nothing.

To me this feels kind of unfortunate and is something where I'd prefer to, for example, land something in wasm-ld and/or wasi-libc which skips _initialize altogether if there aren't actually any static constructors. That would mean that we could preserve the majority-status-quo while also adding support for static ctors at the same time. What I'm mostly afraid of otherwise is that we're disrupting the status-quo to add support for a feature which isn't otherwise widely used yet. Not to say of course it's not useful to support, nor to downplay that having it not work can be very surprising, but I'm afraid of the larger impact this will have on folks who aren't even aware of static ctors/dtors and aren't using them.

@sunfishcode do you know if it would be possible to implement such an optimization to conditionally export _initialize? I know wasm-ld has knowledge that __wasm_call_ctors is a noop, but I'm not sure how the _initialize function, defined in wasi-libc, could be conditionally exported depending on whether __wasm_call_ctors is a noop or not. I suspect that would require more wasm-ld integration than currently exists.
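The optimization being asked about amounts to a decision like the one below. It is only a sketch of the intended behavior, not of how wasm-ld or wasi-libc would implement it; `reactor_exports` and its `ctor_count` input are hypothetical:

```rust
/// Hypothetical linker-side decision: export `_initialize` only when there
/// is at least one static constructor for it to run.
pub fn reactor_exports<'a>(user_exports: &[&'a str], ctor_count: usize) -> Vec<&'a str> {
    let mut exports = user_exports.to_vec();
    if ctor_count > 0 {
        exports.push("_initialize");
    }
    exports
}
```

With no constructors, the export list matches today's output and a component would not need the extra instance to run `_initialize`.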

Labels
O-wasm Target: WASM (WebAssembly), http://webassembly.org/ S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.