Skip to content

wasm2c: partial support for atomic memory ops #2233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 18, 2023

Conversation

shravanrn
Copy link
Collaborator

@shravanrn shravanrn commented May 10, 2023

This PR starts the work for support for the threads proposal. In this initial PR, I am only starting with support for the Atomic memory operations part of the spec https://github.com/WebAssembly/threads/blob/cc01bf0d17ba3fb1dc59fb7c5c725838aff18b50/proposals/threads/Overview.md?plain=1#L228

Since there is a large amount of support needed, it seemed sensible to land this in chunks.

Broadly, there are three types of operations

  1. atomics (load, store, add, sub, cmp_xchg)
  2. fence
  3. conditional variables (wait, notify)

This PR focuses on implementing operations 1 and 2 and relies on atomic builtins provided by gcc, clang and icc https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
and the MSVC equivalents
https://learn.microsoft.com/en-us/cpp/intrinsics/intrinsics-available-on-all-architectures?view=msvc-170

On godbolt, it appears that this has support from

on x86_64

  • starting gcc 4.7
  • starting clang 3.1
  • at least icc 13.0.1

on arm/arm64

  • at least gcc 5.4
  • at least clang 9

Unfortunately, we can't use C11 operations as casting to/from atomics from/to a regular pointer in the Wasm heap is undefined behavior in C https://stackoverflow.com/questions/55299525/casting-pointers-to-atomic-pointers-and-atomic-sizes --- this means we can't use these ops on pointers to Wasm's linear memory

Next steps

  • Operation 3 needs further thinking on the best way to implement this, so we can continue to support different compilers and OSes. Currently the best way forward is to implement this, seems to be to implement this on top of atomic ops, similar to [this](There's https://developers.redhat.com/articles/2022/12/06/implementing-c20-atomic-waiting-libstdc#how_can_we_implement_atomic_waiting_). Thoughts welcome.

  • This code is tested with a subset of the thread atomic tests. These tests are included in-tree, with unsupported tests (relying on wait/notify) commented out. We can start testing with the full testsuite, once wait/notify is also supported.

  • It would be nice to find more efficient ways to perform atomic alignment checks instead of an explicit conditional

@shravanrn shravanrn requested review from keithw and sbc100 May 10, 2023 22:13
Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I assume there re some tests in the threading proposal repo that we can start running perhaps?

@shravanrn
Copy link
Collaborator Author

shravanrn commented May 12, 2023

(Main comment edited to reflect the latest changes)

@sbc100 Unfortunately, the tests in the proposal repo have all tests in a single wast including for wait/notify which is not implemented.

https://github.com/WebAssembly/testsuite/blob/7ef86ddeed81458f9031a49a40b3a3f99c1c6a8a/proposals/threads/atomic.wast#L427

So, to run a subset of tests, I have made a local copy of the wast with the unsupported tests removed. Let me know if this is reasonable

I have also added the missing alignment checks from the last PR

@sbc100 @keithw I am not thrilled with this approach of performing alignment checks, as I think it will incur non-trivial overhead. Open to any ideas.

@shravanrn shravanrn force-pushed the atomicops branch 2 times, most recently from efc4af2 to 20906eb Compare May 12, 2023 06:55
@shravanrn shravanrn changed the title wasm2c: partial support for atomic ops in non msvc compilers wasm2c: partial support for atomic memory ops May 12, 2023
@shravanrn
Copy link
Collaborator Author

PR now updates to support msvc also

Copy link
Member

@keithw keithw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments.

Bigger picture question: do you think it's going to be feasible to support the proposal in full? There was an earlier conversation at #1766 that seems pretty pessimistic. :-( I haven't thought carefully about it though or looked at the C spec.

@shravanrn
Copy link
Collaborator Author

shravanrn commented May 18, 2023

lgtm % comments.

Bigger picture question: do you think it's going to be feasible to support the proposal in full? There was an earlier conversation at #1766 that seems pretty pessimistic. :-( I haven't thought carefully about it though or looked at the C spec.

@keithw yup, I think the quick summary is that #1766 is a more elaborate version of the statement "lowering wasm atomics into C atomics will cause undefined behavior" that I noted earlier. I believe the current compiler intrinsic approach is safe from UB.

In particular, since we are using compiler intrinsics, we aren't violating C's aliasing rules such as pointers of different types should be non-overlapping. Here is the full train of thought.

  • Overall, wasm2c has reads/writes using memcpy (i.e., it does not assume anything about the type of the source/dest apart from it being a byte buffer, and assumes nothing about overlaps)
  • wasm2c now has atomic reads/writes using compiler intrinsics which once again assumes nothing about the address, except possibly platform specific alignment restrictions (which are typically weaker or the same as Wasm's requirements anyway).
  • Any alias or overlap of address thus should not cause an issue, as the C compiler cannot assume anything about non-intersection or non-overlap between non-atomic and atomic accesses (which is the source of UB described in wasm2c supporting threads and other features #1766)

So overall, I believe we are fine here. However, we can continue to monitor UBSan, to make sure things remain good. I don't think msvc currently supports UBsan, but once that is enabled, it would be good to get that on CI as well.

Just for completion, I'll note that the only thing that was a bit weird to handle in wasm2c was MSVC's lack of atomic_load and atomic_store. Instead, they note in their documentation, that register width loads are always meant to be atomic. I am not sure if that documentation was written with primarily x86 in mind though (and arm architectures etc. would be a problem), nor do I know what happens if we read a 64-bit value in 32-bit machine. Thus, I have gone with conservative approach here of using atomic_or and atomic_exchange primitives to implement this as mentioned here, as this should definitely be safe (although we could potentially lose some performance)

@shravanrn shravanrn enabled auto-merge (rebase) May 18, 2023 02:49
@shravanrn shravanrn merged commit 795f415 into WebAssembly:main May 18, 2023
@keithw keithw mentioned this pull request Oct 24, 2023
MEMCHECK(mem, addr, t1); \
ATOMIC_ALIGNMENT_CHECK(addr, t1); \
t1 result; \
wasm_rt_memcpy(&result, &mem->data[addr], sizeof(t1)); \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for having this additional memcpy before the atomic load?

Copy link
Collaborator Author

@shravanrn shravanrn Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, moving data from a source and destination of different types in C (apart from the basic supported things like int to long conversions) should be done by memcpy. This is because casting the data byte buffer to multiple target types can trigger UB. Put simply, if you have both expressions u32* a = (u32*) &mem->data[addr] and u64* b = (u64*) &mem->data[addr] in the same C file output by wasm2c, then the C compiler which compiles this file is allowed to optimize assuming a & b don't alias. However, Wasm makes no such guarantees about aliasing, meaning a & b could alias/overlap. This end result here is UB in the C compilation.

t1 result; \
wasm_rt_memcpy(&result, &mem->data[addr], sizeof(t1)); \
result = atomic_load_##t1(&mem->data[addr]); \
wasm_asm("" ::"r"(result)); \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for having this forced read? Is it always necessary? What happens if it is not performed?

Copy link
Member

@keithw keithw Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasm2c forces liveness of every load result (not just for atomic ops) because WebAssembly requires a deterministic trap on OOB load/store. We don't want the compiler to optimize the read out (e.g. if the value is never observed) and prevent the SIGSEGV/SIGBUS from getting triggered. (The spec tests will fail if you take these forced livenesses out and run wasm2c's output through an optimizing C compiler.) There's probably more efficient ways we could be doing this, but this is what we've got to avoid a software bounds-check on every load.

See #1925 and #1939

@turbolent
Copy link

@shravanrn Great work! I'm currently implementing similar support in w2c2 and was wondering about some of the code

@shravanrn
Copy link
Collaborator Author

@shravanrn Great work! I'm currently implementing similar support in w2c2 and was wondering about some of the code

@turbolent Also, have a look at the currently open PRs. I am changing this to move away from compiler intrinsics to C11 features.

@squk
Copy link
Contributor

squk commented May 13, 2024

@shravanrn I'm curious if you have the following expression types working in wasm2c(AtomicWait, AtomicNotify)?

wabt/src/c-writer.cc

Lines 3973 to 3976 in c24a216

case ExprType::AtomicWait:
case ExprType::AtomicNotify:
case ExprType::CallRef:
UNIMPLEMENTED("...");

They are currently unimplemented in the main branch and I was hoping a fork of yours has these implemented so they can be merged upstream? I'm attempting to use threads wasm2c(https://github.com/google/leveldb/blob/068d5ee1a3ac40dabd00d211d5013af44be55bea/util/env_posix.cc#L817-L832).

@shravanrn
Copy link
Collaborator Author

@squk You can try the patch from this PR #2268
Note that the wasm2c might have changed a bit since then, so you may have to rebase this to get it to work. Fwiw, note that this PR is not likely to be exactly what we land in wasm2c production (per the comments in that PR), but I did test that this PR works correctly with threads at the time.

@squk
Copy link
Contributor

squk commented Jul 8, 2024

@shravanrn Thanks. I also found these branches
main...PLSysSec:wabt:atomicops_part2_refactor
main...PLSysSec:wabt:atomic_sharedmem3

I'm curious, are the atomic memory operations something that's being actively worked on?

@shravanrn
Copy link
Collaborator Author

shravanrn commented Jul 8, 2024

@shravanrn Thanks. I also found these branches
main...PLSysSec:wabt:atomicops_part2_refactor
main...PLSysSec:wabt:atomic_sharedmem3

I'm curious, are the atomic memory operations something that's being actively worked on?

Given my current availability, I think I'm unlikely to get to this in the next couple of months, especially because some of our plans to use wasm threads got dropped. This may change in the future, in which case, I'll try to prioritize implementation again. For the moment, you're welcome to give the implementation a shot if you have cycles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants