Add std.atomic support #10864
Conversation
Thanks for your pull request and interest in making D better, @rymrg! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references: Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally: If you don't have a local development environment set up, you can use Digger to test this PR:

dub run digger -- build "master + phobos#10864"
This LGTM but I'd like another pair of eyes looking at this. Anyone you can think of?
Enforce Allman style
grep -nrE '(if|for|foreach|foreach_reverse|while|unittest|switch|else|version) .*{$' $(find etc std -name '*.d'); test $? -eq 1
std/atomic.d:13:struct Atomic(T) if (__traits(isIntegral, T) || isPointer!T) {
std/atomic.d:76:	T opUnary(string op)() shared if (op == `++`) {
std/atomic.d:80:	T opUnary(string op)() shared if (op == `--`) {
std/atomic.d:84:	auto ref opUnary(string op)() shared if (op == `*`) {
std/atomic.d:123:@safe unittest {
std/atomic.d:146:@safe unittest {
std/atomic.d:152:@system unittest {
std/atomic.d:163:		while (arr[1].load!(MemoryOrder.acq) != 1) {
std/atomic.d:176:@safe unittest {
std/atomic.d:183:@safe unittest {
std/atomic.d:191:@safe unittest {
std/atomic.d:199:unittest {
braces on newlines
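For reference, this is the shape of change the check asks for; a minimal before/after sketch using the flagged struct declaration (constraint and brace each on their own line, in the Phobos style):

import std.traits : isPointer;

// Flagged: opening brace on the same line as the declaration.
// struct Atomic(T) if (__traits(isIntegral, T) || isPointer!T) {

// Allman style, which the check expects:
struct Atomic(T)
if (__traits(isIntegral, T) || isPointer!T)
{
    private T val;
}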
15fe140 to 4c573d6 (compare)
Not quite sure what it wants with respect to documentation and unittests. I guess you need …
Thanks. I hope I didn't miss anything.
Two more …
I'm usually sceptical of tools like this. I can't see anything obviously wrong with the implementation, but in my experience, this sort of thing tends to encourage usage by non-experts in ways that invite broken code. For my money, I don't see anything wrong with a call to … Anyway, I can't see anything obviously wrong here (aside from my comments around shared and the questionable tests), and there are other things like this in existence; I just probably wouldn't. The use of …
// static array of shared atomics
@safe unittest
{
    static shared(Atomic!int)[5] arr;
An array of 5 atomic ints! This is a perfect example of a disaster waiting to happen.
Interactions between more than one atomic always require an exceptionally high amount of care. I would be concerned that this benign-looking line essentially communicates to an expert that you're not qualified to be messing with this sort of thing ;)
I'm not sure why. An implementation of a bounded Chase-Lev queue will do exactly this. The Atomic type is there to give you easy-to-work-with primitives for writing safe concurrent data structures. If you want the unbounded version then yes, you'll need the array pointer to also be atomic.
An implementation of a bounded Chase-Lev queue will do exactly this.
Are you sure? I would expect to need exactly 2 indices, for the top and the bottom.
If there are multiple queues, then I would expect them to be individually encapsulated, and those queue objects would be what goes in the array.
That aside, of course I could certainly manifest situations where this isn't strictly invalid; but a cluster of atomics in an array doesn't give a strong implication that they are strictly independent. If the elements together are taken to represent some sort of coupled logical state, this is almost certainly a disaster waiting to happen. This feels like a code-smell at best. Most people in my experience fail to handle multiple atomic moving parts, at least in the event it becomes more complicated than a pair of queue cursors.
I mean, it's just not clear where this line is going. It's rare to see more than 2 cursors at a time, unless this were an array of cursors (length == num_threads). My feeling is that as sample code, it presents a dangerous idea.
I'm nit-picking here because my key concern with this whole thing is that it presents to a user the impression that atomics are like, no big deal.
In more complicated scenarios where you have multiple work-stealing queues (one for each worker thread) or something like that, then I would expect the atomic details to be enclosed in the queue objects. It would be best to structurally prevent handling them in conjunction; it's too difficult to reason about in practice.
And again, that principle generally just leads me to the position that a call to cas, load/acq, store/rel, inside a tool (like a queue implementation) is just not a big deal, and somewhat more direct and readable in practice.
Convenience is just not a goal where atomics are concerned from my perspective; absolute maximum clarity is the only goal I would recognise. It's almost always only one line or one pair of lines; a whole tool to hide one or 2 lines of code which you can follow in a direct and linear way just doesn't feel like it carries its weight to me.
Do you strongly feel a tool like this here has value? Be honest with yourself; what lines of code are you trying to make disappear behind this tool? How many such lines exist in your software? Chances are the number of lines is countable on your fingers... and if it's more than that, I would get nervous.
If you feel like you can make a strong case for its value, I'd like the unit tests (ie; samples) to present realistic patterns, for the sake of not misleading readers.
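To make the "encapsulate the cursors" point concrete, here is a rough structural sketch (names and the capacity parameter are illustrative, and the method bodies are deliberately omitted, since a correct Chase-Lev implementation needs far more care than fits here):

// Structural sketch only, not a working Chase-Lev deque: the two cursors are
// named fields inside the queue, and every acquire/release/cas stays behind
// the type's methods rather than being handled by callers.
struct BoundedDeque(T, size_t capacity)
{
    private T[capacity] buffer;
    private shared size_t top;     // advanced by stealers (cas)
    private shared size_t bottom;  // advanced by the owning thread (release store)

    void push(T item);             // owner: write buffer[bottom % capacity], then release-store bottom
    bool tryPop(ref T item);       // owner: step bottom back, resolve the last-element race with cas
    bool trySteal(ref T item);     // stealer: acquire-load both cursors, then cas top
}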
void reltest() @safe
{
    arr[0].store!(MemoryOrder.rel)(1);
Release is unnecessary here. Also, there's no reason for arr[0] to be atomic at all.
It really depends on what this example signifies. If you're thinking of this tiny benchmark, then yes. If you're thinking of more complicated things, then no. This is the canonical example of the MP (message-passing) idiom.
I mean, I'm not sure what the example signifies; all I see is a bad example... this example as written is showing wrong code; 2 consecutive releases and 2 consecutive acquires are not a correct implementation of the pattern shown (or anything like it; I couldn't describe this as canonical). In this example, only arr[1] should be atomic.
Showing arr[0] as a second atomic could only confuse readers; they'll probably assume the example was written by an expert and try to make sense of it... whatever conclusion they manifest to explain the code they see will be wrong, and they may then go off and write bad code with their misunderstanding.
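For contrast, a minimal sketch of the message-passing pattern as described here, written against core.atomic directly (names are illustrative): only the flag is atomic, and the payload is published by the release/acquire pair.

import core.atomic : atomicLoad, atomicStore, MemoryOrder;

shared int flag;        // the only variable that needs atomic access
__gshared int payload;  // plain data, published through the release/acquire pair

void producer()
{
    payload = 42;                            // plain write
    atomicStore!(MemoryOrder.rel)(flag, 1);  // release: publishes the write above
}

void consumer()
{
    while (atomicLoad!(MemoryOrder.acq)(flag) != 1) { }  // acquire pairs with the release
    assert(payload == 42);                   // plain read, ordered by the acquire
}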
I never considered unit-tests to be examples of how the code should be used. Only as a utility to test that the code does work.
For the record, if arr[0] was not atomic and the underlying implementation of the store to arr[1] was downgraded to relaxed instead of release, the unittest becomes undefined behavior instead of just failing.
If a unittest block has a ddoc comment on it, that becomes part of the documentation as an example. It attaches to the previous non-unittest symbol.
///
unittest {
}

Without the ddoc comment, it's only for verifying the behaviour.
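A slightly fuller sketch of that mechanism, with a hypothetical function so the attachment is visible:

import core.atomic : atomicFetchAdd;

/// Atomically adds one and returns the new value.
int bump(ref shared int val)
{
    return atomicFetchAdd(val, 1) + 1;
}

/// Because of this ddoc comment, the unittest below is rendered as the
/// Example section in bump's generated documentation.
unittest
{
    shared int x;
    assert(bump(x) == 1);
}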
I never considered unit-tests to be examples of how the code should be used. Only as a utility to test that the code does work. For the record, if arr[0] was not atomic and the underlying implementation of the store to arr[1] was downgraded to relaxed instead of release, the unittest becomes undefined behavior instead of just failing.
arr[1] is very clearly acq/rel, it's not relaxed... I have no idea what "downgraded" means; I think you mean "changing the code arbitrarily and introducing a bug"; anyone can introduce any bug anywhere at any time by arbitrarily changing code in a way that creates a bug.
The point of lockless/atomic code is to be efficient; you don't arbitrarily perform excessive cache synchronisations, because that defeats the whole purpose. It's not 'defensive', it's just wrong.
Without the ddoc comment, it's only for verifying the behaviour.
People can still read unittests and nobody expects a unittest to be wrong.
void acqtest() @safe
{
    while (arr[1].load!(MemoryOrder.acq) != 1) { }
    assert(arr[0].load!(MemoryOrder.acq) == 1);
Second acquire is unnecessary.
Frankly, these kinds of mistakes are demonstrating why I wouldn't introduce a tool like this. Atomics are exclusively for experts.
If we optimize, we might as well make the first load relaxed as well and put an acquire fence after the loop. But interestingly enough, it changes behaviour on hardware (not only because an acquire fence orders against all preceding loads rather than one, but because I could not reproduce the store-buffering (SB) behaviour on some ARM hardware when using acq/rel instead of rlx/rlx).
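Roughly what that variant looks like against core.atomic (assuming the MemoryOrder-parameterised atomicFence overload in recent druntime; MemoryOrder.raw is druntime's current name for relaxed, which is the naming point debated further down):

import core.atomic : atomicLoad, atomicFence, MemoryOrder;

shared int flag;

void waitRelaxedThenFence()
{
    // spin with relaxed loads...
    while (atomicLoad!(MemoryOrder.raw)(flag) != 1) { }
    // ...and pay for the acquire once, after the loop has exited
    atomicFence!(MemoryOrder.acq)();
}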
I don't think that's likely to be an optimisation (depends on the probability of contention), and if it is, there would be a better optimisation...
I would leave acquire in the loop, assume the loop is likely to succeed, and save the bytes of program code from the extra operation.
If the probability of contention here is not close to zero, you would use a well-formed spinlock with a backoff strategy instead of hot-spinning waiting for the value to change; let the hyper-threads have more time, etc.
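The kind of backoff being referred to, sketched with core.thread (the thresholds and sleep interval are arbitrary):

import core.atomic : atomicLoad, MemoryOrder;
import core.thread : Thread;
import core.time : usecs;

shared int flag;

void waitWithBackoff()
{
    size_t spins;
    while (atomicLoad!(MemoryOrder.acq)(flag) != 1)
    {
        ++spins;
        if (spins < 64)
            continue;               // stay hot briefly: the common uncontended case
        if (spins < 4096)
            Thread.yield();         // give other threads / hyper-threads a turn
        else
            Thread.sleep(10.usecs); // back off harder under sustained contention
    }
}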
I guess you could have both :P
@safe unittest
{
    shared Atomic!(shared(int)) a = 5;
The use of shared here is pointless; but shared in D is completely broken, and we really need to enable the preview that's been sitting there for 5-6 years.
How do I allocate a global without shared / __gshared?
You can use static for TLS. However, I do question whether it couldn't just be on the stack.
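In other words, a small sketch of the storage classes being contrasted:

@safe unittest
{
    static int perThread;     // `static` inside a function = thread-local static (TLS)
    static shared int global; // `shared` makes it a single process-wide instance

    perThread = 1;            // plain access is fine: only this thread ever sees it
    // `global` would need atomic (or otherwise synchronised) access.
}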
private T val;

/// Constructor
this(T init) shared
Remove shared; the whole point of this type is to keep that detail out of the public API.
import core.atomic : atomicLoad, atomicStore, atomicExchange, atomicFetchAdd,
    atomicFetchSub, atomicCas = cas, atomicCasWeak = casWeak, atomicOp;

private T val;
This is the shared thing; you should put shared here, on the field.
    val.atomicStore(init);
}

private shared(T)* ptr() shared
This method should not be shared, but it does correctly return a shared(T)*.
How do I make sure Atomic!((Atomic!int)*) works?
 Returns: The stored value
 */
T load(MemoryOrder mo = MemoryOrder.seq)() shared
More shared; get rid of all of these; these methods are not shared.
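Taken together, the shape these review comments point at is roughly the following (a sketch, not the PR's code): shared lives on the field, and the constructor and methods are left unqualified, on the assumption that the instance itself is not declared shared.

import core.atomic : atomicLoad, atomicStore, MemoryOrder;

struct Atomic(T)
{
    private shared T val;                       // `shared` lives here, inside the wrapper

    this(T init)                                // not marked shared
    {
        val.atomicStore(init);
    }

    T load(MemoryOrder mo = MemoryOrder.seq)()  // not marked shared
    {
        return val.atomicLoad!mo();
    }

    void store(MemoryOrder mo = MemoryOrder.seq)(T newVal)
    {
        val.atomicStore!mo(newVal);
    }
}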
 * See_Also:
 * $(HTTP en.cppreference.com/w/cpp/atomic/memory_order)
 */
enum MemoryOrder
Why this? I don't see a reason to repeat the type.
Why not just alias druntime's type here?
To name relaxed access correctly for the struct.
Oh man, don't get me started!
Just rename it in druntime, and wear the complaints from the people who say you broke their code... maybe add a synonym to the druntime enum, and deprecate the stupid name?
/// ...
rlx,
raw = rlx,

It is a simple changeover in druntime.
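Spelled out, the druntime-side change being suggested might look something like this (a sketch; whether the old member can simply be deprecated in place is a separate compatibility question):

enum MemoryOrder
{
    /// Not sequenced: the conventional name for relaxed access.
    rlx,

    /// The old name, kept as a synonym so existing code keeps compiling.
    deprecated("use MemoryOrder.rlx") raw = rlx,

    // acq, rel, acq_rel and seq stay exactly as they are today.
}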
Doing god's work :P
@safe unittest
{
    shared Atomic!(shared(int)*) ptr = new shared(int);
ptr is not shared; it is a stack local which is thread-local. The T = shared(int)* is also needlessly shared, because shared is transitive and val should be shared internally so it's redundant here.
t2.start;
t1.start;
t2.join;
t1.join;
You can't possibly test anything like this with one sample.
True. I also cannot test it on x86 at all. I think ARM requires adding alignment to the variables as well (and even then, we need to run it millions of times).
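For what it's worth, the usual way around that is brute repetition; a rough sketch (the iteration count is arbitrary, and real stress tests run far longer):

import core.atomic : atomicLoad, atomicStore, MemoryOrder;
import core.thread : Thread;

@system unittest
{
    shared int flag;
    shared int data;

    // A single run proves very little: ordering bugs are probabilistic,
    // so the same scenario is repeated many times.
    foreach (run; 0 .. 1000)
    {
        atomicStore(flag, 0);
        atomicStore(data, 0);

        auto producer = new Thread({
            atomicStore!(MemoryOrder.raw)(data, 42);
            atomicStore!(MemoryOrder.rel)(flag, 1);
        });
        auto consumer = new Thread({
            while (atomicLoad!(MemoryOrder.acq)(flag) != 1) { }
            assert(atomicLoad!(MemoryOrder.raw)(data) == 42);
        });
        consumer.start();
        producer.start();
        consumer.join();
        producer.join();
    }
}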
How to add alignment:

align(16) struct Atomic {
}
The alignment needs to be for the variable instance, we don't want to align all atomics.
Ah no, you don't want it on the field; it has to go on the struct.
On the field it only affects the layout of the Atomic struct itself, whereas you want the struct to carry that alignment when it is placed in other layouts, like classes and structs.
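The two placements being contrasted, side by side (a sketch; the numbers are arbitrary):

// On the struct: every instance of the type gets this alignment, including
// when it is placed as a field inside other structs and classes.
align(16) struct PaddedAtomic
{
    int value;
}

// On a single declaration: only this particular variable is aligned.
align(64) __gshared int hotCounter;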
I guess I used the wrong terminology. This was needed on ARM for the SB example (not this MP example).

align (1024) __gshared int x;
align (1024) __gshared int y;
I tried to get the atomic example to fit. If it doesn't work, I'm not sure what the syntax should be.
Thank you for your review! I was hoping to get one.

From my point of view, the current core.atomic is error prone for experts and a pure source of errors for non-experts who think they can make use of it. For example, with …

I don't see a problem calling … By defaulting to sequentially consistent, we nudge non-experts towards a memory model they can reason about (even if it might be tricky for them). It reduces the amount of undefined behaviour as well as the unexpected behaviours they'll have in their code. Additionally, since combining two different libraries that make use of atomics can introduce new bugs (it can create synchronization paths that don't exist when each library executes alone), it is safer to default to SC: combining two concurrent libraries then won't suffer from unexpected synchronization. Regretfully I don't have the citations in an easily searchable place, so I won't write any numbers, but there are cases where SC access ends up being faster than non-SC access. Others question whether the programmer overhead required for weak memory models (WMM) is worth the benefit of the code being faster.

I wasn't sure about … Thanks again!
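To make that concrete, a small sketch of the contrast being drawn (the wrapper lines are written against the PR's API as it currently stands and are left as comments, since std.atomic is not merged):

import core.atomic : atomicFetchAdd, atomicLoad, MemoryOrder;

shared int rawCounter;

void withCoreAtomic()
{
    // core.atomic today: free functions on a bare shared variable; under the
    // current (non-preview) shared rules nothing stops a plain, racy access
    // from slipping in alongside these calls.
    atomicFetchAdd(rawCounter, 1);
    auto v = atomicLoad!(MemoryOrder.acq)(rawCounter);
}

// With the proposed wrapper the variable is meant to be touched only through
// its methods, and every operation is sequentially consistent unless a weaker
// ordering is requested explicitly:
//
//     shared Atomic!int counter;
//     counter++;                               // seq_cst by default
//     auto v = counter.load!(MemoryOrder.acq); // weaker ordering is opt-in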
Can you show an example of this?

I mean just call …

Yeah, no; you lost me here instantly. I've fixed a lot of broken lock-free code, and it's the most excruciating thing you can ever be tasked to do.

I don't agree at all; it probably only makes it harder to find their bugs, because they become slightly less probable. Again, you don't encourage an amateur author to write lock-free code with a hope and a prayer that they can do this right... they almost certainly can't.

I don't follow. I'm not actually sure what you're trying to say here. I'm not aware of lock-free tools that are properly wrapped and isolated interacting with each other in any dangerous ways. Are there examples of this you can show? It's news to me.

Yeah sorry, I don't know here. I never heard this, and I'm sceptical of this claim.

You should compile with …

shared int x;
x = 10; // ERROR: can't write to shared memory (this is what you're worried about right?)
int y = x; // ERROR: can't read from shared memory
x.atomicStore!(MemoryOrder.relaxed)(10); // valid, atomicStore accepts a shared ref
I'd like someone else to make a call on whether they agree or disagree that a tool like this should exist. I also want tests to be valid and sensible/useful things to present to readers in their own right. It's easy to mislead or mis-educate users with bad sample code.
Overall I'm not happy with either solution. Yes, atomics are best represented with a storage class & type qualifier and warrant one, but …

If you can't have a dedicated storage class & type qualifier, the next best thing is a struct such as this. As for …
I don't know where … If this is introduced, maybe it should be added to …
Agreed, core.atomic would be a better place for it.
@atilaneves, I cannot remember, was it your idea or someone else's that this goes into std?
Write a synchronisation tool and use it. No need for a language keyword, and it's almost always inferior to something application-specific.
I'm not talking about the mutex; we need to be able to model the act of synchronization, the lock+unlock. You can't have a lock being held while a yield takes place. So there has to be language support, and for us right now that is synchronized statements.
Okay, there's context here that I'm missing. I don't really care about that right now though, so please don't enlighten me :P
I know the feeling, but I still think sometimes SC is the correct approach (just don't ever mix SC and non-SC access). Similarly, when teaching you start with SC, then maybe in more advanced courses go for weak memory.

Please ignore it, I was conflating different correctness criteria.

I don't understand either. I did reproduce it for some code but did not have the time to fully investigate it.
Add std.atomic struct as a better interface to core.atomic.
@atilaneves Please CR and let me know what's missing here.