
Potential memory order inconsistency on freelist block reusage #404

Open
ayles opened this issue Dec 17, 2024 · 9 comments

Comments

ayles commented Dec 17, 2024

Hi! I noticed that TSan complains about data races inside concurrentqueue. I was able to reproduce it with the following minimal example:

#include "concurrentqueue.h"

#include <thread>
#include <vector>

int main() {
    moodycamel::ConcurrentQueue<int> q;

    std::vector<std::thread> threads;

    // 5 consumer threads spinning on try_dequeue
    for (size_t i = 0; i < 5; ++i) {
        threads.emplace_back([&q]() {
            while (true) {
                int item;
                q.try_dequeue(item);
            }
        });
    }

    // 10 producer threads spinning on try_enqueue
    for (size_t i = 0; i < 10; ++i) {
        threads.emplace_back([&q]() {
            while (true) {
                q.try_enqueue(1);
            }
        });
    }

    for (auto& t : threads) {
        t.join();
    }
}
...

WARNING: ThreadSanitizer: data race (pid=1709853)
  Write of size 4 at 0x728c00000718 by thread T14:
    #0 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::ImplicitProducer::enqueue<(moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)1, int>(int&&) ... concurrentqueue.h:2544:4
    #1 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::inner_enqueue<(moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)1, int>(int&&) ... concurrentqueue.h:1376:94
    #2 moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::try_enqueue(int&&) ... concurrentqueue.h:1074:15
    #3 main::$_1::operator()() const ... main.cpp:23:19
    
...

  Previous read of size 4 at 0x728c00000718 by thread T5:
    #0 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::ImplicitProducer::dequeue<int>(int&) ... concurrentqueue.h:2596:17
    #1 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::ProducerBase::dequeue<int>(int&) ... concurrentqueue.h:1720:50
    #2 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::try_dequeue<int>(int&) ... concurrentqueue.h:1146:32
    #3 main::$_0::operator()() const ... main.cpp:15:19

...

I believe this race (in terms of the C++ memory model) happens like this:
T0: Thread A reads from memory slot S.
T1: Thread A releases memory slot S of block X.
T2: Thread B releases last slot of block X, pushes block into the freelist.
T3: Thread C acquires block X from the freelist, writes to slot S.

I believe the release in this and similar places should be acq_rel (or release plus an acquire fence on the last increment):
https://github.com/cameron314/concurrentqueue/blob/master/concurrentqueue.h#L1607
to build a happens-before relation between the slot release (thread A) and the whole-block release (thread B) and the subsequent push to the freelist.
(I tested it; TSan stopped complaining.)

I am not so good with the C++ memory model, so maybe I am missing something here?

XiaoXuan42 added a commit to XiaoXuan42/concurrentqueue that referenced this issue Jan 4, 2025
hs-vc added a commit to hs-vc/concurrentqueue that referenced this issue Jan 22, 2025
hs-vc commented Jan 22, 2025

@cameron314 This issue looks quite serious. To address it, I've submitted #406, which is based on this commit. When you have a moment, could you please review the changes? Thank you for your time and consideration.

cameron314 (Owner) commented

I can confirm this TSan report indicates it thinks the block was not handed off cleanly in the FreeList between the two threads. (Line 1607 is irrelevant, as it's in code that's never executed with the default block size.)

It is not clear to me whether or not this is a false positive. More analysis is needed. At the root of the free list implementation are CAS operations on freeListHead, which is already done with release semantics when a block is put on the freelist and acquire semantics when a block is taken off. Therefore my first impression is that this is likely a false positive.

If there is a bug, blindly changing memory ordering until TSan stops complaining does not necessarily indicate the bug is fixed. Actual human logic needs to be involved.

cameron314 (Owner) commented

I think I found one case where freeListRefs may get temporarily manipulated by a thread other than the one in add_knowing_refcount_is_zero, leading to a break in happens-before ordering with respect to that member (note that this should not affect the happens-before of putting a block on the list head and getting it back). Pushed e74b07d.


ayles commented Jan 27, 2025

Sorry, but I do not understand how line 1607 is irrelevant if it is executed (on my machine, with the example above).

And, yes, if thread B releases the block (after releasing the last element) and thread C acquires that block, there is definitely a clear happens-before relation between them.

However, when thread A releases a non-last element from that block, it does not interact with freeListRefs/freeListHead in any way. The only place where threads A and B can establish a happens-before relation is when they increment elementsCompletelyDequeued.

In fact, I do not believe in false positives with modern TSan, unless they are extremely rare or thread fences are in use (at least when talking about the C++ memory model and not about actual races on a particular architecture).


ayles commented Jan 27, 2025

Forgot to mention that, as expected, e74b07d does not fix the problem.

cameron314 (Owner) commented

Ah, I misread your initial analysis. I see what you mean now. It's not about the thread freeing the block but about the other dequeue operations on that block before that. Thank you, this is helpful.

It seems I also reread the code too quickly. Line 1607 is skipped with the default block size for explicit producers, but not implicit ones.

Yes, the commit I pushed is for an unrelated potential issue.

cameron314 added a commit that referenced this issue Jan 27, 2025
… that block's reuse when empty, affecting implicit producers (see #404)
cameron314 (Owner) commented

I don't have TSan set up at the moment, please let me know if the latest commit fixes this particular warning.

Based on #406, I'm guessing there are others that are unrelated to this one.


ayles commented Jan 27, 2025

I can't test this commit at the moment, but as I mentioned earlier, one change on line 1607 completely resolves the warnings in the example.

I suppose there might be a similar issue with flags in the if-branch as well.

Maybe it will be possible to introduce TSan tests in the future, starting with the example above?


ayles commented Jan 27, 2025

Also, in terms of performance, it may be better to use release memory order with an acquire fence on the last increment, as described here: https://www.boost.org/doc/libs/1_53_0/doc/html/atomic/usage_examples.html#boost_atomic.usage_examples.example_reference_counters.discussion
With a call to __tsan_acquire in TSan builds, of course.
(But the new conditional branch may cost a little, so it will require some benchmarking.)
