[BUGFIX] Fix thpool_destroy() hang: unlock all workers at once #135
Hi, I'm nil0x42, the creator of duplicut.
I wrote this patch for duplicut's copy of thpool.c, as I had a strange timeout in some unit tests that arose only about once per 1000 runs.
unblock all workers in thpool_destroy(), fix hang with medium-to-large pools
`thpool_destroy()` is supposed to wake every worker, wait until the pool is empty, and return quickly. In practice it can block for dozens of seconds as soon as the pool size outgrows its one-second “fast-exit” window.
Root cause
- `bsem_post_all()` sets `v = 1` and broadcasts.
- `bsem_wait()` → `while (v==0) … ; v = 0; …` → the single ticket is consumed, all other awakened threads fall straight back into `pthread_cond_wait()`.
- One `bsem_post_all()` + `sleep(1)` per loop → one ticket per second.
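The three points above can be reproduced in isolation. The following is a minimal sketch, not the code from thpool.c: `old_bsem`, `demo_state`, `demo_worker` and `posts_needed()` are hypothetical names, and the semaphore is only modelled on the behaviour described here (`v = 1` plus a broadcast, and a waiter that resets `v` to 0). It counts how many `post_all` calls are needed before every waiter has passed.

```c
#include <pthread.h>

#define MAX_WAITERS 64

/* Binary semaphore modelled on the pre-patch behaviour described above. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int v;                          /* 0 = no ticket, 1 = one ticket */
} old_bsem;

static void old_bsem_wait(old_bsem *b) {
    pthread_mutex_lock(&b->mutex);
    while (b->v == 0)
        pthread_cond_wait(&b->cond, &b->mutex);
    b->v = 0;                       /* the single ticket is consumed here */
    pthread_mutex_unlock(&b->mutex);
}

static void old_bsem_post_all(old_bsem *b) {
    pthread_mutex_lock(&b->mutex);
    b->v = 1;                       /* one ticket, however many threads wake */
    pthread_cond_broadcast(&b->cond);
    pthread_mutex_unlock(&b->mutex);
}

/* Hypothetical demo state: the semaphore plus a counter of released waiters. */
typedef struct {
    old_bsem        sem;
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int             passed;
} demo_state;

static void *demo_worker(void *arg) {
    demo_state *s = arg;
    old_bsem_wait(&s->sem);
    pthread_mutex_lock(&s->lock);
    s->passed++;
    pthread_cond_signal(&s->changed);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Returns the number of post_all calls needed to release n waiters. */
int posts_needed(int n) {
    demo_state s;
    pthread_t tid[MAX_WAITERS];
    int created = 0, posts = 0, i;

    if (n > MAX_WAITERS) n = MAX_WAITERS;
    pthread_mutex_init(&s.sem.mutex, NULL);
    pthread_cond_init(&s.sem.cond, NULL);
    s.sem.v = 0;
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.changed, NULL);
    s.passed = 0;

    while (created < n &&
           pthread_create(&tid[created], NULL, demo_worker, &s) == 0)
        created++;

    pthread_mutex_lock(&s.lock);
    while (s.passed < created) {
        pthread_mutex_unlock(&s.lock);
        old_bsem_post_all(&s.sem);  /* wakes everyone, releases only one */
        posts++;
        pthread_mutex_lock(&s.lock);
        while (s.passed < posts)    /* wait until this ticket has been used */
            pthread_cond_wait(&s.changed, &s.lock);
    }
    pthread_mutex_unlock(&s.lock);

    for (i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&s.sem.mutex);
    pthread_cond_destroy(&s.sem.cond);
    pthread_mutex_destroy(&s.lock);
    pthread_cond_destroy(&s.changed);
    return posts;
}
```

Under these assumptions `posts_needed(n)` returns `n`: each broadcast lets exactly one thread through. Pair every call with a `sleep(1)`, as the destroy loop does, and the shutdown time grows linearly with the pool size.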
With 4000 threads the destructor needs ~4000 s; even with only 16 threads (a real-world duplicut run) it can exceed a watchdog such as `timeout 5`, hanging about 1 ‰ of the executions.
Fix (pure ANSI C / POSIX, no API change)
- `bsem_wait()` now consumes exactly one ticket.
- `bsem_post_all()` grants “infinite” tickets so every waiter can pass.
- The single-post path (`bsem_post()`) is unchanged: one post still wakes one thread.
All accesses to `v` remain protected by the semaphore mutex, so no new data races or lock-order inversions are introduced.
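As an illustration of the semantics described in this section, here is a minimal sketch under stated assumptions. It is not the patch itself: the `fixed_bsem` type, the function names and `released_by_one_post_all()` are hypothetical, and using `INT_MAX` for the “infinite” ticket count is a guess at one reasonable encoding.

```c
#include <limits.h>
#include <pthread.h>

#define MAX_WAITERS 64

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int v;                          /* number of tickets currently available */
} fixed_bsem;

void fixed_bsem_init(fixed_bsem *b) {
    pthread_mutex_init(&b->mutex, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->v = 0;
}

void fixed_bsem_wait(fixed_bsem *b) {
    pthread_mutex_lock(&b->mutex);
    while (b->v == 0)
        pthread_cond_wait(&b->cond, &b->mutex);
    b->v--;                         /* consume exactly one ticket */
    pthread_mutex_unlock(&b->mutex);
}

/* Single-post path: one ticket, one thread woken. */
void fixed_bsem_post(fixed_bsem *b) {
    pthread_mutex_lock(&b->mutex);
    b->v = 1;
    pthread_cond_signal(&b->cond);
    pthread_mutex_unlock(&b->mutex);
}

void fixed_bsem_post_all(fixed_bsem *b) {
    pthread_mutex_lock(&b->mutex);
    b->v = INT_MAX;                 /* assumption: stands in for "infinite" */
    pthread_cond_broadcast(&b->cond);
    pthread_mutex_unlock(&b->mutex);
}

static void *fixed_worker(void *arg) {
    fixed_bsem_wait(arg);
    return NULL;
}

/* Starts n waiters, posts once to all, and returns how many were joined. */
int released_by_one_post_all(int n) {
    fixed_bsem sem;
    pthread_t tid[MAX_WAITERS];
    int created = 0, joined = 0, i;

    if (n > MAX_WAITERS) n = MAX_WAITERS;
    fixed_bsem_init(&sem);
    while (created < n &&
           pthread_create(&tid[created], NULL, fixed_worker, &sem) == 0)
        created++;

    fixed_bsem_post_all(&sem);      /* a single call is enough for everyone */

    for (i = 0; i < created; i++)
        if (pthread_join(tid[i], NULL) == 0)
            joined++;
    pthread_mutex_destroy(&sem.mutex);
    pthread_cond_destroy(&sem.cond);
    return joined;
}
```

In this sketch a waiter that reaches `fixed_bsem_wait()` after the broadcast still finds `v > 0` and passes, so the result does not depend on thread start-up timing. `v` is read and written only while the mutex is held, as the paragraph above requires.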
Reproducer added
tests/thpool_destroy_hang.sh
tests/src/thpool_destroy_hang.c
duplicut.
Impact
`thpool_destroy()` – 4000 threads
duplicut (16 threads)
The change makes pool shutdown reliable regardless of the thread count and keeps behaviour identical in every other respect.