Use `trylock` to eliminate the remaining race condition in `Test.cancel()`. #1415

grynspan · 2025-11-11T20:54:15Z

This PR fixes the race condition in `Test.cancel()` that could occur if an unstructured task, created from within a test's task, called `Test.cancel()` at just the right moment. The order of events for the race is:

- Unstructured task is created and inherits task-locals including the reference to the test's unsafe current task;
- Test's task starts tearing down;
- Unstructured task calls `takeUnsafeCurrentTask()` and gets a reference to the unsafe current task;
- Test's task finishes tearing down;
- Unstructured task calls `UnsafeCurrentTask.cancel()`.

The fix is to use `trylock` semantics when cancelling the unsafe current task:

- If the test's task is still alive, the task is cancelled while the lock is held. This blocks the test's task from being torn down, because teardown clears the unsafe current task reference under the same lock.
- If the test's task is no longer alive, the reference is already `nil` by the time the unstructured task acquires the lock, so it bails early.
- If we recursively call `cancel()` (which can happen via the concurrency-level cancellation handler), the `trylock` fails rather than acquiring the lock a second time, so we won't end up deadlocking or aborting (which is what prevents calling `cancel()` while holding the lock in the current implementation).

It is possible for `cancel()` to trigger user code, especially if the user has set up a cancellation handler, but no code path can then lead to a deadlock, because the only user-accessible calls that might touch this lock use `trylock`.
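The three cases above (task alive, reentrant call from a cancellation handler, task already torn down) can be illustrated with a minimal, single-threaded sketch. Everything in it is hypothetical and for illustration only: `FakeTask`, `TaskReference`, `clear()`, and `cancelIfPossible()` are not swift-testing API, `NSLock` is used as a stand-in lock, and returning `false` when the lock is unavailable is an assumption about one reasonable behavior rather than a description of what this PR does.

```swift
import Foundation

/// Hypothetical stand-in for the test's task (not a swift-testing type).
final class FakeTask {
    private(set) var isCancelled = false
    /// Mimics a user-installed cancellation handler.
    var onCancel: (() -> Void)?

    func cancel() {
        isCancelled = true
        onCancel?()
    }
}

/// Hypothetical holder for the lock-guarded task reference.
final class TaskReference {
    private let lock = NSLock()
    private var task: FakeTask?

    init(_ task: FakeTask) {
        self.task = task
    }

    /// Teardown path: takes the lock unconditionally, so it waits for any
    /// in-flight cancellation before clearing the reference.
    func clear() {
        lock.lock()
        defer { lock.unlock() }
        task = nil
    }

    /// Cancel path: trylock only. Returns `true` if this call cancelled the task.
    func cancelIfPossible() -> Bool {
        // Lock unavailable (for example, a reentrant call): give up, never block.
        guard lock.try() else { return false }
        defer { lock.unlock() }
        // Reference already cleared by teardown: nothing to cancel.
        guard let task = self.task else { return false }
        task.cancel()
        return true
    }
}

func demo() -> (first: Bool, reentrant: Bool?, afterTeardown: Bool) {
    let task = FakeTask()
    let reference = TaskReference(task)
    var reentrant: Bool?
    // The handler re-enters the cancel path while the lock is still held.
    task.onCancel = { reentrant = reference.cancelIfPossible() }

    let first = reference.cancelIfPossible()          // task alive
    reference.clear()                                 // teardown
    let afterTeardown = reference.cancelIfPossible()  // reference is nil
    return (first, reentrant, afterTeardown)
}

let outcome = demo()
print(outcome)
```

In this sketch only the teardown path blocks on the lock; every path reachable from user code uses the non-blocking `try()`, which is why the reentrant call returns instead of deadlocking.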

I hope some part of that made sense.

Checklist:

Code and documentation should follow the style of the Style Guide.
If public symbols are renamed or modified, DocC references should be updated.

grynspan · 2025-11-11T21:01:59Z

Sigh. The Linux implementation treats `EDEADLK` as fatal, when it should just return `false`.

grynspan · 2025-11-11T21:20:11Z

See swiftlang/swift#85448

ktoso

Looks reasonable enough, thanks for looking into the safety of this approach.

grynspan · 2025-11-12T15:04:27Z

Related: https://forums.swift.org/t/should-we-document-the-behavior-of-mutex-withlockifavailable/83166

grynspan added this to the Swift 6.3.0 milestone Nov 11, 2025

grynspan self-assigned this Nov 11, 2025

grynspan requested a review from stmontgomery as a code owner November 11, 2025 20:54

grynspan added the bug 🪲 Something isn't working label Nov 11, 2025

grynspan requested review from briancroom and jerryjrchen as code owners November 11, 2025 20:54

grynspan added the concurrency 🔀 Swift concurrency/sendability issues label Nov 11, 2025

Avoid abort on Linux when the current thread already owns the lock

ee56a0f

grynspan added 9 commits November 11, 2025 16:24

Use pthread_t instead of gettid() since the latter requires _GNU_SOURCE

9d6c9fb

Revert "Use pthread_t instead of gettid() since the latter requires _…

21255c0

…GNU_SOURCE" This reverts commit 9d6c9fb.

Just declare gettid()

e76d0ee

Returns

1b32337

Another return

8d69ae0

Just use pthread_mutex_t on Linux until the Swift issue is cleared up

26fb71d

Merge remote-tracking branch 'origin/jgrynspan/test-cancel-race' into…

8371a2f

… jgrynspan/test-cancel-race

Remove gettid() declaration, don't need it anymore

a8427f9

pthread_mutex_destroy

c319c99

briancroom approved these changes Nov 11, 2025

Make sure we still cancel the current task in the edge case

f194f20

ktoso approved these changes Nov 12, 2025

Extra return

e7a040e

stmontgomery approved these changes Nov 12, 2025

grynspan merged commit fd350e4 into main Nov 12, 2025
52 of 54 checks passed

grynspan deleted the jgrynspan/test-cancel-race branch November 12, 2025 15:08
