Skip to content

Conversation

@grynspan
Copy link
Contributor

@grynspan grynspan commented Nov 11, 2025

This PR fixes the race condition in Test.cancel() that could occur if an unstructured task, created from within a test's task, called Test.cancel() at just the right moment. The order of events for the race is:

  • Unstructured task is created and inherits task-locals including the reference to the test's unsafe current task;
  • Test's task starts tearing down;
  • Unstructured task calls takeUnsafeCurrentTask() and gets a reference to the unsafe current task;
  • Test's task finishes tearing down;
  • Unstructured task calls UnsafeCurrentTask.cancel().

The fix is to use trylock semantics when cancelling the unsafe current task. If the test's task is still alive, the task is cancelled while the lock is held, which will block the test's task from being torn down as it has a lock-guarded call to clear the unsafe current task reference. If the test's task is no longer alive, the reference is already nil by the time the unstructured task acquires the lock and it bails early. If we recursively call cancel() (which can happen via the concurrency-level cancellation handler), the trylock means we won't acquire the lock a second time, so we won't end up deadlocking or aborting (which is what prevents calling cancel() while holding the lock in the current implementation).

It is possible for cancel() to trigger user code, especially if the user has set up a cancellation handler, but there is no code path that can then lead to a deadlock because the only user-accessible calls that might touch this lock use trylock.

I hope some part of that made sense.

Checklist:

  • Code and documentation should follow the style of the Style Guide.
  • If public symbols are renamed or modified, DocC references should be updated.

…el()`.

This PR fixes the race condition in `Test.cancel()` that could occur if an
unstructured task, created from within a test's task, called `Test.cancel()` at
just the right moment. The order of events for the race is:

- Unstructured task is created and inherits task-locals including the reference
  to the test's unsafe current task;
- Test's task starts tearing down;
- Unstructured task calls `takeUnsafeCurrentTask()` and gets a reference to the
  unsafe current task;
- Test's task finishes tearing down;
- Unstructured task calls `UnsafeCurrentTask.cancel()`.

The fix is to use `trylock` semantics when cancelling the unsafe current task.
If the test's task is still alive, the task is cancelled while the lock is held,
which will block the test's task from being torn down as it has a lock-guarded
call to clear the unsafe current task reference. If the test's task is no longer
alive, the reference is already `nil` by the time the unstructured task acquires
the lock and it bails early. If we recursively call `cancel()` (which can happen
via the concurrency-level cancellation handler), the `trylock` means we won't
acquire the lock a second time, so we won't end up deadlocking or aborting
(which is what prevents calling `cancel()` while holding the lock in the current
implementation).

I hope some part of that made sense.
@grynspan grynspan added this to the Swift 6.3.0 milestone Nov 11, 2025
@grynspan grynspan self-assigned this Nov 11, 2025
@grynspan grynspan added the bug 🪲 Something isn't working label Nov 11, 2025
@grynspan grynspan added the concurrency 🔀 Swift concurrency/sendability issues label Nov 11, 2025
@grynspan
Copy link
Contributor Author

Sigh. The Linux implementation treats EDEADLK as fatal, when it should just return false.

@grynspan
Copy link
Contributor Author

See swiftlang/swift#85448

Copy link

@ktoso ktoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable enough, thanks for looking into the safety of this approach.

@grynspan
Copy link
Contributor Author

@grynspan grynspan merged commit fd350e4 into main Nov 12, 2025
52 of 54 checks passed
@grynspan grynspan deleted the jgrynspan/test-cancel-race branch November 12, 2025 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug 🪲 Something isn't working concurrency 🔀 Swift concurrency/sendability issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants