Add timeout parameter to wait(::Condition) #56974

Open: wants to merge 3 commits into master


kpamnany (Contributor) commented Jan 6, 2025

We have a need for this capability. I believe this closes #36217.

The implementation is straightforward and there are a couple of tests.

Comment on lines +160 to +162
```julia
# Confirm that the waiting task is still in the wait queue and remove it. If
# the task is not in the wait queue, it must have been notified already so we
# don't do anything here.
```

Member

This appears to introduce a data race, though, so we cannot merge this.

Contributor Author

How's that? We're locking the condition variable here.

vtjnash (Member) commented Jan 6, 2025

This Timer runs concurrently with the return from `wait`, so by the time this code runs, you might have just corrupted some arbitrary subsequent wait on the same condition, or, by the time you schedule the `TimeoutError`, it could blow up some completely unrelated wait.
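To make the "completely unrelated wait" hazard concrete, here is a minimal sketch of a deliberately naive timed wait. It is not the PR's code: `NaiveTimeout` and `naive_timed_wait` are hypothetical names, and this timer neither checks the wait queue nor gets cancelled, which is the failure mode the PR's lock-and-dequeue check is meant to rule out. Once `notify` wakes the waiter early, the still-armed timer later throws into an unrelated `take!` on a `Channel`:

```julia
# Hypothetical exception type for this sketch (not a Base type).
struct NaiveTimeout <: Exception end

# Naive (racy) timed wait: the timer is never cancelled and never checks
# whether `ct` is still queued on `c`, so it fires even after a notify.
function naive_timed_wait(c::Threads.Condition, timeout::Real)
    ct = current_task()
    Timer(_ -> schedule(ct, NaiveTimeout(); error=true), timeout)
    return wait(c)                 # caller must hold c's lock
end

c  = Threads.Condition()
ch = Channel{Int}(1)               # stands in for some unrelated blocking wait

waiter = @async begin
    lock(c)
    try
        naive_timed_wait(c, 1.0)   # woken early by the notify below
    finally
        unlock(c)
    end
    try
        take!(ch)                  # unrelated wait; nothing is ever put!
        :no_error
    catch e
        e                          # the stale timer's exception lands here
    end
end

sleep(0.05)                        # let `waiter` park inside wait(c)
lock(c)
try
    notify(c)
finally
    unlock(c)
end
result = fetch(waiter)             # returns once the 1 s timer has fired
```

Here `result` ends up as a `NaiveTimeout`, even though the `take!` has nothing to do with `c`.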

Contributor Author

Ah, okay. There's an ABA problem. Let me see if I can find a solution for that.

But the waiting task is only scheduled with a `TimeoutError` if it was in this condition's wait queue, so I'm not sure I understand your "or" case here: the only subsequent wait that could get blown up is a wait on the same condition, which is the same ABA problem?

Member

It could have been in the waitq, then removed before you got around to scheduling it, or vice versa, with some other thread scheduling it before your code got around to removing it from the queue. That code runs on other threads, so it could be concurrent. There is potentially no guarantee that you can safely mutate this data structure concurrently on two threads (#55542).

Contributor Author

Pushed a fix for the ABA problem that relies on happens-before: if the waiter was scheduled, it sets `waiter_left` before returning. It can only re-enter the condition's wait queue through another call to `wait`, for which it must acquire the lock.

We acquire the condition's lock before checking `waiter_left` and checking for the task's presence in the wait queue. If the task is present, it can only be because it has not been scheduled: had it been scheduled, it would have set `waiter_left` before re-entering the wait queue.

I think the combination of the lock and the atomic ensures there is no ABA problem.
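The protocol described above (dequeue under the condition's lock, plus a per-call `waiter_left` flag set before the waiter returns) can be modeled outside of Base roughly as follows. This is an illustrative sketch under stated assumptions, not the PR's implementation: `ToyCond`, `toy_wait`, `toy_notify`, and `WaitTimeout` are hypothetical names, the wait queue is a plain `Vector{Task}` rather than Base's intrusive list, and the caller is assumed to hold `c.lock` exactly once, as with `wait(::Threads.Condition)`.

```julia
# Hypothetical exception type for this sketch (not a Base type).
struct WaitTimeout <: Exception
    timeout::Float64
end

# Hypothetical toy condition: a lock plus a plain Vector as the wait queue.
struct ToyCond
    lock::ReentrantLock
    waitq::Vector{Task}
end
ToyCond() = ToyCond(ReentrantLock(), Task[])

# Wake the first waiter, if any. Caller must hold c.lock.
function toy_notify(c::ToyCond, val=nothing)
    isempty(c.waitq) && return false
    schedule(popfirst!(c.waitq), val)   # dequeued under the lock, then scheduled
    return true
end

# Timed wait. Caller must hold c.lock exactly once.
function toy_wait(c::ToyCond, timeout::Real)
    ct = current_task()
    waiter_left = Threads.Atomic{Bool}(false)   # fresh flag for every call
    push!(c.waitq, ct)
    timer = Timer(timeout) do _
        lock(c.lock) do
            waiter_left[] && return             # stale timer: waiter already returned
            i = findfirst(t -> t === ct, c.waitq)
            i === nothing && return             # already dequeued by toy_notify
            deleteat!(c.waitq, i)
            # ct left the queue while we held c.lock, so nobody else can schedule it
            schedule(ct, WaitTimeout(timeout); error=true)
        end
    end
    unlock(c.lock)
    try
        return wait()                           # park until scheduled
    finally
        lock(c.lock)
        close(timer)
        waiter_left[] = true                    # set while holding c.lock
        i = findfirst(t -> t === ct, c.waitq)   # drop ct if woken some other way
        i === nothing || deleteat!(c.waitq, i)
    end
end

c = ToyCond()

# 1) Nobody notifies: the timer dequeues the waiter and throws into it.
lock(c.lock)
timed_out = try
    toy_wait(c, 0.05)
    false
catch e
    e isa WaitTimeout
finally
    unlock(c.lock)
end

# 2) A notifier wins: the timer is closed and the flag set before returning.
lock(c.lock)
@async lock(() -> toy_notify(c, :woken), c.lock)
val = try
    toy_wait(c, 5.0)
finally
    unlock(c.lock)
end
```

Because `waiter_left` is only read and written while `c.lock` is held, a timer callback that fires in the gap after a notify either finds the task gone from `waitq` or finds the flag already set, and in both cases does nothing. Since the flag is only touched under the lock, a plain `Ref{Bool}` would arguably suffice in this toy; the atomic follows the description above.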

Contributor Author

> It could have been in the waitq, then removed before you got around to scheduling it

We acquire the lock, confirm that the waiter did not leave and remove it from the wait queue before scheduling it. If it was not in the wait queue, we do not schedule it and this decision is made while holding the lock.

> some other thread scheduling before it got around to removing it from the queue

If the task is scheduled by notify, then it is removed from the condition's wait queue before it is scheduled, which is done while holding the condition's lock. If it is not in the wait queue, then we do not schedule it.

nsajko added the multithreading (Base.Threads and related functionality) label Jan 7, 2025
Comment on lines +169 to +170
```julia
# send the waiting task a timeout
dosched && schedule(ct, TimeoutError(timeout); error=true)
```

Member

Suggested change

```diff
-# send the waiting task a timeout
+# send the waiting task a timeout.
+# note: the waiting task is guaranteed to not be scheduled, since
+# we removed it from the queue while we had locked the condition.
 dosched && schedule(ct, TimeoutError(timeout); error=true)
```

Member

EDIT: Oh, wait, I don't think this is true. See my next comment below.

Comment on lines +179 to +183
```julia
if timer !== nothing
    close(timer)
    waiter_left[] = true
end
return res
```

Member

I think there might still be a race condition here?

It looks like `wait()` will return once this Task is woken up by a `notify(c)`, but before it relocks `c.lock`? I guess the `relockall` line happens after this.

So there's a gap where the timer could go off and the Timer task could grab `c.lock` before this task does. It could then attempt to remove the Task from the condition's queue and schedule it (again) while it's already scheduled, before this Task reaches the `close(timer)` call?

Member

Still not sure, though. I'd be glad to talk it through if you'd like.

Comment on lines +47 to +56
```julia
@spawn begin
    sleep(0.01)
    notify(a)
end
@test try
    wait(a; timeout=2)
    true
catch
    false
end
```

NHDaly (Member) commented Jan 9, 2025

This is a racy test, right? If we're running with 2 threads, there's a chance the spawned task calls `notify` before the main task starts waiting, and you'd miss the notification. I think you could make it not racy by using `@async`?
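Besides `@async`, a common way to make this kind of test deterministic is to hold the condition's lock across both the spawn and the wait, and have the notifier take the same lock: the lock is only released inside `wait(a)`, after the waiter is already queued, so the `notify` cannot be missed even with several threads. A minimal sketch, assuming `a` is a `Threads.Condition` (the test's setup is not shown); since the `timeout` keyword is what this PR adds, the sketch calls plain `wait(a)`:

```julia
a = Threads.Condition()

lock(a)                      # hold a's lock before the notifier even exists
ok = try
    Threads.@spawn begin
        sleep(0.01)
        lock(a)              # blocks until the waiter releases the lock inside wait(a)
        try
            notify(a)
        finally
            unlock(a)
        end
    end
    wait(a)                  # the PR's `wait(a; timeout=2)` would go here
    true
finally
    unlock(a)
end
```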

JamesWrigley (Contributor) commented

How difficult would it be to reuse this implementation for waiting on other objects like `Event`/`Channel`, etc.? (Not saying it should be part of this PR, just curious.)
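For comparison, Base already provides a polling-based bound that works with any object exposing a readiness predicate: `timedwait(testcb, timeout; pollint)` re-checks `testcb()` every `pollint` seconds and returns `:ok` or `:timed_out`. It is a different technique from this PR (it never dequeues a waiter, and it adds up to `pollint` of latency), but a minimal sketch with a `Channel` and `isready` looks like this:

```julia
ch = Channel{Int}(1)

# Nothing has been put! yet, so polling until the deadline gives :timed_out.
r1 = timedwait(() -> isready(ch), 0.2; pollint=0.01)

put!(ch, 42)
# Now the predicate is already true, so this returns :ok.
r2 = timedwait(() -> isready(ch), 0.2; pollint=0.01)
```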

Labels: multithreading (Base.Threads and related functionality)
Projects: None yet

Successfully merging this pull request may close these issues: wait() with timeout

5 participants