-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Noble backport for pc kernel fix #2151
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
If you want a coroutine function not to be called multiple times concurrently with asyncio, you can now use the async_helpers.exclusive decorator. This will wrap the coroutine function in a lock. LP: #2084032 Signed-off-by: Olivier Gayot <[email protected]> (cherry picked from commit 3f340c1) Backported so it does not run the new unittest on older versions of Python - since they do not support asyncio.Barrier.
The _get_system() method essentially does 3 important things: * it mounts stuff in /var/lib/snapd/seed * it queries snapd (and we expect snapd to read contents from /var/lib/snapd/seed) * it then tries to cleanup by unmounting what it previously mounted Unfortunately, if we cancel this task in the middle, especially while it is querying snapd, the cleanup code will fail to unmount things because snapd is still using /var/lib/snapd/seed (cancelling the query does not mean that snapd will immediately release the resources - or even stop processing the query). Typically this results in a failure to unmount pc-kernel.snap with EBUSY. Effectively, if _get_system() gets cancelled, we would need to wait until the query to snapd finishes before we run the cleanup code. But the approach that this patch takes is to make sure _get_system() is not cancelled in the first place. _get_system() was subject to cancellation because it runs as part of _examine_systems - which is cancelled and restarted when the source changes. We now make sure that we shield _get_system() from cancellation when we execute it from _examine_systems. If _examine_system gets cancelled, _get_system() will now continue running in the background. This leads to another problem. If we cancel _examine_system and rerun it, we will end up with two concurrent calls to _get_system() (the new one + the one we want to finish). Obviously, things go sideways if this happens. To avoid the problem, we now use the async_helpers.exclusive decorator for _get_system() - so that only one task runs the critical section at a given time. LP: #2084032 Signed-off-by: Olivier Gayot <[email protected]> (cherry picked from commit 1f94353) Backported to use _get_system() instead of SystemGetter.get() from the original patch.
Internally, _get_system(variation) expects that the variation exists for the currently configured source. However, because of the lock we added to _get_system(variation), it is now more likely that the configured source has changed by the time we start mounting stuff and querying snapd. When that happens, _get_system(variation) fails with KeyError ; because the variation is no longer present in the currently configured source. Fixed by making sure that we use the source handler that was configured when _get_system(variation) was first invoked. LP: #2084032 Signed-off-by: Olivier Gayot <[email protected]> (cherry picked from commit 65983be) Backported to use _get_system() instead of SystemGetter.get() from the original patch.
When calling a coroutine function that is decorated with "async_helpers.exclusive", one can now optionally pass an asyncio.Event. If it is specified, the coroutine will set the event after it starts executing the body of the coroutine (i.e. after it acquires the lock). This is useful if the coroutine function is normally not safe to cancel in the middle of its critical section. Using this mechanism, the parent can know if the task can still be cancelled or if it is too late. Signed-off-by: Olivier Gayot <[email protected]> (cherry picked from commit 99be092)
Now that _get_systems is marked async_helpers.exclusive, when we change the source multiple times quickly, we create a "queue" of _get_systems tasks. This can take a long time to consume, and really we only need the result of the last query. When _examine_systems gets cancelled, we now check if the _get_systems task has successfully acquired the lock. If it hasn't, then we cancel it. LP: #2084032 Signed-off-by: Olivier Gayot <[email protected]> (cherry picked from commit 9218456) Backported to use _get_system() instead of SystemGetter.get() from the original patch.
8d9ec9c
to
e73f334
Compare
dbungert
approved these changes
Jan 28, 2025
Chris-Peterson444
approved these changes
Jan 28, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Testing done
Although I haven't managed to reproduce the exact unmount issue myself using the 24.04.1 ISO, I was able to mess up the contents of
/var/lib/snapd/seeds
by changing the source multiple times. Things that could go wrong before the patch include:All these are fixed after applying the patches - which means the cleanup code is working properly. And therefore no pc-kernel unmount issue should occur.