Potentially harmful SynchronizedObject usages on iOS platform #4282
Comments
Waiting for the specific coroutines version, as atomicfu got a lock upgrade recently.
Will this issue be considered solved if …
Would be nice to understand the root of the issue (it seems like it's …
I'll double-check, but after a brief refresher on the source code, I see no fault in …
Might be the case with the allocation-spamming lock either way; I've asked the user about the coroutines version.
Thanks! The culprit seems to be https://github.com/Kotlin/kotlinx-atomicfu/blob/bad63e743ac905a298386446a82a88225c2f71fc/atomicfu/src/nativeMain/kotlin/kotlinx/atomicfu/locks/Synchronized.kt#L13-L66. There is work in progress to replace it with a more efficient implementation on the atomicfu side, so the problem is expected to be solved in future releases. It's not going to be very soon, though, as writing efficient mutex implementations takes a lot of effort. It doesn't seem like there's evidence of coroutines using the mutex incorrectly, which was the original topic of this issue. Can this be closed?
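The linked implementation is spin-based. As a deliberately simplified illustration of why that hurts under contention, here is a bare CAS spin-lock sketch; this is not the actual atomicfu code (which also has thin/fat lock states), just the general pattern being discussed.

```kotlin
import kotlinx.atomicfu.atomic

// Schematic illustration only, not the atomicfu implementation: a bare CAS spin
// lock. Under contention every waiter loops at full speed, and if the owning
// thread runs at a lower QoS class, higher-priority waiters (such as the UI
// thread) can spin for a long time.
class SpinLockSketch {
    private val locked = atomic(false)

    fun lock() {
        // Spin until we manage to flip the flag from false to true.
        while (!locked.compareAndSet(false, true)) {
            // Busy-wait: no parking and no QoS donation to the current owner.
        }
    }

    fun unlock() {
        locked.value = false
    }
}
```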
I suppose, but a few follow-up questions:
Thanks!
I thought this no longer happened with newer Kotlin and Compose versions? Is there a reason not to upgrade?
Yes, this would be excellent, thanks! You could share a link / an archive / whatever with me in the kotlinlang Slack (…)
Good point, I don't think there's a public GitHub issue in atomicfu for tracking this. Let's leave this one open, then, and I'll close it if we do open an atomicfu issue.
Thank you for the reply! I've sent you my app's codebase on the kotlinlang Slack. Unfortunately, it is still happening with Compose 1.7 and Kotlin 2.1. I just profiled my app in release mode and was able to capture the freeze. It appears most likely to happen when scrolling lists; I'll be scrolling, navigating back and forth, scrolling some more, and then it freezes. What's interesting is that the "long" usage of … The file is too big to attach here, so I will send it to you as well.
As another update, as I was scrolling through the same list that can sometimes cause the freezing, I got this Thread Performance Checker message:
…
We've heard that mutex implementations like ours can suffer from QoS problems (Kotlin/kotlinx-atomicfu#462), and this looks like solid proof. Thanks for the codebase! I'm sure it will be invaluable in our investigation.
@creativedrewy Would it be feasible for you to verify that this updated file for compose-multiplatform-core on iOS addresses the problem: https://gist.github.com/stefanhaustein/5a36e66672390a8b314e63f46e7baefe (I have a slightly better pull request for atomicfu, but unfortunately Compose used their own version of SynchronizedObject...)
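For readers wondering what such a replacement looks like conceptually: instead of spinning, the lock can delegate to an OS mutex that parks waiting threads. The sketch below is not the gist's actual code; it is just a minimal Kotlin/Native illustration of wrapping a recursive pthread mutex, with error handling and lifetime management glossed over.

```kotlin
@file:OptIn(kotlinx.cinterop.ExperimentalForeignApi::class)

import kotlinx.cinterop.alloc
import kotlinx.cinterop.free
import kotlinx.cinterop.nativeHeap
import kotlinx.cinterop.ptr
import platform.posix.PTHREAD_MUTEX_RECURSIVE
import platform.posix.pthread_mutex_destroy
import platform.posix.pthread_mutex_init
import platform.posix.pthread_mutex_lock
import platform.posix.pthread_mutex_t
import platform.posix.pthread_mutex_unlock
import platform.posix.pthread_mutexattr_destroy
import platform.posix.pthread_mutexattr_init
import platform.posix.pthread_mutexattr_settype
import platform.posix.pthread_mutexattr_t

// Illustration only: a recursive pthread-mutex-backed lock for iOS/native code.
// Contended waiters are parked by the kernel instead of spinning.
class PosixLockSketch {
    private val attr = nativeHeap.alloc<pthread_mutexattr_t>()
    private val mutex = nativeHeap.alloc<pthread_mutex_t>()

    init {
        pthread_mutexattr_init(attr.ptr)
        pthread_mutexattr_settype(attr.ptr, PTHREAD_MUTEX_RECURSIVE)
        pthread_mutex_init(mutex.ptr, attr.ptr)
    }

    fun lock() { pthread_mutex_lock(mutex.ptr) }

    fun unlock() { pthread_mutex_unlock(mutex.ptr) }

    // Must be called exactly once when the lock is no longer needed.
    fun dispose() {
        pthread_mutex_destroy(mutex.ptr)
        pthread_mutexattr_destroy(attr.ptr)
        nativeHeap.free(mutex)
        nativeHeap.free(attr)
    }
}
```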
@stefanhaustein thanks for the reply! I would need to clone the compose core/runtime repositories locally and consume them in my app in order to swap in this implementation, correct?
Yes, exactly (probably via publishing the modified compose core repo to local Maven and redirecting your dependency accordingly). P.S.: Relevant part of my notes:
…
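As a rough sketch of the approach described above (publish the modified compose-multiplatform-core build to the local Maven repository, then point the app at it). The coordinates and the version string below are illustrative assumptions, not values taken from this thread, and the exact publish task should be taken from the repository's own docs (Gradle's standard publishToMavenLocal is the usual mechanism).

```kotlin
// settings.gradle.kts of the consuming app: make mavenLocal() the first
// repository so the locally published Compose artifacts win over released ones.
dependencyResolutionManagement {
    repositories {
        mavenLocal()
        google()
        mavenCentral()
    }
}
```

```kotlin
// build.gradle.kts of the consuming app (Kotlin 2.x KMP DSL): point the Compose
// runtime at the locally published build. "1.8.0-local01" is a made-up
// placeholder; use whatever version the local publication actually produced.
kotlin {
    sourceSets {
        commonMain.dependencies {
            implementation("org.jetbrains.compose.runtime:runtime:1.8.0-local01")
        }
    }
}
```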
Sadly, we don't have any of this infrastructure in place currently; we're a small team, and glancing at the onboarding docs, this appears to be a significant effort to get set up. Would you be able to provide a sample configured codebase or binary that I could drop into my repo?
Yeah, I fell for the generic androidx onboarding doc as well and wasted half a day setting things up as described there... ¯\_(ツ)_/¯

More relevant documentation for setting up compose-multiplatform-core locally is here: https://github.com/JetBrains/compose-multiplatform-core/blob/jb-main/MULTIPLATFORM.md

Unfortunately, I have not created a fork, so it's a bit tricky to share my changed project... But I really didn't change much (if anything) apart from replacing SynchronizedObject... (I can double-check tomorrow when back at my desk.) Basically I did the following steps:
…
P.S.: Turns out I did a fork... https://github.com/stefanhaustein/compose-multiplatform-core It has some extra half-cooked logging but should just work "out of the box" on iOS via the local Maven repo.
@stefanhaustein what published local version number are you using, and which dependencies should be updated to utilize the local build? For example, I have updated my compose dependency:
…
Which is the local version number I have specified, but changes to said locally published code do not appear to show up in the app.
I did not set any number and then was using
…
in the "multiplatform-compose" project that contains some benchmarks for "multiplatform-compose-core". (The change has some noise but should convey the idea O:)
@creativedrewy Were you able to make any progress here? It would be really great if we could get some more insights into addressing this problem...
@stefanhaustein the app has problems with …
Moreover, as we see from the stack trace, the QoS warning did not come from any …
I have made further analyses of the issue in cooperation with @creativedrewy. It seems that we have at least two problems with the current implementation of SynchronizedObject.

First, we do unnecessary spin looping by the following execution path:
…
Both conditions are not satisfied because … The problem here is that I see many threads (tens of them) doing spin looping like …

The second problem is priority inversion (QoS), as @stefanhaustein mentioned. If the …
The upcoming implementation of the mutex (Kotlin/kotlinx-atomicfu#494) does not rely on spin locks. It's crucial for us to understand whether that's enough not to worry about priority inversion. |
Unfortunately, the upcoming implementation (as well as Kotlin/kotlinx-atomicfu#508) does not handle QoS on Apple platforms at all. The implementation of @stefanhaustein (Kotlin/kotlinx-atomicfu#499) handles it via donation, but I am not sure which implementation will land in atomicfu.

Both implementations address the case of this issue because they do not do spin locking, but without addressing QoS, the app may still suffer from the priority inversion problem. I would consider the problem critical now, because any KMP app on iOS may potentially suffer from similar symptoms.
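To make the "donation" terminology concrete: the idea is that when a high-QoS waiter (for example, the UI thread) blocks on a lock held by a default-QoS thread, the waiter temporarily boosts the owner's QoS so the critical section finishes quickly. The sketch below is purely conceptual; OsMutex, currentThreadHandle, currentThreadQos, startQosOverride, and endQosOverride are hypothetical stand-ins (roughly corresponding to a parking mutex and Darwin's thread QoS override mechanism), and the linked PRs should be consulted for how this is actually implemented.

```kotlin
// Conceptual sketch of where QoS donation hooks into a blocking lock() call.
// Everything below marked as a stand-in is hypothetical, not a real API.

class OsMutex {                                   // stand-in: parking (non-spinning) mutex
    fun tryLock(): Boolean = TODO("platform mutex")
    fun lock(): Unit = TODO("platform mutex")
    fun unlock(): Unit = TODO("platform mutex")
}

data class Owner(val threadHandle: Long, val qos: Int)

fun currentThreadHandle(): Long = TODO("stand-in for a platform call")
fun currentThreadQos(): Int = TODO("stand-in for a platform call")
fun startQosOverride(targetThread: Long, qos: Int): Any = TODO("stand-in for QoS override start")
fun endQosOverride(token: Any): Unit = TODO("stand-in for QoS override end")

class DonatingLockSketch {
    private val mutex = OsMutex()
    // In a real implementation this field must be published and read atomically.
    private var owner: Owner? = null

    fun lock() {
        if (mutex.tryLock()) {
            owner = Owner(currentThreadHandle(), currentThreadQos())
            return
        }
        // Contended path: if we are more important than the current owner,
        // donate our QoS to it for the duration of the wait.
        val snapshot = owner
        val token = if (snapshot != null && snapshot.qos < currentThreadQos()) {
            startQosOverride(snapshot.threadHandle, currentThreadQos())
        } else {
            null
        }
        try {
            mutex.lock()                          // park instead of spinning
            owner = Owner(currentThreadHandle(), currentThreadQos())
        } finally {
            if (token != null) endQosOverride(token)
        }
    }

    fun unlock() {
        owner = null
        mutex.unlock()
    }
}
```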
We do know about this theoretical issue, but no practical examples of anything of this sort that don't also rely on spin locking are known to us. So far, the experiments suggest that spinning + lack of QoS donations is the source of problems, not just the lack of QoS donations in isolation.
If starvation can occur without it, then certainly, yes. If the lack of QoS donations can't actually cause issues, then it's best not to. That's why it's crucial for us to know whether this is worth doing.
According to the analysis above, the UI thread may wait on a lock that is held by a thread with the default QoS class. Moreover, we see that tens of threads with the default QoS class run concurrently in this particular app. This means that when the UI thread waits on a lock acquired by a default-QoS thread, the lock holder shares CPU time equally with the other threads, which is exactly what Apple's QoS policy is trying to avoid.
Hi there! I have been able to reproduce this issue on an iOS device, and some of the detected hangs are indeed caused by Synchronized.lock from atomicfu. When I test the new implementation (without QoS taken into account), the hangs caused by locking seem to disappear and the app appears to run slightly smoother.
It's not at all obvious to me that it will. If you are using a synchronous lock on a UI thread, the operations performed by non-UI threads using the same lock must be short and release the lock quickly. The expected scenario is that the QoS of the code holding the same lock as the UI thread is only going to be violated for very short spans of time in any case.

QoS donations aren't free: they involve extra allocations and syscalls, which adds some overhead. Is the overhead actually smaller than the performance win we'd get from using a higher priority to execute a short-lived operation? Without measurements, it's not obvious.

If someone has any measurements/anecdotes/issue tickets/anything objective showing the overwhelming impact of QoS on thread scheduling, this will help tremendously. Without them, we're just exchanging opinions.
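To illustrate the discipline described above: regardless of how the lock is implemented, code that shares a lock with the UI thread should hold it only long enough to snapshot state, and do the expensive work outside the critical section. A small illustrative example (assuming atomicfu's SynchronizedObject and synchronized helpers; the class and its methods are made up for the sketch):

```kotlin
import kotlinx.atomicfu.locks.SynchronizedObject
import kotlinx.atomicfu.locks.synchronized

class EventBuffer {
    private val lock = SynchronizedObject()
    private val pending = mutableListOf<String>()

    fun add(event: String) {
        // Short critical section: just mutate the shared list.
        synchronized(lock) { pending.add(event) }
    }

    fun drainAndProcess() {
        // Hold the lock only long enough to take a snapshot...
        val snapshot = synchronized(lock) {
            val copy = pending.toList()
            pending.clear()
            copy
        }
        // ...and do the potentially slow work outside the critical section,
        // so a UI thread contending for the same lock is never blocked for long.
        snapshot.forEach(::processSlowly)
    }

    private fun processSlowly(event: String) {
        // Placeholder for expensive work.
    }
}
```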
Are you thinking about lock usages from the coroutines implementation only? We have a demand for general-purpose locks, and we cannot guarantee that in those scenarios all lock usages will be short-lived operations.
I think we can perform measurements for this particular app: add QoS donation to the Kotlin/kotlinx-atomicfu#508 implementation and then measure the total lock waiting time in the UI thread with and without QoS donation. I believe @bbrockbernd can easily do this.

We can also create synthetic benchmarks that involve a thread with the "User interactive" QoS class and measure the QoS-donation overhead vs. the reduced lock waits in the "User interactive" thread in a contention scenario.
The issue contains links at the end that show how important it is to address the QoS problem (for instance, Mozilla's case).
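A rough sketch of what the measurement side of such a synthetic benchmark could look like; LockUnderTest is a placeholder for whichever lock implementation is being compared, and starting the contending threads at specific QoS classes ("User interactive" vs. default) is omitted because it is platform-specific:

```kotlin
import kotlin.time.Duration
import kotlin.time.TimeSource

// Placeholder for the lock implementation being measured (current spin-based
// SynchronizedObject, a parking mutex, a parking mutex with QoS donation, ...).
interface LockUnderTest {
    fun lock()
    fun unlock()
}

// Measures the total time a "UI-like" caller spends waiting inside lock()
// over `iterations` acquisitions, while other threads (started elsewhere at
// the default QoS class) contend for the same lock.
fun measureUiLockWait(lock: LockUnderTest, iterations: Int): Duration {
    var totalWait: Duration = Duration.ZERO
    repeat(iterations) {
        val start = TimeSource.Monotonic.markNow()
        lock.lock()
        totalWait += start.elapsedNow()        // time spent acquiring the lock
        try {
            // Simulate a short UI-side critical section.
        } finally {
            lock.unlock()
        }
    }
    return totalWait
}
```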
Blocking operations on the UI thread must be short-lived or avoided entirely. This is true regardless of coroutines usage.
From the link:
So, spinlocks. Which is not what we're dealing with.
We have a report in the public Slack about a very slow CMP application: https://kotlinlang.slack.com/archives/C0346LWVBJ4/p1732321550585009
Luckily, the user provided an Instruments profile trace, and it's visible that some huge chunks of work are spent in our synchronization.
Notably, a good chunk (41%) of the time between 02:01 and 02:10 is spent within
kfun:kotlinx.coroutines.internal.LimitedDispatcher.obtainTaskOrDeallocateWorker#internal
-> kfun:kotlinx.atomicfu.locks.SynchronizedObject#lock(){}
It requires further investigation, but it is worth taking a look.

Instruments file for the history: Hang iPhone SE CMP 1.7.0.trace.zip