
Potentially harmful SynchronizedObject usages on iOS platform #4282

Open

qwwdfsad opened this issue Nov 28, 2024 · 33 comments

@qwwdfsad
Collaborator

qwwdfsad commented Nov 28, 2024

We have a report in the public Slack about a very slow CMP application: https://kotlinlang.slack.com/archives/C0346LWVBJ4/p1732321550585009
Luckily, the user provided an Instruments profile trace, and it is visible that huge chunks of work are spent in our synchronization.

Notably, a good chunk (41%) of the time between 02:01 and 02:10 is spent within
kfun:kotlinx.coroutines.internal.LimitedDispatcher.obtainTaskOrDeallocateWorker#internal -> kfun:kotlinx.atomicfu.locks.SynchronizedObject#lock(){}. It requires further investigation, but it is worth taking a look at.

Instruments file for the record: Hang iPhone SE CMP 1.7.0.trace.zip

@qwwdfsad qwwdfsad changed the title Potentially slow SynchronizedObject usages Potentially harmful SynchronizedObject usages on iOS platform Nov 28, 2024
@qwwdfsad
Collaborator Author

Waiting for the user to confirm the specific coroutines version, as atomicfu got a lock upgrade recently.

@dkhalanskyjb
Collaborator

Will this issue be considered solved if LimitedDispatcher is rewritten to be lock-free, or are all usages of locks to be investigated?

@qwwdfsad
Collaborator Author

It would be nice to understand the root of the issue (it seems to be a LimitedDispatcher peculiarity) and address that instead. I don't think avoiding locks on K/N is justified.

@dkhalanskyjb
Collaborator

I'll double-check, but after a brief refresher on the source code, I see no fault in LimitedDispatcher. Maybe that's the iOS priority inversion issue we've been hearing about. The lock in LimitedDispatcher is only taken for a short time, with no potential for spin-locking on it that I've noticed.
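
For illustration, the pattern in question is roughly the following (a simplified sketch with hypothetical names, not the actual kotlinx.coroutines source): the lock guards only a quick queue poll, so there is no long critical section to contend on.

    import kotlinx.atomicfu.locks.SynchronizedObject
    import kotlinx.atomicfu.locks.synchronized

    class LimitedDispatcherSketch {
        private val workerAllocationLock = SynchronizedObject()
        private val queue = ArrayDeque<() -> Unit>()
        private var runningWorkers = 0 // hypothetical bookkeeping field

        fun obtainTaskOrDeallocateWorker(): (() -> Unit)? =
            synchronized(workerAllocationLock) {
                val task = queue.removeFirstOrNull()
                if (task == null) runningWorkers-- // no work left: retire this worker
                task
            }
    }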

@qwwdfsad
Collaborator Author

It might also be the case of an allocation-spamming lock; I've asked the user about the coroutines version.

@creativedrewy

I'm the OP from that Slack conversation and was asked to provide the details here. I was just able to reproduce the freeze yet again, and here is the set of Kotlin & related dependencies I'm using. To answer the primary question asked above, I'm using kotlinx-coroutines-core version 1.9.0:

Screenshot 2024-12-02 at 10 45 21 AM

@creativedrewy

creativedrewy commented Dec 9, 2024

As an update, I have updated my codebase to Kotlin 2.1 and Compose Multiplatform 1.7. Coroutines are still version 1.9.

The original genesis of this issue was my app freezing; thankfully, that has not happened. However, the phone is still getting hot, and profiling with Time Profiler shows large amounts of time in SynchronizedObject calls. Attached is an image of the profiling. I will provide the updated trace file in the Kotlin Slack.

Screenshot 2024-12-09 at 12 29 35 PM

@dkhalanskyjb
Collaborator

Thanks! The culprit seems to be https://github.com/Kotlin/kotlinx-atomicfu/blob/bad63e743ac905a298386446a82a88225c2f71fc/atomicfu/src/nativeMain/kotlin/kotlinx/atomicfu/locks/Synchronized.kt#L13-L66. There is work in progress to replace it with a more efficient implementation on the atomicfu side, so the problem is expected to be solved in future releases. It won't be very soon, though, as writing efficient mutex implementations takes a lot of effort.
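
Reduced to its essence, the problematic acquisition path looks roughly like this hedged sketch (not atomicfu's actual code, which is a full thin/fat-lock state machine): the slow path retries a CAS in a busy loop instead of parking the thread.

    import kotlinx.atomicfu.atomic

    // Hedged sketch: a lock whose slow path spins.
    class SpinningLockSketch {
        private val owner = atomic<Any?>(null)

        fun lock(me: Any) {
            // Busy-wait until the CAS succeeds; on a loaded device, this loop
            // is what shows up as time under SynchronizedObject.lock in traces.
            while (!owner.compareAndSet(null, me)) { /* spin */ }
        }

        fun unlock() {
            owner.value = null
        }
    }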

It doesn't seem like there's evidence of coroutines using the mutex incorrectly, which was the original topic of this issue. Can this be closed?

@creativedrewy

I suppose, but a few follow-up questions:

  • Might I be able to provide my app's codebase to JetBrains privately so you could use it for testing? I feel like my app especially pushes the limits of atomicfu through extensive use of SharedFlow, which is required for a websocket. It could be useful for benchmarking whether the updates improve things.
  • Is there anything in the interim I can do to mitigate this? I can't go live with intermittent app freezes.
  • If this issue is closed, is there any place I can still track updates that will improve this functionality?

Thanks!

@dkhalanskyjb
Collaborator

Is there anything in the interim I can do to mitigate this? I can't go live with intermittent app freezes.

I thought this no longer happened with newer Kotlin and Compose versions? Is there a reason not to upgrade?

Might I be able to provide my app's codebase to JetBrains privately so you could use it for testing?

Yes, this would be excellent, thanks! You could share a link / an archive / whatever with me in the kotlinlang Slack (Dmitry Khalanskiy [JB]) or add me to your project on GitHub. I'll share your project with other teams internally if you do.

If this issue is closed, is there any place I can still track updates that will improve this functionality?

Good point, I don't think there's a public GitHub issue in atomicfu for tracking this. Let's leave this one open, then, and I'll close it once we open an atomicfu issue.

@creativedrewy

Thank you for the reply! I've sent you my app's codebase on the kotlinlang Slack.

Unfortunately, it is still happening with Compose 1.7 and Kotlin 2.1. I just profiled my app in release mode and was able to reproduce the freeze. It appears most likely to happen when scrolling lists; I'll be scrolling, navigating back and forth, scrolling some more, and then it freezes.

What's interesting is that the "long" usage of SynchronizedObject.lock() shows up on more threads now: worker threads, child threads, and the main thread.

The file is too big to attach here, so I will send it to you as well.

@creativedrewy

As another update: while scrolling through the same list that can sometimes cause the freezing, I got this Thread Performance Checker message:

Thread Performance Checker: Thread running at User-interactive quality-of-service class waiting on a lower QoS thread running at Default quality-of-service class. Investigate ways to avoid priority inversions
PID: 57512, TID: 2040891
Backtrace
=================================================================
3   DRiP Haus                           0x0000000106148f0c _ZN12SkMessageBusIN5skgpu27UniqueKeyInvalidatedMessageEjLb1EE5Inbox4pollEPN12skia_private6TArrayIS1_Lb0EEE + 148
4   DRiP Haus                           0x0000000106147818 _ZN15GrResourceCache13purgeAsNeededEv + 52
5   DRiP Haus                           0x000000010622bd9c _ZN17GrMtlRenderTargetC1EP8GrMtlGpu7SkISize5sk_spI15GrMtlAttachmentES5_NS_7WrappedENSt3__117basic_string_viewIcNS7_11char_traitsIcEEEE + 288
6   DRiP Haus                           0x000000010622c0f4 _ZN17GrMtlRenderTarget23MakeWrappedRenderTargetEP8GrMtlGpu7SkISizeiPU21objcproto10MTLTexture11objc_object + 468
7   DRiP Haus                           0x0000000106222424 _ZN8GrMtlGpu25onWrapBackendRenderTargetERK21GrBackendRenderTarget + 204
8   DRiP Haus                           0x0000000106139238 _ZN5GrGpu23wrapBackendRenderTargetERK21GrBackendRenderTarget + 156
9   DRiP Haus                           0x000000010614068c _ZN15GrProxyProvider23wrapBackendRenderTargetERK21GrBackendRenderTarget5sk_spIN5skgpu16RefCntedCallbackEE + 92
10  DRiP Haus                           0x00000001061e4a40 _ZN10SkSurfaces23WrapBackendRenderTargetEP18GrRecordingContextRK21GrBackendRenderTarget15GrSurfaceOrigin11SkColorType5sk_spI12SkColorSpaceEPK14SkSurfacePropsPFvPvESD_ + 484
11  DRiP Haus                           0x000000010653e4a0 org_jetbrains_skia_Surface__1nMakeFromBackendRenderTarget + 124
12  DRiP Haus                           0x0000000105ed55e0 kfun:org.jetbrains.skia.Surface.Companion#makeFromBackendRenderTarget(org.jetbrains.skia.DirectContext;org.jetbrains.skia.BackendRenderTarget;org.jetbrains.skia.SurfaceOrigin;org.jetbrains.skia.SurfaceColorFormat;org.jetbrains.skia.ColorSpace?;org.jetbrains.skia.SurfaceProps?){}org.jetbrains.skia.Surface? + 756
13  DRiP Haus                           0x000000010697c568 kfun:androidx.compose.ui.window.MetalRedrawer.draw#internal + 7044
14  DRiP Haus                           0x000000010697e994 kfun:androidx.compose.ui.window.MetalRedrawer.<init>$lambda$1#internal + 472
15  DRiP Haus                           0x0000000106980d8c kfun:androidx.compose.ui.window.MetalRedrawer.$<init>$lambda$1$FUNCTION_REFERENCE$1.invoke#internal + 72
16  DRiP Haus                           0x0000000106980e5c kfun:androidx.compose.ui.window.MetalRedrawer.$<init>$lambda$1$FUNCTION_REFERENCE$1.$<bridge-DNN>invoke(){}#internal + 72
17  DRiP Haus                           0x0000000105b4c710 kfun:kotlin.Function0#invoke(){}1:0-trampoline + 100
18  DRiP Haus                           0x0000000106981d34 kfun:androidx.compose.ui.window.DisplayLinkProxy.handleDisplayLinkTick#internal + 152
19  DRiP Haus                           0x0000000106981de8 kfun:androidx.compose.ui.window.DisplayLinkProxy.$imp:handleDisplayLinkTick#internal + 144
20  QuartzCore                          0x0000000189dc34ac _ZN2CA7Display15DisplayLinkItem9dispatch_ERNS_8SignPost8IntervalILNS2_11CAEventCodeE835322056EEE + 44
21  QuartzCore                          0x0000000189dc46d4 _ZN2CA7Display11DisplayLink14dispatch_itemsEyyy + 804
22  QuartzCore                          0x0000000189dc42a0 _ZN2CA7Display11DisplayLink8callbackEP15_CADisplayTimeryyybPv + 632
23  QuartzCore                          0x0000000189ecab9c _ZL22display_timer_callbackP12__CFMachPortPvlS1_ + 336
24  CoreFoundation                      0x00000001803b84e4 __CFMachPortPerform + 172
25  CoreFoundation                      0x00000001803eeddc __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__ + 56
26  CoreFoundation                      0x00000001803ee3ac __CFRunLoopDoSource1 + 496
27  CoreFoundation                      0x00000001803e89bc __CFRunLoopRun + 2176
28  CoreFoundation                      0x00000001803e7d28 CFRunLoopRunSpecific + 572
29  GraphicsServices                    0x000000018e7cdbc0 GSEventRunModal + 160
30  UIKitCore                           0x00000001852bafdc -[UIApplication _run] + 868
31  UIKitCore                           0x00000001852bec54 UIApplicationMain + 124
32  SwiftUI                             0x00000001c4b04524 OUTLINED_FUNCTION_70 + 500
33  SwiftUI                             0x00000001c4b043c4 OUTLINED_FUNCTION_70 + 148
34  SwiftUI                             0x00000001c4816108 OUTLINED_FUNCTION_2 + 92
35  DRiP Haus                           0x00000001047a5e80 $s9DRiP_Haus6iOSAppV5$mainyyFZ + 40
36  DRiP Haus                           0x00000001047a5f30 main + 12

@dkhalanskyjb
Collaborator

Thread running at User-interactive quality-of-service class waiting on a lower QoS thread running at Default quality-of-service class. Investigate ways to avoid priority inversions

We've heard that mutex implementations like ours can suffer from QoS problems (Kotlin/kotlinx-atomicfu#462), and this looks like solid proof. Thanks for the codebase! I'm sure it will be invaluable in our investigation.

@stefanhaustein

@creativedrewy Would it be feasible for you to verify that this updated file for compose-multiplatform-core iOS addresses the problem: https://gist.github.com/stefanhaustein/5a36e66672390a8b314e63f46e7baefe

(I have a slightly better pull request for atomicfu, but unfortunately Compose uses its own version of SynchronizedObject...)

@creativedrewy

@stefanhaustein thanks for the reply! I would need to clone the compose core/runtime repositories locally and consume them in my app in order to swap in this implementation, correct?

@stefanhaustein

stefanhaustein commented Jan 8, 2025

Yes, exactly (probably via publishing the modified compose core repo to the local Maven repository and redirecting your dependency accordingly).

P.S: Relevant part of my notes:

  • Ensure Java 17
  • ./gradlew :mpp:publishComposeJbToMavenLocal -Pcompose.platforms=all

@creativedrewy

Sadly, we don't have any of this infrastructure in place currently; we're a small team, and glancing at the onboarding docs, this appears to be a significant effort to get set up.

Would you be able to provide a sample configured codebase or binary that I could drop in to my repo?

@stefanhaustein

Yeah, I fell for the generic androidx onboarding doc as well and wasted half a day setting things up as described there... ¯\_(ツ)_/¯

More relevant documentation for setting up compose-multiplatform-core locally is here: https://github.com/JetBrains/compose-multiplatform-core/blob/jb-main/MULTIPLATFORM.md

Unfortunately, I have not created a fork, so it's a bit tricky to share my changed project... But I really didn't change much (if anything) apart from replacing SynchronizedObject... (I can double-check tomorrow when I'm back at my desk.)

Basically I did the following steps:

  1. Clone the compose-multiplatform-core repository and check that my setup is correct (using the command line / gradle command from the multiplatform doc)
  2. Publish the project to local maven: ./gradlew :mpp:publishComposeJbToMavenLocal -Pcompose.platforms=all
  3. Check that my depending project still works when pointed to the local cmc project
  4. Change something simple and "obvious" in cmc and re-publish to verify that I am actually using the local project
  5. Actually change SynchronizedObject, re-publish and check again...

@stefanhaustein

P.S.: Turns out I did a fork... https://github.com/stefanhaustein/compose-multiplatform-core

It has some extra half-cooked logging but should just work "out of the box" on iOS via the local Maven repo.

@creativedrewy

@stefanhaustein what published local version number are you using, and which dependencies should be updated to use the local build? For example, I have updated my compose dependency:

compose = "1.7.1-local"

That is the local version number I specified, but changes to the locally published code do not appear to show up in the app.

@stefanhaustein

stefanhaustein commented Jan 10, 2025

I did not set any version number and was then using

force("org.jetbrains.compose.runtime:runtime:0.0.0-SNAPSHOT")

in the "compose-multiplatform" project, which contains some benchmarks for "compose-multiplatform-core".

See stefanhaustein/compose-multiplatform@c6dc8b7#diff-a9b09c9df054c9c227aae1b0573c11551987717eb2cd4dc158cfc3f47b15b3c5R13

(the change has some noise but should convey the idea O:)
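
For anyone following along, the same redirection in build.gradle.kts looks roughly like this (a hedged sketch; the coordinates and version are illustrative):

    repositories {
        mavenLocal() // so the locally published artifacts are found first
        google()
        mavenCentral()
    }

    configurations.all {
        resolutionStrategy {
            // Prefer the locally published snapshot over the released artifact:
            force("org.jetbrains.compose.runtime:runtime:0.0.0-SNAPSHOT")
        }
    }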

@stefanhaustein

@creativedrewy Were you able to make any progress here? It would be really great if we could get some more insights into addressing this problem...

@pjBooms

pjBooms commented Jan 21, 2025

@stefanhaustein the app has problems with the atomicfu implementation of SynchronizedObject, which is used by the Kotlin coroutines and Flow implementations. Replacing SynchronizedObject in Compose won't show anything (its usage is only 0.15%, compared with the 86% usage of SynchronizedObject from atomicfu).

@pjBooms

pjBooms commented Jan 21, 2025

Moreover, as we see from the stack trace, the QoS warning did not come from any SynchronizedObject implementation but from Skia's SkMutex: https://github.com/google/skia/blob/e13090e31f39a519003f6b4a285a1e122e1fa59e/src/core/SkMessageBus.h#L120

@pjBooms

pjBooms commented Feb 6, 2025

I have made a further analysis of the issue in cooperation with @creativedrewy.

It seems that we have at least two problems with the current implementation of SynchronizedObject in atomicfu.

First, we get unnecessary spin looping via the following execution path:
-- thread 1: acquires the lock
-- thread 2: waits on the lock, moving it to the FAT state
-- thread 1: unlocks the FAT lock:

                        // last nested unlock -> release completely, resume some waiter
                        val releasedLock = LockState(FAT, 0, state.waiters - 1, null, state.mutex)
                        if (lock.compareAndSet(state, releasedLock)) {
                            releasedLock.mutex!!.unlock()
-- thread 3: tries to acquire the same lock at the same time and spin-loops:
                    FAT -> {
                        if (currentThreadId == state.ownerThreadId) {
                            // reentrant lock
                            val nestedFatLock =
                                LockState(FAT, state.nestedLocks + 1, state.waiters, state.ownerThreadId, state.mutex)
                            if (lock.compareAndSet(state, nestedFatLock)) return
                        } else if (state.ownerThreadId != null) {

Neither condition is satisfied, because state.ownerThreadId == null at that moment, so we go into an effectively unbounded spin loop until thread 2 acquires the lock.

The problem here is that I see many threads (tens of them) spin-looping like thread 3, occupying all the CPUs and not allowing thread 2 to acquire the lock. Some threads perform tens of thousands and even millions of spin-loop iterations.

The second problem is priority inversion (QoS), as @stefanhaustein mentioned. If thread 3 in the picture above has the "User interactive" QoS class, it locks up the device, which causes the app freeze.
The main UI thread has the "User interactive" QoS class, while the other threads that LimitedDispatcher creates have the Default QoS class, so thread 2 cannot receive CPU time to acquire the lock and release thread 3 from its spin loop.
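
To make the scenario concrete, the contention pattern can be approximated with a Kotlin/Native sketch like the following (a hedged illustration; names and numbers are mine, not from the app):

    import kotlin.native.concurrent.TransferMode
    import kotlin.native.concurrent.Worker
    import kotlinx.atomicfu.locks.SynchronizedObject
    import kotlinx.atomicfu.locks.synchronized

    private val contendedLock = SynchronizedObject()

    fun simulateContention(workers: Int = 16, iterations: Int = 100_000) {
        // These workers play the role of the Default-QoS threads above.
        val pool = List(workers) { Worker.start() }
        val futures = pool.map { worker ->
            worker.execute(TransferMode.SAFE, { iterations }) { n ->
                repeat(n) { synchronized(contendedLock) { /* short critical section */ } }
            }
        }
        // The calling thread plays the role of the "User interactive" thread;
        // with the spin-based lock, this is where the hot loop shows up.
        repeat(iterations) { synchronized(contendedLock) { } }
        futures.forEach { it.result }
        pool.forEach { it.requestTermination().result }
    }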

@dkhalanskyjb
Collaborator

The upcoming implementation of the mutex (Kotlin/kotlinx-atomicfu#494) does not rely on spin locks. It's crucial for us to understand whether that's enough not to worry about priority inversion.
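
For context, the key difference can be sketched as follows (a hedged illustration rather than the PR's actual code, assuming Kotlin/Native's POSIX bindings): the slow path parks the thread on a pthread mutex, so a waiter sleeps in the kernel instead of burning CPU.

    import kotlinx.cinterop.ExperimentalForeignApi
    import kotlinx.cinterop.alloc
    import kotlinx.cinterop.nativeHeap
    import kotlinx.cinterop.ptr
    import platform.posix.*

    // Hedged sketch: a lock that blocks instead of spinning.
    @OptIn(ExperimentalForeignApi::class)
    class ParkingLockSketch {
        private val mutex = nativeHeap.alloc<pthread_mutex_t>().also {
            pthread_mutex_init(it.ptr, null)
        }

        fun lock() { pthread_mutex_lock(mutex.ptr) }      // waiter sleeps in the kernel
        fun unlock() { pthread_mutex_unlock(mutex.ptr) }
    }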

@pjBooms

pjBooms commented Feb 7, 2025

It's crucial for us to understand whether that's enough not to worry about priority inversion.

Unfortunately, the upcoming implementation (as well as Kotlin/kotlinx-atomicfu#508) does not handle QoS on Apple platforms at all. The implementation by @stefanhaustein (Kotlin/kotlinx-atomicfu#499) handles it via donation, but I am not sure which implementation will land in atomicfu first. It is probably a good idea to combine the efforts and add the QoS-donating logic from Kotlin/kotlinx-atomicfu#499 to Kotlin/kotlinx-atomicfu#508.

Both implementations address the case in this issue because they do not spin-lock, but without addressing QoS, the app may still suffer from the priority inversion problem.

I would consider the problem critical now, because any KMP app on iOS may potentially suffer from similar symptoms.
//cc @qwwdfsad @ndkoval @bbrockbernd

@dkhalanskyjb
Collaborator

without addressing QoS, the app may still suffer from the priority inversion problem.

We do know about this theoretical issue, but we are not aware of any practical examples of this sort that don't also rely on spin locking. So far, the experiments suggest that spinning plus the lack of QoS donations is the source of problems, not the lack of QoS donations in isolation.

It is probably a good idea to combine the efforts and add the QoS-donating logic from Kotlin/kotlinx-atomicfu#499 to Kotlin/kotlinx-atomicfu#508

If starvation can occur without it, then certainly, yes. If issues can't actually occur due to the lack of QoS donations alone, then it's best not to. That's why it's crucial for us to know whether this is worth doing.

@pjBooms

pjBooms commented Feb 7, 2025

If starvation can occur without it

According to the analysis above, the UI thread may wait on a lock that is held by a thread with the Default QoS class. Moreover, we see that tens of threads with the Default QoS class run concurrently in this particular app. This means that when the UI thread waits on a lock held by a Default-QoS thread, that thread shares the CPUs equally with the other threads (which is exactly what Apple's QoS policy tries to avoid).
Without spin locking in the UI thread, we should not see a total app freeze (I will consult with @creativedrewy on how to try Kotlin/kotlinx-atomicfu#508 in his environment); however, the whole app's responsiveness will of course suffer in this real, non-theoretical case.

@bbrockbernd

Hi there! I have been able to reproduce this issue on an iOS device, and some of the detected hangs are indeed caused by Synchronized.lock from atomicfu. When I test the new implementation (without QoS taken into account), the hangs caused by locking seem to disappear, and the app runs slightly smoother.
Therefore, it looks like the problem in this specific issue is caused by inefficiencies in the current atomicfu lock implementation and is (from what I have seen on my iOS device) unrelated to QoS.

@dkhalanskyjb
Collaborator

dkhalanskyjb commented Feb 7, 2025

responsiveness will of course suffer

It's not at all obvious to me that it will. If you are using a synchronous lock in a UI thread, the operations performed by non-UI threads using the same lock must be short and release the lock quickly. The expected scenario is that the QoS of the code holding the same lock as the UI thread is only going to be violated for very short spans of time in any case.

QoS donations aren't free: they involve extra allocations and syscalls. This adds some overhead. Is the overhead actually smaller than the performance win we'd get from using a higher priority to execute a short-lived operation? Without measurements, it's not obvious.

If someone has any measurements/anecdotes/issue tickets/anything objective showing the overwhelming impact of QoS on thread scheduling, this will help tremendously. Without them, we're just exchanging opinions.

@pjBooms

pjBooms commented Feb 7, 2025

Is the overhead actually smaller than the performance win we'd get from using a higher priority to execute a short-lived operation?

Are you thinking only about lock usages in the coroutines implementation? We have a demand for general-purpose locks, and we cannot guarantee that in those scenarios all lock usages will be short-lived operations.

If someone has any measurements/

I think we can perform measurements for this particular app: add QoS donation to the Kotlin/kotlinx-atomicfu#508 implementation and then measure the total lock waiting time in the UI thread with and without QoS donation. I believe @bbrockbernd can easily do this. We can also create synthetic benchmarks that involve a thread with the "User interactive" QoS class and measure the QoS-donation overhead vs. the reduced lock waits in the "User interactive" thread in a contention scenario; a sketch of such a benchmark's core follows.
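
A minimal sketch of the measurement core for such a benchmark (names are illustrative; it assumes the atomicfu SynchronizedObject API):

    import kotlin.time.Duration
    import kotlin.time.TimeSource
    import kotlinx.atomicfu.locks.SynchronizedObject
    import kotlinx.atomicfu.locks.synchronized

    private val benchLock = SynchronizedObject()

    // Total time the calling ("User interactive") thread spends acquiring the
    // lock across all iterations; run it with and without QoS donation while
    // Default-QoS threads contend on the same lock.
    fun measureLockWait(iterations: Int): Duration {
        var waited = Duration.ZERO
        repeat(iterations) {
            val mark = TimeSource.Monotonic.markNow()
            synchronized(benchLock) {
                waited += mark.elapsedNow() // time from request to acquisition
            }
        }
        return waited
    }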

Without them, we're just exchanging opinions

The issue contains links at the end showing how important it is to address the QoS problem (e.g., Mozilla's case).
Though, of course, we need to measure any particular implementation to be sure that we improve things rather than degrade them (e.g., the first versions of the QoS-donating lock implementation by @stefanhaustein degraded the Compose benchmarks).

@dkhalanskyjb
Collaborator

Are you thinking only about lock usages in the coroutines implementation? We have a demand for general-purpose locks, and we cannot guarantee that in those scenarios all lock usages will be short-lived operations.

Blocking operations on the UI thread must be short-lived or avoided entirely. This is true regardless of coroutines usage.
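
For illustration, the usual coroutines-friendly way to honor that rule is a suspending Mutex rather than a blocking lock. A minimal sketch (names are illustrative):

    import kotlinx.coroutines.sync.Mutex
    import kotlinx.coroutines.sync.withLock

    private val stateMutex = Mutex()  // suspending, not blocking
    private var sharedCounter = 0     // illustrative shared state

    suspend fun updateShared() {
        // While waiting for the lock, the calling thread (including the UI
        // thread) is released to process other work instead of blocking.
        stateMutex.withLock {
            sharedCounter++
        }
    }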

e.g., Mozilla's case

From the link:

On macOS we relied for a long time on OSSpinLock locks.

So, spinlocks. Which is not what we're dealing with.
