@JesusRojass (Contributor) commented Oct 10, 2025

WIP!!! - Fix app start trace outliers from network delays (#10733)

Discussion

Fixes the _app_start outliers reported in #10733 (still a draft: work in progress, testing ongoing, results look good so far).

User reports:

  • Very long _app_start durations appear in the 90th to 95th percentile of data in the Firebase console (up to 1000+ seconds).

  • The issue seems to come from background tasks that kick off the activity, which does not end until the first run loop completes successfully.

  • Some reports mention long _app_start values when the app launch is interrupted (by locking the device or receiving a phone call).

What this fixes (my possible reproduction ideas for why this is happening):

Case 1 - Spotty network right at cold launch

  • Force quit the app to ensure a cold start.
  • Enable a throttled network profile (e.g. Network Link Conditioner on macOS/iOS): high latency (≥ 800–1500 ms), low bandwidth, 1–5% packet loss.
  • Launch the app
  • Do not interact for a while; just let the app do its thing.
  • Record Perf logs and note whether _app_start duration is far above the norm.

Case 2 - Targeted failures for early endpoints (e.g. your app depends on many endpoints to launch and one of them is down)

  • Force quit the app to ensure a cold start.
  • Run the app through a proxy (I personally used Charles) and simulate a DNS failure, an HTTP 5xx, or a timeout for one startup endpoint at a time.
  • Launch the app and let it do its thing.
  • Observe whether _app_start stays open until the failed/slow call resolves or times out.
  • Switch the targeted endpoint to show that the issue generalizes beyond a single service.
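As a toy model of the Case 2 hypothesis (not the SDK's actual logic, which this thread does not show): if the _app_start stop were effectively gated on every startup request finishing, one endpoint that never answers would pin the duration to its request timeout. The function name and the choice of 60 seconds (the default URLSessionConfiguration timeoutIntervalForRequest) are illustrative assumptions.

```python
from typing import Iterable


def trace_close_time(start_s: float, latencies_s: Iterable[float], timeout_s: float) -> float:
    """Hypothetical: time at which a trace that waits for all startup requests closes.

    Each request finishes at its own latency or at the timeout, whichever
    comes first; the trace closes when the last request finishes.
    """
    finish_times = [start_s + min(latency, timeout_s) for latency in latencies_s]
    # With no startup requests, the trace closes immediately.
    return max(finish_times, default=start_s)


# A dead endpoint (modelled as infinite latency) pins the trace to the timeout.
print(trace_close_time(0.0, [0.3, 0.8, float("inf")], timeout_s=60.0))
```

Switching which endpoint is dead does not change the result, which matches the expectation in the last step that the effect generalizes beyond a single service.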

Case 3 - Background launch before foreground

  • Force quit the app to ensure a cold start.
  • Trigger a background launch (silent push via simulator tools or device test service).
  • Wait a while, then open the app.
  • Check if the duration of _app_start approximates the time spent before you foregrounded.
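A minimal sketch of the background-launch check described later in this thread (invalidate the app start time instead of reporting it), written as a language-neutral Python model. LaunchKind and app_start_duration are hypothetical names; on iOS, one possible signal is UIApplication.shared.applicationState being .background at launch, but which signal the PR actually uses is not stated here.

```python
from enum import Enum
from typing import Optional


class LaunchKind(Enum):
    """Hypothetical classification of how the process was launched."""
    FOREGROUND = "foreground"
    BACKGROUND = "background"  # e.g. a silent push woke the app


def app_start_duration(process_start_s: float, first_frame_s: float,
                       launch: LaunchKind) -> Optional[float]:
    """Return the app start duration, or None when the sample should be discarded.

    A background launch would otherwise count the time spent in the background
    before the user foregrounded the app, which is the Case 3 outlier.
    """
    if launch is LaunchKind.BACKGROUND:
        return None  # invalidate instead of reporting an inflated value
    return first_frame_s - process_start_s


print(app_start_duration(0.0, 1.5, LaunchKind.FOREGROUND))
print(app_start_duration(0.0, 900.0, LaunchKind.BACKGROUND))
```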

Case 4 - Saturate GCD workers to limit the available thread pool

  • I tested this by creating tasks that hold a worker and release it after x amount of time.
  • When the app launches, the UI may appear, but anything that needs .userInitiated workers (parsing, network callbacks, image decoding) is heavily delayed.
  • The _app_start duration should inflate.
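The saturation effect in Case 4 can be illustrated with an analogy: a fixed-size Python ThreadPoolExecutor stands in for the GCD worker pool (GCD itself manages threads differently, so this is only a model). Every worker is occupied with a task that blocks on an event, so a queued "startup" task cannot run until the blockers are released. The function name and parameters are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def demo_saturated_pool(pool_size: int = 2, probe_s: float = 0.2) -> Tuple[bool, bool]:
    """Return (ran_while_saturated, ran_after_release) for a queued startup task."""
    release = threading.Event()      # holds every blocker until it is set
    startup_ran = threading.Event()  # set by the hypothetical startup task

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Occupy every worker with a task that blocks until `release` is set.
        for _ in range(pool_size):
            pool.submit(release.wait)

        # Hypothetical startup work (parsing, decoding, a network callback).
        startup = pool.submit(startup_ran.set)

        # While the pool is saturated, the startup task stays queued.
        ran_while_saturated = startup_ran.wait(timeout=probe_s)

        # Release the blockers; a freed worker picks up the startup task.
        release.set()
        startup.result(timeout=5)

    return ran_while_saturated, startup_ran.is_set()


print(demo_saturated_pool())
```

Any time measured from launch to the end of the startup task grows by however long the blockers hold their workers, which is the inflation Case 4 expects in _app_start.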

Testing

  • Testing is ongoing (work in progress), but all unit tests pass locally.

API Changes

  • No API Changes

@JesusRojass (Contributor Author)

@visumickey @eBlender Opening this as a draft pull request while I test.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request aims to fix outliers in app start traces caused by network delays or other interruptions by introducing a 'reasonable' timeout (30 seconds) after which the app start trace is cancelled. The changes include replacing a static flag with an instance property for better state management and modifying the trace completion logic.

My review focuses on improving code clarity and simplifying the asynchronous execution. I've suggested removing an uncertain-looking comment and simplifying a nested dispatch_async call that adds unnecessary complexity. The core logic for cancelling long-running app start traces appears sound and should address the reported issue.
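A minimal Python model of the "cancel instead of report" rule the review describes, assuming a trace that records its start time and is stopped once. AppStartTrace and REASONABLE_APP_START_SECONDS are hypothetical names that mirror the 30-second limit; note that a later comment in this thread says the aggressive timing reduction was removed, so this reflects the approach as reviewed at this point.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constant mirroring the 30-second 'reasonable' limit in the review.
REASONABLE_APP_START_SECONDS = 30.0


@dataclass
class AppStartTrace:
    start_s: float
    cancelled: bool = False
    duration_s: Optional[float] = None

    def stop(self, end_s: float, max_s: float = REASONABLE_APP_START_SECONDS) -> Optional[float]:
        """Report the duration, or cancel the trace when it exceeds max_s."""
        elapsed = end_s - self.start_s
        if elapsed > max_s:
            self.cancelled = True  # drop the sample rather than report an outlier
            return None
        self.duration_s = elapsed
        return elapsed


print(AppStartTrace(start_s=0.0).stop(2.5))
print(AppStartTrace(start_s=10.0).stop(1010.0))
```

Cancelling keeps 1000+ second samples out of the 90th to 95th percentile entirely, at the cost of losing any genuinely slow launch above the threshold.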

@JesusRojass (Contributor Author)

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces several improvements to fix outliers in the _app_start metric, particularly those caused by network delays or background launches. The changes include detecting background launches to invalidate the app start time, and introducing a 30-second 'reasonable' duration limit to cancel excessively long app start traces instead of reporting them. The code quality is also improved by converting a static variable to an instance property and ensuring traces are properly cancelled. The overall approach is solid and effectively addresses the stated issues. I have a couple of minor suggestions to improve code comments.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@JesusRojass (Contributor Author)

@visumickey Done

Now my fix only addresses the static variable bug that prevented multiple app launches from being measured. The original static var TTIStageStarted could persist until the end of the app life cycle; the new instance var is named appStartTraceCompleted so it is not confused with the stages concept we want to avoid. I also removed the aggressive timing reduction, addressed the logs (I hope in the right manner; let me know if I need to add constants for these), and added the ability to detect background launches.
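The difference between the old static flag and the new instance flag can be sketched in Python, with a class attribute standing in for a file-level static. These classes are hypothetical models of the behaviour described in the comment, not the PR's actual code: the shared flag outlives any one tracker, so a fresh tracker is still blocked, while an instance flag is scoped to its tracker.

```python
class StaticFlagTracker:
    """Models the old behaviour: one flag shared for the whole process lifetime."""

    tti_stage_started = False  # class attribute, analogous to a file-level static

    def try_record_app_start(self) -> bool:
        if StaticFlagTracker.tti_stage_started:
            return False  # an earlier tracker already flipped the shared flag
        StaticFlagTracker.tti_stage_started = True
        return True


class InstanceFlagTracker:
    """Models the new behaviour: the flag belongs to each tracker instance."""

    def __init__(self) -> None:
        self.app_start_trace_completed = False

    def try_record_app_start(self) -> bool:
        if self.app_start_trace_completed:
            return False  # this tracker already recorded its app start
        self.app_start_trace_completed = True
        return True
```

With these models, a second StaticFlagTracker() returns False on its first call, while each new InstanceFlagTracker() records exactly once.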

@JesusRojass JesusRojass marked this pull request as ready for review October 17, 2025 17:46
@JesusRojass (Contributor Author)

@visumickey Ready for review; nitpicks addressed.

@JesusRojass JesusRojass changed the title WIP!!! - Fix app start trace outliers from network delays (#10733) Fix app start trace outliers from network delays (#10733) Oct 17, 2025
Labels

None yet

Projects

None yet

2 participants