
Flaky test analysis: Xamarin.Android-PR pipeline, April 27 – May 11 2026 (follow-up to #11203) #11320

@simonrozsival

Follow-up to #11203 — refreshes the flaky-test picture two weeks later, after the fixes that
landed in between (notably #11249 for PerformanceTest and #11257 removing the WearOS lane).

Scope

| Metric | Value |
|---|---|
| Period | 2026-04-27 → 2026-05-11 (14 days, ending today) |
| Pipeline | Xamarin.Android-PR (devdiv definition ID 12278) |
| Method | Identical to #11203; see that issue for the full methodology |
| PRs analyzed | 62 merged PRs |
| Direct (team-member) PRs | 62 (no fork PRs in window) |
| PRs merged with failed build result | 23 (vs. 38 in #11203, a ~40% drop) |
| Unique test names observed in (Auto-Retry) runs | 80 (vs. 51) |
| Tests with ≥1 🔴 | 15 (vs. 33) |
Tests with ≥1 🔴 15 (vs. 33)
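
As a quick check of the comparison figures, the relative changes can be recomputed from the counts quoted in the scope table. The previous-window values are the ones reported in #11203, and `pct_change` is only an illustrative helper:

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from the #11203 window to this window, in percent."""
    return (after - before) / before * 100


# (#11203 value, this window's value), as quoted in the scope table
metrics = {
    "PRs merged with failed build result": (38, 23),
    "Unique test names in (Auto-Retry) runs": (51, 80),
    "Tests with >=1 🔴": (33, 15),
}

for name, (before, after) in metrics.items():
    print(f"{name}: {before} -> {after} ({pct_change(before, after):+.1f}%)")
```

Run as-is, it prints -39.5% for failed-build PRs (the "~40% drop"), +56.9% for unique test names and -54.5% for tests with ≥1 🔴.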

Signal categories are the same as #11203: 🔴 = passed on auto-retry in a failed build (highest-confidence flake), 🟢 = passed on auto-retry in a succeeded build, ⚠️ = still failed on auto-retry.
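
A minimal sketch of how one (Auto-Retry) outcome maps onto these three categories, and how the per-test counts in the tables below could be tallied. It assumes each observation is a (test name, build result, retry passed) tuple and that build results are labeled "failed" or "succeeded". Those field shapes and the sample events are hypothetical, not the schema of the #11203 script:

```python
from collections import Counter, defaultdict
from enum import Enum


class Signal(Enum):
    RED = "🔴"    # passed on auto-retry in a failed build (highest-confidence flake)
    GREEN = "🟢"  # passed on auto-retry in a succeeded build
    WARN = "⚠️"   # still failed on auto-retry


def classify(build_result: str, retry_passed: bool) -> Signal:
    """Map one (Auto-Retry) test outcome to a signal category.

    Assumes build_result is "failed" or "succeeded" (hypothetical labels).
    """
    if build_result not in ("failed", "succeeded"):
        raise ValueError(f"unexpected build result: {build_result!r}")
    if not retry_passed:
        return Signal.WARN
    return Signal.RED if build_result == "failed" else Signal.GREEN


def tally(events):
    """Count signals per test from (test_name, build_result, retry_passed) tuples."""
    counts = defaultdict(Counter)
    for test_name, build_result, retry_passed in events:
        counts[test_name][classify(build_result, retry_passed)] += 1
    return counts


# Hypothetical events, for illustration only.
events = [
    ("TestA", "failed", True),
    ("TestA", "failed", True),
    ("TestA", "succeeded", True),
    ("TestB", "failed", False),
]
for name, c in tally(events).items():
    print(name, {s.value: n for s, n in c.items()})
```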

Headline observations

  1. PerformanceTest is no longer a red-CI source. The six top Build_* / Install_* /
    DesignTimeBuild_* perf tests from #11203 each produced zero 🔴 events this window.
    The Assert.Inconclusive slow-machine guard from #11249 is working.
  2. WearOS noise is gone. Four WearOS-only debugger variants from the top 7 in #11203
    disappeared because the WearOS lane was removed in #11257.
  3. Debugger-attach (ApplicationRunsWithDebuggerAndBreaks) remains the largest residual
    flake family: three parameterizations with 4 🔴 combined.
  4. The NativeAOT lane is producing fresh flakes (rows 8, 11 and 12 below, one 🔴 each), which is
    expected while that lane stabilizes.
  5. New top offender: TypeAndMemberRemapping(False,MonoVM), with 3 🔴 across 3 different PRs
    (#11256, #11266, #11269). Only the (False,MonoVM) parameterization flakes.

Top flaky tests with recommended action

All entries below are in the MSBuildDeviceIntegration On Device suite. The "Recommended action"
column applies the same playbook used by #11249 (soft-skip), #11202 ([Ignore]), #11110
(Assert.Inconclusive for emulator/environment failures), and #10756 (remove). In the Notes
column, "a→b" gives the 🔴 count in #11203 versus this window.

| # | Test | 🔴 | 🟢 | ⚠️ | Recommended action | Notes |
|---|---|---|---|---|---|---|
| 1 | TypeAndMemberRemapping(False,MonoVM) | 3 | 0 | 0 | Investigate: possible real regression | New high-confidence flake; only the (False,MonoVM) parameterization fails; (True,CoreCLR) shows only ⚠️1. Was 🟢1 in #11203. |
| 2 | ApplicationRunsWithDebuggerAndBreaks(False,"guest1","apk",True,MonoVM) | 2 | 1 | 0 | Soft-skip on debugger-attach failure | Recurring from #11203. Pattern matches a device-attach race; consider Assert.Inconclusive specifically when the debugger-attach step times out. |
| 3 | BuildBasicApplicationAndAotProfileIt | 2 | 0 | 0 | Investigate (AOT-profile capture is timing-sensitive) | Recurring from #11203 at the same rate (2→2). |
| 4 | ApplicationRunsWithDebuggerAndBreaks(False,"guest1","aab",True,MonoVM) | 1 | 3 | 2 | Soft-skip on debugger-attach failure | Same family as row 2. |
| 5 | ApplicationRunsWithDebuggerAndBreaks(False,null,"aab",True,MonoVM) | 1 | 0 | 0 | Soft-skip on debugger-attach failure | Same family as row 2. |
| 6 | FastDeployEnvironmentFiles(False,True,MonoVM) | 1 | 0 | 1 | Monitor (only 1 PR) | New. Re-check next cycle. |
| 7 | AppWithStyleableUsageRuns(True,False,False,MonoVM) | 1 | 0 | 1 | Monitor (only 1 PR) | New. |
| 8 | CheckResouceIsOverridden(NativeAOT) | 1 | 0 | 0 | Monitor (NativeAOT lane) | New; NativeAOT lane still stabilizing. |
| 9 | MonoAndroidExportReferencedAppStarts(False,False,MonoVM) | 1 | 0 | 0 | Monitor | Recurring (2→1). |
| 10 | DotNetRunWaitForExit | 1 | 0 | 0 | Monitor (already tracked in #10832) | Big improvement (5→1). |
| 11 | DotNetRun(True,"managed",NativeAOT) | 1 | 0 | 0 | Monitor (NativeAOT lane) | New. |
| 12 | GradleFBProj(True,NativeAOT) | 1 | 0 | 0 | Monitor (NativeAOT lane) | New. |
| 13 | MonoAndroidExportReferencedAppStarts(False,True,CoreCLR) | 1 | 0 | 0 | Monitor | New CoreCLR variant. |
| 14 | AdbTargetChangesAppBundle | 1 | 0 | 0 | Monitor (only 1 PR) | New. |
| 15 | EnsureUncaughtExceptionWorks(MonoVM) | 1 | 0 | 0 | Monitor | Big improvement (3→1). |

Tests with elevated ⚠️ (retry-fail) counts

DotNetRunWithDeviceParameter (⚠️2): already tracked in #10832; no 🔴 hits this window.
Tests with only ⚠️1 are not yet strong enough signals to act on.

Tests appearing in #11203 with no 🔴 hits this window

Proposed concrete follow-ups

  1. Investigate TypeAndMemberRemapping(False,MonoVM) and file a focused issue. It had three
    independent 🔴 hits in one window across unrelated PRs (#11256, #11266, #11269), in one
    parameterization only.
  2. Generalize the soft-skip pattern to ApplicationRunsWithDebuggerAndBreaks: when the
    debugger-attach step itself fails (not the assertion under test), call
    Assert.Inconclusive, the same way #11249 does for slow-machine perf tests and #11110 does
    for emulator acquisition failures.
  3. Investigate BuildBasicApplicationAndAotProfileIt: it recurs at a steady rate (2 🔴 in both windows).
  4. NativeAOT lane: leave at "monitor" for one more cycle; if any of CheckResouceIsOverridden(NativeAOT), DotNetRun(...,NativeAOT), GradleFBProj(True,NativeAOT) recur, treat as real bugs and escalate.
  5. Repeat this analysis in ~2 weeks to confirm trend and catch regressions.
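
Follow-up 2 hinges on separating a failure in the debugger-attach step from a failure of the assertion under test. The Xamarin.Android tests are C#/NUnit, where the call would be Assert.Inconclusive. The sketch below only illustrates that control flow, using Python's unittest and its skipTest as a stand-in for Inconclusive. `attach_debugger`, `DebuggerAttachTimeout`, the 60-second timeout and the `simulate_timeout` switch are all hypothetical:

```python
import unittest


class DebuggerAttachTimeout(Exception):
    """Hypothetical: raised when the debugger does not attach within the timeout."""


def attach_debugger(timeout_s: float, simulate_timeout: bool) -> str:
    # Hypothetical stand-in for the real device/debugger setup step.
    if simulate_timeout:
        raise DebuggerAttachTimeout(f"debugger did not attach within {timeout_s}s")
    return "attached"


def run_debugger_test(test: unittest.TestCase, simulate_timeout: bool) -> None:
    try:
        session = attach_debugger(timeout_s=60, simulate_timeout=simulate_timeout)
    except DebuggerAttachTimeout as exc:
        # Environment failure: report as skipped (NUnit analogue: Assert.Inconclusive).
        # skipTest raises unittest.SkipTest, so execution stops here.
        test.skipTest(f"debugger attach failed: {exc}")
    # Only failures past this point count as real test failures.
    test.assertEqual(session, "attached")


class DebuggerBreakTests(unittest.TestCase):
    def test_attach_times_out(self):
        run_debugger_test(self, simulate_timeout=True)

    def test_attach_succeeds(self):
        run_debugger_test(self, simulate_timeout=False)
```

Run with `python -m unittest`, this should report one pass and one skip and no failures. The design point is that only the setup step is guarded, so a genuine regression in the breakpoint assertion still turns CI red.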

Existing related issues

Caveats

Same as #11203: only the last build per PR is sampled; only (Auto-Retry) runs carry test
detail; no error-message content collected; ⚠️1-count tests are weak signals and may be noise.
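
The first caveat means signals from earlier builds of the same PR are not counted. For example, a run that failed and was then manually re-queued leaves no trace if the later build is the one sampled. A minimal sketch of that sampling rule follows. It assumes each build record carries a PR number and a finish time; the `Build` fields and the sample data are hypothetical, not the schema of the #11203 script:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Build:
    pr: int            # pull request number (hypothetical values below)
    build_id: int
    finished: datetime
    result: str        # e.g. "failed" or "succeeded" (hypothetical labels)


def last_build_per_pr(builds):
    """Keep only the most recently finished build for each PR."""
    latest = {}
    for b in builds:
        current = latest.get(b.pr)
        if current is None or b.finished > current.finished:
            latest[b.pr] = b
    return latest


builds = [
    Build(pr=1, build_id=101, finished=datetime(2026, 5, 1, 9, 0), result="failed"),
    Build(pr=1, build_id=102, finished=datetime(2026, 5, 1, 12, 0), result="succeeded"),
    Build(pr=2, build_id=201, finished=datetime(2026, 5, 3, 8, 30), result="failed"),
]
sampled = last_build_per_pr(builds)
```

Here the earlier failed build 101 of PR 1 is dropped, so only build 102 would be examined for that PR.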

Methodology

Same script and data path as #11203. See that issue for the full methodology section.

Metadata

Assignees: no one assigned

Labels: copilot (`copilot-cli` or other AIs were used to author this), needs-triage (issues that need to be assigned)
