[Perf] Linux/x64: Regressions in SIMD.ConsoleMandel and System.Numerics.Tests.Perf_BigInteger #105329

performanceautofiler · 2024-07-23T08:52:56Z

Run Information

Name	Value
Architecture	x64
OS	ubuntu 22.04
Queue	ViperUbuntu
Baseline	19f03850cafa68cf396ecadf86e19df714b0a280
Compare	223249fa87a5f84cc67e83699f64ca80180a1862
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in SIMD.ConsoleMandel

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
ScalarDoubleSinglethreadADT - Duration of single invocation 📝 - Benchmark Source ADX - Test Multi Config Graph	649.42 ms	693.68 ms	1.07	0.00	True
ScalarFloatSinglethreadADT - Duration of single invocation 📝 - Benchmark Source ADX - Test Multi Config Graph	656.66 ms	692.76 ms	1.05	0.00	True

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'SIMD.ConsoleMandel*'

SIMD.ConsoleMandel.ScalarDoubleSinglethreadADT

ETL Files

Histogram

JIT Disasms

SIMD.ConsoleMandel.ScalarFloatSinglethreadADT

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Regressions in System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
HeapSort - Duration of single invocation 📝 - Benchmark Source ADX - Test Multi Config Graph	16.57 μs	18.02 μs	1.09	0.08	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>*'

System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>.HeapSort(Size: 1000)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Regressions in System.Numerics.Tests.Perf_BigInteger

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
GreatestCommonDivisor - Duration of single invocation 📝 - Benchmark Source ADX - Test Multi Config Graph	4.37 μs	4.68 μs	1.07	0.00	True

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_BigInteger*'

System.Numerics.Tests.Perf_BigInteger.GreatestCommonDivisor(arguments: 1024,1024 bits)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

LoopedBard3 · 2024-07-23T16:39:24Z

Potentially caused by: #104752, improvements listed in PR. @AndyAyersMS

LoopedBard3 · 2024-07-23T16:52:01Z

Related Regressions:
Windows/x64: dotnet/perf-autofiling-issues#38719
Linux/x64: dotnet/perf-autofiling-issues#38695

AndyAyersMS · 2024-07-23T21:28:27Z

Will see if this is fixable in 9.0...

dotnet-policy-service · 2024-07-24T18:29:35Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

AndyAyersMS · 2024-08-07T17:28:25Z

These all seem to be viper (zen4) specific, and linux specific as well. Since other arch/os combinations are ok, I'm going to defer this one.

AndyAyersMS · 2025-04-14T16:43:42Z

Related regressions are resolved already. But three of the regressions listed here persist:

and one is largely resolved (very modal benchmark)

AndyAyersMS · 2025-04-14T17:37:40Z

As noted earlier, the GreatestCommonDivisor regression is just on Zen4 Linux (light blue below)

But the ConsoleMandel is more widespread:

AndyAyersMS · 2025-04-14T21:35:43Z

On Intel Linux, for .NET 8, the inner loop is

G_M61269_IG05:        ; offs=0x000084, size=0x0033, bbWeight=1109935.07, PerfScore 36905341.03, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, BB05 [0003], byref, isz

IN001b: 000084 vmulsd   xmm9, xmm6, xmm6
IN001c: 000088 vmulsd   xmm10, xmm7, xmm7
IN001d: 00008C vsubsd   xmm9, xmm9, xmm10
IN001e: 000091 vaddsd   xmm6, xmm6, xmm6
IN001f: 000095 vmulsd   xmm7, xmm6, xmm7
IN0020: 000099 vaddsd   xmm6, xmm9, xmm3
IN0021: 00009D vaddsd   xmm7, xmm7, xmm5
IN0022: 0000A1 inc      ecx
IN0023: 0000A3 vmulsd   xmm9, xmm6, xmm6
IN0024: 0000A7 vmulsd   xmm10, xmm7, xmm7
IN0025: 0000AB vaddsd   xmm9, xmm9, xmm10
IN0026: 0000B0 vucomisd xmm8, xmm9
IN0027: 0000B5 jbe      SHORT G_M61269_IG07

G_M61269_IG06:        ; offs=0x0000B7, size=0x0008, bbWeight=1106928.83, PerfScore 1383661.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, loop=IG05, BB06 [0004], byref, isz

IN0028: 0000B7 cmp      ecx, 0x3E8
IN0029: 0000BD jl       SHORT G_M61269_IG05

whereas later on it becomes

G_M61269_IG10:        ; offs=0x0000E0, size=0x003C, bbWeight=60645.02, PerfScore 2137736.83, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, BB04 [0003], byref, isz

IN0034: 0000E0 vmulsd   xmm7, xmm3, xmm3
IN0035: 0000E4 vmulsd   xmm8, xmm5, xmm5
IN0036: 0000E8 vsubsd   xmm7, xmm7, xmm8
IN0037: 0000ED vaddsd   xmm3, xmm3, xmm3
IN0038: 0000F1 vmulsd   xmm5, xmm3, xmm5
IN0039: 0000F5 vmovsd   qword ptr [V14 rbp-0x48], xmm1
IN003a: 0000FA vaddsd   xmm3, xmm7, xmm1
IN003b: 0000FE vmovsd   qword ptr [V12 rbp-0x40], xmm4
IN003c: 000103 vaddsd   xmm5, xmm5, xmm4
IN003d: 000107 inc      ecx
IN003e: 000109 vmulsd   xmm7, xmm3, xmm3
IN003f: 00010D vmulsd   xmm8, xmm5, xmm5
IN0040: 000111 vaddsd   xmm7, xmm7, xmm8
IN0041: 000116 vucomisd xmm6, xmm7
IN0042: 00011A jbe      SHORT G_M61269_IG08

G_M61269_IG11:        ; offs=0x00011C, size=0x000C, bbWeight=60240.69, PerfScore 75300.87, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, BB05 [0004], byref

IN0043: 00011C cmp      ecx, 0x3E8
IN0044: 000122 jge      G_M61269_IG08

G_M61269_IG12:        ; offs=0x000128, size=0x000C, bbWeight=59932.78, PerfScore 479462.26, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, loop=IG10, BB16 [0033], byref, isz

IN0045: 000128 vmovsd   xmm1, qword ptr [V14 rbp-0x48]
IN0046: 00012D vmovsd   xmm4, qword ptr [V12 rbp-0x40]
IN0047: 000132 jmp      SHORT G_M61269_IG10
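
For readers less used to reading the disassembly, both listings above perform the same arithmetic: one Mandelbrot step followed by an escape test and an iteration-count test. The Python sketch below mirrors that instruction sequence. The starting point `z = 0` and the default `limit = 4.0` are assumptions, since neither the initial register values nor the benchmark source appear in this issue; only the 0x3E8 (1000) iteration cap is taken from the `cmp ecx, 0x3E8` instruction.

```python
def escape_iterations(cr, ci, limit=4.0, max_iters=0x3E8):
    """Count iterations of z = z*z + c until |z|^2 reaches `limit` or the cap is hit.

    Assumptions (not shown in the listings): z starts at 0 and limit is 4.0.
    """
    x, y = 0.0, 0.0
    iters = 0
    while True:
        xx_minus_yy = x * x - y * y   # vmulsd, vmulsd, vsubsd
        two_x = x + x                 # vaddsd
        y = two_x * y + ci            # vmulsd, vaddsd
        x = xx_minus_yy + cr          # vaddsd
        iters += 1                    # inc ecx
        abs_sq = x * x + y * y        # vmulsd, vmulsd, vaddsd
        if not (abs_sq < limit):      # vucomisd + jbe: leave when limit <= abs_sq (or NaN)
            break
        if iters >= max_iters:        # cmp ecx, 0x3E8
            break
    return iters


if __name__ == "__main__":
    print(escape_iterations(0.0, 0.0))  # never escapes, runs to the cap
    print(escape_iterations(2.0, 2.0))  # escapes on the first step
```

The constant `c` (`cr`, `ci`) is loop-invariant, and in the later listing the two values written to the stack slots (`xmm1`, `xmm4`) are exactly the operands of the two `+ c` additions.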

So there are two xmm spill/reloads in the inner loop now.
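
The spill/reload count can be checked mechanically against the two listings. The sketch below is a throwaway helper, not part of any JIT tooling; it assumes that any rbp-relative memory operand in these loops is a stack slot, and treats a memory operand in the first position as a store and in a later position as a load.

```python
# Instruction lines copied from the two inner-loop listings in this comment.
NET8_LOOP = """\
IN001b: 000084 vmulsd   xmm9, xmm6, xmm6
IN001c: 000088 vmulsd   xmm10, xmm7, xmm7
IN001d: 00008C vsubsd   xmm9, xmm9, xmm10
IN001e: 000091 vaddsd   xmm6, xmm6, xmm6
IN001f: 000095 vmulsd   xmm7, xmm6, xmm7
IN0020: 000099 vaddsd   xmm6, xmm9, xmm3
IN0021: 00009D vaddsd   xmm7, xmm7, xmm5
IN0022: 0000A1 inc      ecx
IN0023: 0000A3 vmulsd   xmm9, xmm6, xmm6
IN0024: 0000A7 vmulsd   xmm10, xmm7, xmm7
IN0025: 0000AB vaddsd   xmm9, xmm9, xmm10
IN0026: 0000B0 vucomisd xmm8, xmm9
IN0027: 0000B5 jbe      SHORT G_M61269_IG07
IN0028: 0000B7 cmp      ecx, 0x3E8
IN0029: 0000BD jl       SHORT G_M61269_IG05
"""

LATER_LOOP = """\
IN0034: 0000E0 vmulsd   xmm7, xmm3, xmm3
IN0035: 0000E4 vmulsd   xmm8, xmm5, xmm5
IN0036: 0000E8 vsubsd   xmm7, xmm7, xmm8
IN0037: 0000ED vaddsd   xmm3, xmm3, xmm3
IN0038: 0000F1 vmulsd   xmm5, xmm3, xmm5
IN0039: 0000F5 vmovsd   qword ptr [V14 rbp-0x48], xmm1
IN003a: 0000FA vaddsd   xmm3, xmm7, xmm1
IN003b: 0000FE vmovsd   qword ptr [V12 rbp-0x40], xmm4
IN003c: 000103 vaddsd   xmm5, xmm5, xmm4
IN003d: 000107 inc      ecx
IN003e: 000109 vmulsd   xmm7, xmm3, xmm3
IN003f: 00010D vmulsd   xmm8, xmm5, xmm5
IN0040: 000111 vaddsd   xmm7, xmm7, xmm8
IN0041: 000116 vucomisd xmm6, xmm7
IN0042: 00011A jbe      SHORT G_M61269_IG08
IN0043: 00011C cmp      ecx, 0x3E8
IN0044: 000122 jge      G_M61269_IG08
IN0045: 000128 vmovsd   xmm1, qword ptr [V14 rbp-0x48]
IN0046: 00012D vmovsd   xmm4, qword ptr [V12 rbp-0x40]
IN0047: 000132 jmp      SHORT G_M61269_IG10
"""


def stack_accesses(listing):
    """Return (stores, loads): ids of instructions with an rbp-relative operand."""
    stores, loads = [], []
    for line in listing.splitlines():
        parts = line.split(None, 3)  # id, code offset, mnemonic, operands
        if len(parts) < 4 or not parts[0].startswith("IN"):
            continue
        ident = parts[0].rstrip(":")
        operands = [op.strip() for op in parts[3].split(",")]
        if "rbp" in operands[0]:
            stores.append(ident)   # memory is the destination operand
        elif any("rbp" in op for op in operands[1:]):
            loads.append(ident)    # memory is a source operand
    return stores, loads


if __name__ == "__main__":
    print("net8 :", stack_accesses(NET8_LOOP))
    print("later:", stack_accesses(LATER_LOOP))
```

On the .NET 8 loop this finds no stack traffic; on the later loop it finds two stores (IN0039, IN003b) and two loads (IN0045, IN0046), all on the per-iteration path.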

The root cause is slightly more aggressive copy propagation, likely the result of phi refinement:

VN based copy assertion for [000258] V36 $241 by [000303] V14 $241.
N001 (  1,  2) [000258] -----+-----                         *  LCL_VAR   double V36 tmp17        u:2 $241
copy propagated to:
N001 (  1,  2) [000258] -----+-----                         *  LCL_VAR   double V14 loc8         u:3 $241

and this likely creates a conflict that LSRA is unable to resolve without a spill.

Not clear there is any good fix here. Seems like the previous behavior where there were temps in the loops gave LSRA natural split points that are now gone.
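
To make the "natural split points" remark concrete, here is a deliberately simplified toy model. It is not how the JIT or LSRA is implemented: it just records which locals are referenced at each of a handful of hypothetical program points, rewrites references to one local with another that carries the same value number (as in the `V36` to `V14` dump above), and compares the resulting reference ranges. The reference positions in `REFS` are invented for illustration.

```python
def copy_propagate(refs, value_numbers, old, new):
    """Replace references to `old` with `new` when both carry the same value number."""
    if value_numbers[old] != value_numbers[new]:
        return refs
    return [[new if name == old else name for name in names] for names in refs]


def live_ranges(refs):
    """Map each local to the (first, last) position at which it is referenced."""
    ranges = {}
    for pos, names in enumerate(refs):
        for name in names:
            first = ranges.get(name, (pos, pos))[0]
            ranges[name] = (first, pos)
    return ranges


# Value numbers taken from the dump above; both locals share $241.
VALUE_NUMBERS = {"V14": "$241", "V36": "$241"}

# Hypothetical program points: position 1 is the copy V36 = V14,
# positions 2 and 3 stand for uses inside the loop.
REFS = [["V14"], ["V14", "V36"], ["V36"], ["V36"]]

if __name__ == "__main__":
    print("before:", live_ranges(REFS))
    print("after: ", live_ranges(copy_propagate(REFS, VALUE_NUMBERS, "V36", "V14")))
```

Before the rewrite there are two shorter ranges that meet at the copy; afterwards a single local spans every point, so in this toy picture the boundary where an allocator could have switched registers is gone.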

The regressions are all fairly small so I am just going to close this.

performanceautofiler bot added the arch-x64, os-linux, runtime-coreclr, and untriaged labels Jul 23, 2024

performanceautofiler bot mentioned this issue Jul 23, 2024

[SENTINEL] Autofile run complete at 7/23/2024 9:02:08 AM. 13 issues filed. dotnet/perf-autofiling-issues#38764

Closed

LoopedBard3 transferred this issue from dotnet/perf-autofiling-issues Jul 23, 2024

ghost added the needs-area-label label Jul 23, 2024

LoopedBard3 changed the title ~~[Perf] Linux/x64: 4 Regressions on 7/17/2024 10:10:17 PM~~ [Perf] Linux/x64: Regressions in SIMD.ConsoleMandel and System.Numerics.Tests.Perf_BigInteger Jul 23, 2024

AndyAyersMS self-assigned this Jul 23, 2024

AndyAyersMS removed the untriaged label Jul 23, 2024

AndyAyersMS added this to the 9.0.0 milestone Jul 23, 2024

jeffschwMSFT added the area-CodeGen-coreclr label Jul 24, 2024

AndyAyersMS added the Priority:2 and tenet-performance-benchmarks labels and removed the needs-area-label label Jul 25, 2024

AndyAyersMS modified the milestones: 9.0.0, 10.0.0 Aug 7, 2024

AndyAyersMS closed this as completed Apr 14, 2025

github-actions bot locked and limited conversation to collaborators May 15, 2025
