[Perf] Linux/x64: Regressions in SIMD.ConsoleMandel and System.Numerics.Tests.Perf_BigInteger #105329
Comments
Potentially caused by: #104752 (improvements are listed in that PR). @AndyAyersMS
Related Regressions:
Will see if this is fixable in 9.0...
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
These all seem to be specific to viper (zen4), and to Linux as well. Since other arch/OS combinations are OK, I'm going to defer this one.
On Intel Linux, for .NET 8, the inner loop is
G_M61269_IG05: ; offs=0x000084, size=0x0033, bbWeight=1109935.07, PerfScore 36905341.03, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, BB05 [0003], byref, isz
IN001b: 000084 vmulsd xmm9, xmm6, xmm6
IN001c: 000088 vmulsd xmm10, xmm7, xmm7
IN001d: 00008C vsubsd xmm9, xmm9, xmm10
IN001e: 000091 vaddsd xmm6, xmm6, xmm6
IN001f: 000095 vmulsd xmm7, xmm6, xmm7
IN0020: 000099 vaddsd xmm6, xmm9, xmm3
IN0021: 00009D vaddsd xmm7, xmm7, xmm5
IN0022: 0000A1 inc ecx
IN0023: 0000A3 vmulsd xmm9, xmm6, xmm6
IN0024: 0000A7 vmulsd xmm10, xmm7, xmm7
IN0025: 0000AB vaddsd xmm9, xmm9, xmm10
IN0026: 0000B0 vucomisd xmm8, xmm9
IN0027: 0000B5 jbe SHORT G_M61269_IG07
G_M61269_IG06: ; offs=0x0000B7, size=0x0008, bbWeight=1106928.83, PerfScore 1383661.04, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, loop=IG05, BB06 [0004], byref, isz
IN0028: 0000B7 cmp ecx, 0x3E8
IN0029: 0000BD jl SHORT G_M61269_IG05
whereas later on it becomes
G_M61269_IG10: ; offs=0x0000E0, size=0x003C, bbWeight=60645.02, PerfScore 2137736.83, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, BB04 [0003], byref, isz
IN0034: 0000E0 vmulsd xmm7, xmm3, xmm3
IN0035: 0000E4 vmulsd xmm8, xmm5, xmm5
IN0036: 0000E8 vsubsd xmm7, xmm7, xmm8
IN0037: 0000ED vaddsd xmm3, xmm3, xmm3
IN0038: 0000F1 vmulsd xmm5, xmm3, xmm5
IN0039: 0000F5 vmovsd qword ptr [V14 rbp-0x48], xmm1
IN003a: 0000FA vaddsd xmm3, xmm7, xmm1
IN003b: 0000FE vmovsd qword ptr [V12 rbp-0x40], xmm4
IN003c: 000103 vaddsd xmm5, xmm5, xmm4
IN003d: 000107 inc ecx
IN003e: 000109 vmulsd xmm7, xmm3, xmm3
IN003f: 00010D vmulsd xmm8, xmm5, xmm5
IN0040: 000111 vaddsd xmm7, xmm7, xmm8
IN0041: 000116 vucomisd xmm6, xmm7
IN0042: 00011A jbe SHORT G_M61269_IG08
G_M61269_IG11: ; offs=0x00011C, size=0x000C, bbWeight=60240.69, PerfScore 75300.87, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, BB05 [0004], byref
IN0043: 00011C cmp ecx, 0x3E8
IN0044: 000122 jge G_M61269_IG08
G_M61269_IG12: ; offs=0x000128, size=0x000C, bbWeight=59932.78, PerfScore 479462.26, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, loop=IG10, BB16 [0033], byref, isz
IN0045: 000128 vmovsd xmm1, qword ptr [V14 rbp-0x48]
IN0046: 00012D vmovsd xmm4, qword ptr [V12 rbp-0x40]
IN0047: 000132 jmp SHORT G_M61269_IG10
So there are now two xmm spill/reload pairs in the inner loop: the stores at IN0039 and IN003b, and the matching reloads at IN0045 and IN0046. The root cause is slightly more aggressive copy propagation, likely the result of phi refinement, which in turn likely creates a conflict that LSRA is unable to resolve without a spill. It is not clear there is any good fix here. It seems the previous behavior, where there were temps in the loops, gave LSRA natural split points that are now gone. The regressions are all fairly small, so I am just going to close this.
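To put a rough number on the spill/reload cost, the sketch below divides each block's PerfScore by its bbWeight, using the values copied from the block headers in the two listings above. It assumes the header PerfScore is the JIT's static per-execution estimate scaled by bbWeight (the clean quotients are consistent with that reading, but it is an assumption here). The dictionary and function names are illustrative only.

```python
# (bbWeight, PerfScore) pairs copied from the block headers in the listings.
NET8_LOOP = {
    "IG05": (1109935.07, 36905341.03),
    "IG06": (1106928.83, 1383661.04),
}
LATER_LOOP = {
    "IG10": (60645.02, 2137736.83),
    "IG11": (60240.69, 75300.87),
    "IG12": (59932.78, 479462.26),
}


def per_execution_scores(blocks):
    """Divide each block's weighted PerfScore by its bbWeight.

    Assumption: PerfScore in the header is (per-execution estimate * bbWeight).
    """
    return {name: score / weight for name, (weight, score) in blocks.items()}


def loop_cost(blocks):
    """Sum the per-execution scores: a static estimate for one trip around the loop."""
    return sum(per_execution_scores(blocks).values())


if __name__ == "__main__":
    for label, blocks in (("net8", NET8_LOOP), ("later", LATER_LOOP)):
        for name, score in per_execution_scores(blocks).items():
            print(f"{label} {name}: {score:.2f}")
        print(f"{label} total per iteration: {loop_cost(blocks):.2f}")
```

Under that assumption the loop goes from about 34.5 to about 44.5 per iteration: IG10 is 2 higher than IG05 (the two stores), and the new back-edge block IG12 adds about 8 (the two reloads plus the jmp). This is the JIT's static estimate, not a measurement; the measured regressions are described in the comment above as fairly small.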
Run Information
Regressions in SIMD.ConsoleMandel
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
SIMD.ConsoleMandel.ScalarDoubleSinglethreadADT
ETL Files
Histogram
JIT Disasms
SIMD.ConsoleMandel.ScalarFloatSinglethreadADT
ETL Files
Histogram
JIT Disasms
Docs
Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository
Run Information
Regressions in System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>.HeapSort(Size: 1000)
ETL Files
Histogram
JIT Disasms
Run Information
Regressions in System.Numerics.Tests.Perf_BigInteger
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
System.Numerics.Tests.Perf_BigInteger.GreatestCommonDivisor(arguments: 1024,1024 bits)
ETL Files
Histogram
JIT Disasms