JIT_CountProfile32 incorrect native codegen on linux #89340
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
This method needs to have a `bsr` in it to work properly. From libcoreclr.so in .NET 8 Preview 6:
(lldb) di -n JIT_CountProfile32
libcoreclr.so`JIT_CountProfile32:
libcoreclr.so[0x33f390] <+0>: push rbp
libcoreclr.so[0x33f391] <+1>: mov rbp, rsp
libcoreclr.so[0x33f394] <+4>: push rbx
libcoreclr.so[0x33f395] <+5>: push rax
libcoreclr.so[0x33f396] <+6>: mov rbx, rdi
libcoreclr.so[0x33f399] <+9>: cmp dword ptr [rdi], 0x0
libcoreclr.so[0x33f39c] <+12>: jle 0x33f3cb ; <+59> [inlined] InterlockedAdd at jithelpers.cpp:6015
libcoreclr.so[0x33f39e] <+14>: lea rdi, [rip + 0x33f193]
libcoreclr.so[0x33f3a5] <+21>: call 0x65f740 ; symbol stub for: __tls_get_addr
libcoreclr.so[0x33f3aa] <+26>: mov ecx, dword ptr [rax]
libcoreclr.so[0x33f3b0] <+32>: mov edx, ecx
libcoreclr.so[0x33f3b2] <+34>: shl edx, 0xd
libcoreclr.so[0x33f3b5] <+37>: xor edx, ecx
libcoreclr.so[0x33f3b7] <+39>: mov ecx, edx
libcoreclr.so[0x33f3b9] <+41>: shr ecx, 0x11
libcoreclr.so[0x33f3bc] <+44>: xor ecx, edx
libcoreclr.so[0x33f3be] <+46>: mov edx, ecx
libcoreclr.so[0x33f3c0] <+48>: shl edx, 0x5
libcoreclr.so[0x33f3c3] <+51>: xor edx, ecx
libcoreclr.so[0x33f3c5] <+53>: mov dword ptr [rax], edx
libcoreclr.so[0x33f3cb] <+59>: lock
libcoreclr.so[0x33f3cc] <+60>: inc dword ptr [rbx]
libcoreclr.so[0x33f3ce] <+62>: add rsp, 0x8
libcoreclr.so[0x33f3d2] <+66>: pop rbx
libcoreclr.so[0x33f3d3] <+67>: pop rbp
libcoreclr.so[0x33f3d4] <+68>: ret
By way of comparison, here is the 64 bit version which does basically the same thing:
libcoreclr.so`JIT_CountProfile64:
libcoreclr.so[0x33f3e0] <+0>: push rbp
libcoreclr.so[0x33f3e1] <+1>: mov rbp, rsp
libcoreclr.so[0x33f3e4] <+4>: push r14
libcoreclr.so[0x33f3e6] <+6>: push rbx
libcoreclr.so[0x33f3e7] <+7>: mov rbx, rdi
libcoreclr.so[0x33f3ea] <+10>: mov rax, qword ptr [rdi]
libcoreclr.so[0x33f3ed] <+13>: mov r14d, 0x1
libcoreclr.so[0x33f3f3] <+19>: test rax, rax
libcoreclr.so[0x33f3f6] <+22>: jle 0x33f446 ; <+102> [inlined] InterlockedAdd64 at jithelpers.cpp:6044
libcoreclr.so[0x33f3f8] <+24>: bsr rax, rax
libcoreclr.so[0x33f3fc] <+28>: cmp eax, 0xd
libcoreclr.so[0x33f3ff] <+31>: jb 0x33f446 ; <+102> [inlined] InterlockedAdd64 at jithelpers.cpp:6044
libcoreclr.so[0x33f401] <+33>: xor rax, 0x3f
libcoreclr.so[0x33f405] <+37>: mov cl, 0x33
libcoreclr.so[0x33f407] <+39>: sub cl, al
libcoreclr.so[0x33f409] <+41>: shl r14, cl
libcoreclr.so[0x33f40c] <+44>: lea rdi, [rip + 0x33f125]
libcoreclr.so[0x33f413] <+51>: call 0x65f740 ; symbol stub for: __tls_get_addr
libcoreclr.so[0x33f418] <+56>: mov ecx, dword ptr [rax]
libcoreclr.so[0x33f41e] <+62>: mov edx, ecx
libcoreclr.so[0x33f420] <+64>: shl edx, 0xd
libcoreclr.so[0x33f423] <+67>: xor edx, ecx
libcoreclr.so[0x33f425] <+69>: mov ecx, edx
libcoreclr.so[0x33f427] <+71>: shr ecx, 0x11
libcoreclr.so[0x33f42a] <+74>: xor ecx, edx
libcoreclr.so[0x33f42c] <+76>: mov edx, ecx
libcoreclr.so[0x33f42e] <+78>: shl edx, 0x5
libcoreclr.so[0x33f431] <+81>: xor edx, ecx
libcoreclr.so[0x33f433] <+83>: mov dword ptr [rax], edx
libcoreclr.so[0x33f439] <+89>: lea eax, [r14 - 0x1]
libcoreclr.so[0x33f43d] <+93>: test eax, edx
libcoreclr.so[0x33f43f] <+95>: je 0x33f446 ; <+102> [inlined] InterlockedAdd64 at jithelpers.cpp:6044
libcoreclr.so[0x33f441] <+97>: pop rbx
libcoreclr.so[0x33f442] <+98>: pop r14
libcoreclr.so[0x33f444] <+100>: pop rbp
libcoreclr.so[0x33f445] <+101>: ret
libcoreclr.so[0x33f446] <+102>: lock
libcoreclr.so[0x33f447] <+103>: add qword ptr [rbx], r14
libcoreclr.so[0x33f44a] <+106>: pop rbx
libcoreclr.so[0x33f44b] <+107>: pop r14
libcoreclr.so[0x33f44d] <+109>: pop rbp
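libcoreclr.so[0x33f44e] <+110>: ret
I noticed this because of some odd looking profile counts while investigating #87194 (comment). In particular for `EnumerateUsingIndexer` as run under BDN, with both interlocked (precise) and scalable counts enabled (interlocked counts are the upper value), note how the second count below is 1 instead of ~12800 or so.
Not clear yet what the problem is: either there is some sort of preprocessor mixup, or else a bug in clang or LLVM?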
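For readers following the disassembly: the shift/xor run that follows the `__tls_get_addr` call in both helpers (shl 0xd, shr 0x11, shl 0x5, each followed by an xor) looks like one step of a 32-bit xorshift generator over a thread-local state word. Below is a minimal sketch of that step in C. The function name is hypothetical and is not taken from the runtime source.

```c
#include <assert.h>
#include <stdint.h>

/* One xorshift step with the shift amounts seen in the disassembly
   (13, 17, 5). Hypothetical name, for illustration only. */
static uint32_t xorshift32_step(uint32_t x)
{
    assert(x != 0);   /* zero is a fixed point of xorshift */
    x ^= x << 13;     /* shl edx, 0xd  ; xor edx, ecx */
    x ^= x >> 17;     /* shr ecx, 0x11 ; xor ecx, edx */
    x ^= x << 5;      /* shl edx, 0x5  ; xor edx, ecx */
    return x;
}
```

In the 32-bit helper the result is stored back to the thread-local slot but never tested before the `lock inc`, whereas the 64-bit helper masks it with `r14 - 1` to decide whether to update.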
Source code is here: runtime/src/coreclr/vm/jithelpers.cpp Lines 6012 to 6040 in 08a6e06
It looks like there's not much else in the runtime that depends up on cc @VSadov |
Preprocessed output looks reasonable (with the exception of odd expansion of inline
unsigned char
__attribute__((visibility("default")))
BitScanReverse(
PDWORD Index,
UINT qwMask)
{
int lzcount = __builtin_clzl(qwMask);
*Index = (DWORD)(31 - lzcount);
return qwMask != 0;
}
...
void JIT_CountProfile32(volatile LONG* pCounter) { LPVOID __me; __me = 0;
{
{ }; { }; { };
;
LONG count = *pCounter;
LONG delta = 1;
if (count > 0)
{
DWORD logCount = 0;
BitScanReverse(&logCount, count);
if (logCount >= 13)
{
delta = 1 << (logCount - 12);
const unsigned Do_not_use_rand = HandleHistogramProfileRand();
const bool update = (Do_not_use_rand & (delta - 1)) == 0;
if (!update)
{
return;
}
}
}
InterlockedAdd(pCounter, delta);
}
; } |
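As a reading aid, the intended counting scheme in the preprocessed source can be restated as two small pure functions: below 8192 every call adds 1, and from there on the increment doubles with each power of two while the update is applied only when the low bits of a random value are all zero. This is a sketch in C under the assumption of a correct 32-bit bit scan. The helper names are hypothetical, and the random value is passed in by the caller instead of coming from `HandleHistogramProfileRand`.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit of a nonzero 32-bit value. */
static uint32_t highest_bit32(uint32_t v)
{
    assert(v != 0);   /* __builtin_clz(0) is undefined */
    return 31u - (uint32_t)__builtin_clz(v);
}

/* Increment to apply for the current counter value. */
static int32_t scaled_delta(int32_t count)
{
    if (count <= 0)
        return 1;
    uint32_t log_count = highest_bit32((uint32_t)count);
    if (log_count < 13)
        return 1;                            /* precise counting below 8192 */
    return (int32_t)(1u << (log_count - 12)); /* 2, 4, 8, ... above that */
}

/* Nonzero when the update should be applied for this random value:
   true with probability 1/delta when delta is a power of two. */
static int should_update(int32_t delta, uint32_t rnd)
{
    return (rnd & (uint32_t)(delta - 1)) == 0;
}
```

Adding `delta` with probability `1/delta` keeps the expected value of the counter equal to the true call count while reducing the number of interlocked writes on hot paths.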
cc @dotnet/jit-contrib |
Looks like you should rename the runtime/src/coreclr/inc/random.h Line 40 in 08a6e06
It seems like |
I think |
@jakobbotsch Aw, I typed too much, and you beat me to it :-) |
The existing PAL code was using `__builtin_clzl` which is intended for platforms where `long` is 64 bits. Instead use `__builtin_clz`. The GC version had a similar issue so I've changed that too. The JIT version was already using `__builtin_clz`. Fixes #89340.
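A minimal sketch of the difference, assuming GCC or clang on a target where `long` is 64 bits (as on x64 Linux). The two function names are hypothetical stand-ins, not the PAL's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Old behavior (hypothetical stand-in): a 32-bit mask scanned with the
   `long` builtin. With a 64-bit long, the leading-zero count includes the
   32 upper zero bits, so 31 - lzcount goes negative and wraps. */
static uint32_t bsr32_with_clzl(uint32_t mask)
{
    assert(mask != 0);   /* clz of 0 is undefined */
    int lzcount = __builtin_clzl((unsigned long)mask);
    return (uint32_t)(31 - lzcount);
}

/* Fixed behavior (hypothetical stand-in): the `unsigned int` builtin
   counts leading zeros within 32 bits. */
static uint32_t bsr32_with_clz(uint32_t mask)
{
    assert(mask != 0);
    int lzcount = __builtin_clz(mask);
    return (uint32_t)(31 - lzcount);
}
```

With a 64-bit `long`, `bsr32_with_clzl(1)` yields 0xFFFFFFE0 instead of 0, so an unsigned check such as `logCount >= 13` always passes and the following shift count is out of range. That is undefined behavior in C and C++, which may be why the compiler emitted no `bsr` in the 32-bit helper.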
If the concern is about CastCache having a dependency on some BitScanReverse peculiarity - the underlying array in the cache is never empty, so bsr will not be used with 0.
It was not just the |
This method needs to have a `bsr` in it to work properly. From libcoreclr.so in .NET 8 Preview 6:
By way of comparison, here is the 64 bit version which does basically the same thing:
I noticed this because of some odd looking profile counts while investigating #87194 (comment). In particular for `EnumerateUsingIndexer` as run under BDN, with both interlocked (precise) and scalable counts enabled (interlocked counts are the upper value), note how the second count below is 1 instead of ~12800 or so.
Not clear yet what the problem is: either there is some sort of preprocessor mixup, or else a bug in clang or LLVM?