Skip to content

lld-link Crash #131807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jlsantiago0 opened this issue Mar 18, 2025 · 11 comments · Fixed by #134443
Closed

lld-link Crash #131807

jlsantiago0 opened this issue Mar 18, 2025 · 11 comments · Fixed by #134443
Labels
crash Prefer [crash-on-valid] or [crash-on-invalid] lld:COFF

Comments

@jlsantiago0
Copy link

jlsantiago0 commented Mar 18, 2025

Using LLVM-20.1.0 release build from https://github.com/llvm/llvm-project/releases/download/llvmorg-20.1.0/LLVM-20.1.0-Linux-X64.tar.xz

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
 #0 0x0000575a1a2bb007 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/mnt/centshare/vfurnace/git/toolchain/ort-toolchain-msvc-clang/msvc-clang/LLVM-20.1.0-Linux-X64/bin/lld-link+0x4b8e007)
 #1 0x0000575a1a2bb63f SignalHandler(int) Signals.cpp:0:0
 #2 0x00007d867d550cd0 (/usr/lib/libc.so.6+0x3dcd0)
 #3 0x0000575a1a3c7aee lld::coff::(anonymous namespace)::AddressTableChunk::writeTo(unsigned char*) const DLL.cpp:0:0
 #4 0x0000575a1a31dc72 std::__1::__function::__func<llvm::parallelFor(unsigned long, unsigned long, llvm::function_ref<void (unsigned long)>)::$_0, std::__1::allocator<llvm::parallelFor(unsigned long, unsigned long, llvm::function_ref<void (unsigned long)>)::$_0>, void ()>::operator()() (.llvm.17606889172608314556) Parallel.cpp:0:0
 #5 0x0000575a1a31db9a std::__1::__function::__func<llvm::parallel::TaskGroup::spawn(std::__1::function<void ()>)::$_0, std::__1::allocator<llvm::parallel::TaskGroup::spawn(std::__1::function<void ()>)::$_0>, void ()>::operator()() Parallel.cpp:0:0
 #6 0x0000575a1a31d8d7 llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::work(llvm::ThreadPoolStrategy, unsigned int) Parallel.cpp:0:0
 #7 0x0000575a1a31d969 void* std::__1::__thread_proxy[abi:nn200100]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::'lambda'()::operator()() const::'lambda'()>>(void*) Parallel.cpp:0:0
 #8 0x00007d867d5a870a (/usr/lib/libc.so.6+0x9570a)
 #9 0x00007d867d62caac (/usr/lib/libc.so.6+0x119aac)
clang++: error: unable to execute command: Segmentation fault (core dumped)
clang++: error: linker command failed due to signal (use -v to see invocation)
clang++: error: invalid linker name in argument '-fuse-ld=lld-link'

@EugeneZelenko EugeneZelenko added lld:COFF crash Prefer [crash-on-valid] or [crash-on-invalid] incomplete Issue not complete (e.g. missing a reproducer, build arguments, etc.) and removed new issue labels Mar 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 18, 2025

@llvm/issue-subscribers-lld-coff

Author: Jose Santiago (jlsantiago0)

Using LLVM-20.1.0 release build from https://github.com/llvm/llvm-project/releases/download/llvmorg-20.1.0/LLVM-20.1.0-Linux-X64.tar.xz
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
 #<!-- -->0 0x0000575a1a2bb007 llvm::sys::PrintStackTrace(llvm::raw_ostream&amp;, int) (/mnt/centshare/vfurnace/git/toolchain/ort-toolchain-msvc-clang/msvc-clang/LLVM-20.1.0-Linux-X64/bin/lld-link+0x4b8e007)
 #<!-- -->1 0x0000575a1a2bb63f SignalHandler(int) Signals.cpp:0:0
 #<!-- -->2 0x00007d867d550cd0 (/usr/lib/libc.so.6+0x3dcd0)
 #<!-- -->3 0x0000575a1a3c7aee lld::coff::(anonymous namespace)::AddressTableChunk::writeTo(unsigned char*) const DLL.cpp:0:0
 #<!-- -->4 0x0000575a1a31dc72 std::__1::__function::__func&lt;llvm::parallelFor(unsigned long, unsigned long, llvm::function_ref&lt;void (unsigned long)&gt;)::$_0, std::__1::allocator&lt;llvm::parallelFor(unsigned long, unsigned long, llvm::function_ref&lt;void (unsigned long)&gt;)::$_0&gt;, void ()&gt;::operator()() (.llvm.17606889172608314556) Parallel.cpp:0:0
 #<!-- -->5 0x0000575a1a31db9a std::__1::__function::__func&lt;llvm::parallel::TaskGroup::spawn(std::__1::function&lt;void ()&gt;)::$_0, std::__1::allocator&lt;llvm::parallel::TaskGroup::spawn(std::__1::function&lt;void ()&gt;)::$_0&gt;, void ()&gt;::operator()() Parallel.cpp:0:0
 #<!-- -->6 0x0000575a1a31d8d7 llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::work(llvm::ThreadPoolStrategy, unsigned int) Parallel.cpp:0:0
 #<!-- -->7 0x0000575a1a31d969 void* std::__1::__thread_proxy[abi:nn200100]&lt;std::__1::tuple&lt;std::__1::unique_ptr&lt;std::__1::__thread_struct, std::__1::default_delete&lt;std::__1::__thread_struct&gt;&gt;, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::'lambda'()::operator()() const::'lambda'()&gt;&gt;(void*) Parallel.cpp:0:0
 #<!-- -->8 0x00007d867d5a870a (/usr/lib/libc.so.6+0x9570a)
 #<!-- -->9 0x00007d867d62caac (/usr/lib/libc.so.6+0x119aac)
clang++: error: unable to execute command: Segmentation fault (core dumped)
clang++: error: linker command failed due to signal (use -v to see invocation)
clang++: error: invalid linker name in argument '-fuse-ld=lld-link'

@EugeneZelenko
Copy link
Contributor

Could you please provide reproducer?

@mstorsjo
Copy link
Member

If you can pass an option like -reproduce:dump.tar to lld-link, it'll produce a tar file containing all the link inputs, allowing us to reproduce the exact situation. If using the ld.lld frontend, pass -Wl,--reproduce=dump.tar when linking via $CC.

@jlsantiago0
Copy link
Author

Oh. So i figured it out:

dump.zip

I couldnt upload a tar file so i zipped it.

@EugeneZelenko EugeneZelenko removed the incomplete Issue not complete (e.g. missing a reproducer, build arguments, etc.) label Mar 18, 2025
@mstorsjo
Copy link
Member

I've managed to reproduce the crash, and reduce it to a small selfcontained testcase.

  • entry.c
void __declspec(dllimport) other(void);
void entry(void) {
    other();
}
  • other.c
void other(void) {
}

void __declspec(dllexport) exportedFunc(void) {
}
$ clang -target x86_64-windows-msvc -c entry.c
$ clang -target x86_64-windows-msvc -c other.c
$ llvm-ar rcs other.lib other.o
$ lld-link entry.o -entry:entry -subsystem:console other.lib

The problem is that we handle any -export arguments or directives here: https://github.com/llvm/llvm-project/blob/llvmorg-20.1.1/lld/COFF/Driver.cpp#L2483-L2503

This makes sure that each -export symbol actually gets added as a GC root, requiring resolving the symbol etc.

However in this case, we only end up pulling the other object file later, through resolveRemainingUndefines() here: https://github.com/llvm/llvm-project/blob/llvmorg-20.1.1/lld/COFF/Driver.cpp#L2641-L2645 https://github.com/llvm/llvm-project/blob/llvmorg-20.1.1/lld/COFF/SymbolTable.cpp#L455-L496

In this case, the undefined __imp_other symbol forces pulling in the symbol other from the static library. And this brings in more object files that add new -export directives, which then don't get the treatment above for loading the symbol for the exports, which then crashes at the writer stage, when the Export::sym field is null.

This obviously isn't great. There's lots of ugly and kludgey ways to work around it, I'm wondering what is the best one (or least ugly). CC @rnk @aganea @nico @cjacek who are familiar with these bits of the linker.

(To be clear, I'm not sure I have time to invest in trying to fix this right now, so feel totally free to pick it up.)

@jlsantiago0 As a way to avoid the issue, I would recommend that your static libraries wouldn't contain dllexport attributes, and that code referencing them don't expect them to be dllimported. In this case, your libzmq.lib(libzmq_la-zmq_utils.o) references __imp_crypto_box_keypair, while sodium.lib(libsodium_la-crypto_box.o) contains crypto_box_keypair, and other object files within sodium.lib contain -EXPORT: directives. Either omitting the -EXPORT: directives there, or making zmq reference sodium without the dllimport, should avoid hitting this bug.

@aganea
Copy link
Member

aganea commented Apr 3, 2025

It seems MSVC link.exe bails out with LNK2019 before even reaching the writing stage, we should probably do the same in LLD.

> link entry.obj -entry:entry -subsystem:console other.lib
...
entry.obj : error LNK2019: symbole externe non résolu __imp_other référencé dans la fonction entry

We seem to emit LNK4217 in LLD with the above test case, before crashing:

> lld-link entry.obj -entry:entry -subsystem:console other.lib
...
lld-link: warning: entry.obj: locally defined symbol imported: other (defined in other.lib(other.obj)) [LNK4217]

Both LLD and MSVC emit LNK4217 if linking the object directly, no crash this time in LLD:

> lld-link entry.obj -entry:entry -subsystem:console other.obj
...
lld-link: warning: entry.obj: locally defined symbol imported: other (defined in other.obj) [LNK4217]

@mstorsjo
Copy link
Member

mstorsjo commented Apr 4, 2025

It seems MSVC link.exe bails out with LNK2019 before even reaching the writing stage, we should probably do the same in LLD.

That sounds reasonable - that would allow keeping the current logic of doing the importing of locally defined symbols in a late stage like we do now.

> link entry.obj -entry:entry -subsystem:console other.lib
...
entry.obj : error LNK2019: symbole externe non résolu __imp_other référencé dans la fonction entry

For clarity, this error message in English is:

entry.o : error LNK2019: unresolved external symbol __imp_other referenced in function entry

But it does seem like MS link.exe is more strict than we are here - it looks like it doesn't pull in symbols from static libraries at all, in order to import local symbols, while we have been adding more of that behaviour (in 6a1bdd9 / #109082 - CC @glandium).

@aganea
Copy link
Member

aganea commented Apr 4, 2025

But it does seem like MS link.exe is more strict than we are here - it looks like it doesn't pull in symbols from static libraries at all, in order to import local symbols, while we have been adding more of that behaviour (in 6a1bdd9 / #109082 - CC @glandium).

Indeed, I think this part does not match what MSVC is doing: https://github.com/llvm/llvm-project/blob/main/lld/COFF/SymbolTable.cpp#L484-L488

Perhaps the initial problem that @glandium had was caused by our current the order of extraction of symbols inside .LIBs, which does not match was MSVC is doing, as fixed by #85290 (but I am speculating here)

Indeed the test case in https://github.com/llvm/llvm-project/blob/main/lld/test/COFF/undefined_lazy.test fails with latest MSVC link.exe.

With LLD:

> lld-link C:\src\git\llvm-project\stage1_msvc_debug\tools\lld\test\COFF\Output\undefined_lazy.test.tmp.foo.lib C:\src\git\llvm-project\stage1_msvc_debug\tools\lld\test\COFF\Output\undefined_lazy.test.tmp.bar.lib C:\src\git\llvm-project\stage1_msvc_debug\tools\lld\test\COFF\Output\undefined_lazy.test.tmp.qux.obj -out:C:\src\git\llvm-project\stage1_msvc_debug\tools\lld\test\COFF\Output\undefined_lazy.test.tmp.dll -dll
lld-link: warning: C:\src\git\llvm-project\stage1_msvc_debug\tools\lld\test\COFF\Output\undefined_lazy.test.tmp.qux.obj: locally defined symbol imported: foo (defined in undefined_lazy.test.tmp.foo.lib(undefined_lazy.test.tmp.foo.obj)) [LNK4217]

With MSVC link.exe:

> link C:\src\git\llvm-project\stage1_msvc_debug\tools\lld\test\COFF\Output\undefined_lazy.test.tmp.foo.lib C:\src\git\llvm-project\stage1_msvc_debug\tools\lld\test\COFF\Output\undefined_lazy.test.tmp.bar.lib C:\src\git\llvm-project\stage1_msvc_debug\tools\lld\test\COFF\Output\undefined_lazy.test.tmp.qux.obj -out:C:\src\git\llvm-project\stage1_msvc_debug\tools\lld\test\COFF\Output\undefined_lazy.test.tmp.dll -dll
Microsoft (R) Incremental Linker Version 14.43.34809.0
Copyright (C) Microsoft Corporation.  All rights reserved.

undefined_lazy.test.tmp.qux.obj : error LNK2001: symbole externe non résolu __imp_foo
C:\src\git\llvm-project\stage1_msvc_debug\tools\lld\test\COFF\Output\undefined_lazy.test.tmp.dll : fatal error LNK1120: 1 externes non résolus

@aganea
Copy link
Member

aganea commented Apr 4, 2025

Which makes me think I'd be nice to optionally have a way to validate that at least %ERRORCODE% returned by LLD for our testing suite testcases matches what MSVC link.exe returns, for each test.

@mstorsjo
Copy link
Member

mstorsjo commented Apr 4, 2025

Which makes me think I'd be nice to optionally have a way to validate that at least %ERRORCODE% returned by LLD for our testing suite testcases matches what MSVC link.exe returns, for each test.

I think we would be differing on quite a lot of cases; we have a huge amount of cases where we support things that MS link.exe doesn't.

aganea added a commit that referenced this issue Apr 7, 2025
This reverts commit 6a1bdd9 and re-instate behavior that matches what
MSVC link.exe does, that is, error out when trying to dllimport a symbol
from a static library.

A hint is now displayed in stdout, mentioning that we should rather dllimport the symbol
from a import library.

Fixes #131807
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Apr 7, 2025
This reverts commit 6a1bdd9 and re-instate behavior that matches what
MSVC link.exe does, that is, error out when trying to dllimport a symbol
from a static library.

A hint is now displayed in stdout, mentioning that we should rather dllimport the symbol
from a import library.

Fixes llvm/llvm-project#131807
@mstorsjo
Copy link
Member

FYI, there was another report of a similar issue, and there we found the root cause for why there were dllimport/dllexport attributes when linking static libraries - see #134843 (comment). TL;DR, known libtool issue, you may need to update the bundled libtool in source packages with one provided by msys2 which contains the necessary patches to fix this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crash Prefer [crash-on-valid] or [crash-on-invalid] lld:COFF
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants