ld.lld crashing when linking GraphicsMagick #134843

Closed
lb90 opened this issue Apr 8, 2025 · 12 comments
Labels
crash, lld:COFF, question

Comments

@lb90

lb90 commented Apr 8, 2025

Here's the crash log:

libtool: link: cc -fopenmp -g -O2 -Wall -o utilities/.libs/gm.exe utilities/gm.o  magick/.libs/libGraphicsMagick.a -lgdi32 -luser32 -ljxl -ljxl_threads -ltiff -ljbig -lsharpyuv -lwebp -lwebpmux -lfreetype -ljpeg -lturbojpeg -lpng16 -llcms2 -llcms2_fast_float -lxml2 -lzstd -llzma -lbz2 -lz -lpthread -pthread -fopenmp
ld.lld: warning: utilities/gm.o: locally defined symbol imported: GMCommand (defined in libGraphicsMagick.a(libGraphicsMagick_la-command.o)) [LNK4217]
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Exception Code: 0xC0000005
make[1]: Leaving directory '/c/Users/gitlab_runner/AppData/Local/Temp/GraphicsMagick-1.3.43'
#0 0x00007ff71c583b2a lld::coff::Baserel& std::__1::vector<lld::coff::Baserel, std::__1::allocator<lld::coff::Baserel>>::emplace_back<unsigned int, llvm::COFF::MachineTypes const&>(unsigned int&&, llvm::COFF::MachineTypes const&) (C:\ink100\clang64\bin\ld.lld.exe+0x343b2a)
#1 0x00007ffca3171be4 llvm::parallel::detail::Latch::dec() (C:\ink100\clang64\bin\libLLVM-20.dll+0xf01be4)
#2 0x00007ffca3171a8b void std::__1::__uninitialized_allocator_relocate[abi:nn200100]<std::__1::allocator<std::__1::function<void ()>>, std::__1::function<void ()>*>(std::__1::allocator<std::__1::function<void ()>>&, std::__1::function<void ()>*, std::__1::function<void ()>*, std::__1::function<void ()>*) (C:\ink100\clang64\bin\libLLVM-20.dll+0xf01a8b)
#3 0x00007ffca31711ec std::__1::vector<std::__1::thread, std::__1::allocator<std::__1::thread>>::__append(unsigned long long) (C:\ink100\clang64\bin\libLLVM-20.dll+0xf011ec)
#4 0x00007ffca3171012 std::__1::vector<std::__1::thread, std::__1::allocator<std::__1::thread>>::__append(unsigned long long) (C:\ink100\clang64\bin\libLLVM-20.dll+0xf01012)
#5 0x00007ffcc48b6b4c (C:\Windows\System32\ucrtbase.dll+0x26b4c)
#6 0x00007ffcc4ee4cb0 (C:\Windows\System32\KERNEL32.DLL+0x14cb0)
#7 0x00007ffcc6adeceb (C:\Windows\SYSTEM32\ntdll.dll+0x7eceb)
cc: error: linker command failed due to signal (use -v to see invocation)

Output from ld.lld --reproduce: repro.tar.gz

See https://gitlab.com/inkscape/inkscape/-/merge_requests/7066

@llvmbot llvmbot added the lld label Apr 8, 2025
@EugeneZelenko EugeneZelenko added lld:COFF and crash labels and removed lld label Apr 8, 2025
@llvmbot
Member

llvmbot commented Apr 8, 2025

@llvm/issue-subscribers-lld-coff

@mstorsjo
Member

mstorsjo commented Apr 8, 2025

It looks like this is a duplicate of #131807. When trying to reproduce this issue with a very recent nightly build, I get this error instead (since #134443):

lld-link: error: undefined symbol: __declspec(dllimport) GMCommand
>>> referenced by utilities/gm.c:61
>>>               C/Users/roberta/imagemagick-clang/GraphicsMagick-1.3.43/utilities/gm.o:(main)
NOTE: a relevant symbol 'GMCommand' is available in C/Users/roberta/imagemagick-clang/GraphicsMagick-1.3.43/magick/.libs/libGraphicsMagick.a but cannot be used because it is not an import library.

@lb90
Author

lb90 commented Apr 9, 2025

Great! :-)

Shouldn't the link succeed? That's described in this blog post by Raymond Chen: https://devblogs.microsoft.com/oldnewthing/20060726-00/?p=30363

@mstorsjo
Member

mstorsjo commented Apr 9, 2025

> Shouldn't the link succeed? That's described in this blog post by Raymond Chen: https://devblogs.microsoft.com/oldnewthing/20060726-00/?p=30363

So if we're referencing the symbol __imp_func but we only have func included in what we are already linking, then yes, the linker does synthesize an __imp_func and points it at func.
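
As a rough illustration of what that synthesized symbol amounts to, here is a minimal sketch in portable C. It is a conceptual model only, not what LLD or link.exe actually emit: `imp_func` stands in for the real `__imp_func` slot (identifiers starting with `__` are reserved in C), and `call_as_dllimport` and `call_direct` are hypothetical names for the two kinds of caller.

```c
/* Stand-in for the library function; GMCommand plays this role in the thread. */
int func(int x) { return x + 1; }

/* Roughly what a synthesized local import amounts to: a pointer-sized slot
 * holding the address of func. The real symbol would be named __imp_func;
 * a plain name is used here because "__" identifiers are reserved in C. */
int (*const imp_func)(int) = func;

/* A caller compiled against a dllimport declaration calls through the slot. */
int call_as_dllimport(int x) { return imp_func(x); }

/* A caller compiled without dllimport references func directly. */
int call_direct(int x) { return func(x); }
```

Both callers end up in the same function; the dllimport caller just takes one extra indirection through the slot.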

But in this case, func isn't included in what we're linking, it's in a static library and we haven't pulled in those object files from there yet (as nothing has directly referenced them), so we don't have func to produce a local import for. This is consistent for both MS link.exe and LLD. (And GNU ld doesn't support this case at all.) This was the behaviour in LLD in 19.x, and also on latest git main.
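
The distinction above can be sketched with a toy model in C. This is an assumption-laden simplification for illustration, not LLD's implementation: `struct member`, `is_defined`, `pull_member` and `resolve` are hypothetical names, and each archive member is assumed to define exactly one symbol.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One archive member: an object file that defines a single symbol. */
struct member {
    const char *defines;
    bool loaded;          /* true once the object is part of the link */
};

/* True if an already-loaded object defines `name`. */
static bool is_defined(const struct member *ar, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (ar[i].loaded && strcmp(ar[i].defines, name) == 0)
            return true;
    return false;
}

/* Lazily load the member that defines exactly `name`, if there is one. */
static bool pull_member(struct member *ar, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(ar[i].defines, name) == 0) {
            ar[i].loaded = true;
            return true;
        }
    return false;
}

/* Resolve one undefined reference the way the thread describes it. */
bool resolve(struct member *ar, size_t n, const char *ref) {
    if (is_defined(ar, n, ref) || pull_member(ar, n, ref))
        return true;
    /* Local import: __imp_X is synthesized only if X is ALREADY loaded.
     * The archive is not searched for X on behalf of __imp_X. */
    if (strncmp(ref, "__imp_", 6) == 0)
        return is_defined(ar, n, ref + 6);
    return false;
}
```

In this model, an archive holding only `GMCommand` cannot satisfy `__imp_GMCommand` on its own; once something else references `GMCommand` directly and the member is loaded, the same `__imp_GMCommand` lookup succeeds.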

Now in #109082 we did attempt to fix this; if we don't have func yet, but we know that it is available in a static library we haven't pulled in yet, then we do that (and keep on pulling in more object files from that library, and potentially other libraries, until we have all the symbols we need). This would, seemingly, fix the issue. However, if those object files that were pulled in also ended up having dllexport directives in them, then we would explode (which is this bug here), see #131807 for more details. And while discussing how to fix that, we concluded that the least hairy option is probably to just back out this change, as no other linker did that.

I kinda understand how you get there though; you build a library that diligently uses dllexport attributes while building, and diligently uses dllimport attributes in the public headers for users of the library - but then you actually link against a static library. That will get you the error (on 19.x and git main). If the static library doesn't contain dllexport attributes, it should work on the 20.x releases so far, but if it does, then it crashes instead.

Now I'm a little curious how people end up hitting this bug so much - is there a build setup where this succeeded with LLD 19.x that now no longer works?

@lb90
Author

lb90 commented Apr 9, 2025

> I kinda understand how you get there though; you build a library that diligently uses dllexport attributes while building, and diligently uses dllimport attributes in the public headers for users of the library - but then you actually link against a static library. That will get you the error (on 19.x and git main). If the static library doesn't contain dllexport attributes, it should work on the 20.x releases so far, but if it does, then it crashes instead.

Yes, the main issue is prebuilt static libraries, e.g. as shipped by MSYS2:

Suppose MSYS2 provides library A, which in turn uses library B. Now, a developer builds project P1 on MSYS2 and chooses to link A statically, but B dynamically. Another developer builds project P2 on MSYS2 and chooses to link A and B statically. Now, one of the two projects gets dllimport wrong, since MSYS2 has built A for one of the two configurations (it's not clear which one). Would Clang fail to link in that case?

> Now I'm a little curious how people end up hitting this bug so much - is there a build setup where this succeeded with LLD 19.x that now no longer works?

I'll try building GraphicsMagick with Clang 19.

Many thanks!

@mstorsjo
Member

mstorsjo commented Apr 9, 2025

CC @jeremyd2019 @mati865 @lazka

> > I kinda understand how you get there though; you build a library that diligently uses dllexport attributes while building, and diligently uses dllimport attributes in the public headers for users of the library - but then you actually link against a static library. That will get you the error (on 19.x and git main). If the static library doesn't contain dllexport attributes, it should work on the 20.x releases so far, but if it does, then it crashes instead.
>
> Yes, the main issue is prebuilt static libraries, e.g. as shipped by MSYS2:
>
> Suppose MSYS2 provides library A, which in turn uses library B. Now, a developer builds project P1 on MSYS2 and chooses to link A statically, but B dynamically. Another developer builds project P2 on MSYS2 and chooses to link A and B statically. Now, one of the two projects gets dllimport wrong, since MSYS2 has built A for one of the two configurations (it's not clear which one). Would Clang fail to link in that case?

So, there are a couple of aspects here.

The main complication comes from the fact that libraries are provided in two forms, static and dynamic, with only one set of headers, and expecting to be able to pick either of them at link time.

Ideally, the static libraries should never contain dllexport attributes in that case. Linking a static library into your own library, and having it suddenly start exporting APIs from the linked-in library, is almost never what you want.

Secondly, if you have one set of headers for use with either a static or a dynamic library, you have two options. The headers can make it configurable whether they mark APIs with dllimport, depending on which way you want to link. Or you can just skip the dllimport attributes entirely. Calling a function without a dllimport attribute, when the function is imported from another DLL, always works (also on MSVC, see the link you provided earlier). Referencing a data symbol from another DLL without dllimport doesn't work with MSVC, but it works with all mingw toolchains thanks to a mingw-specific (somewhat quirky) feature called autoimports.

So for mingw environments, just omitting all dllimports in headers should generally work. There's a tiny bit of extra overhead on calling a dllimported function without the dllimport attribute, but it's quite minuscule. And it makes the same headers work for both static and dynamic libraries. And if the library doesn't directly provide data symbols, the same also works for MSVC style environments.
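
For the configurable-header option, a minimal sketch of a common pattern could look like the following. All names here (`MYLIB_API`, `MYLIB_STATIC`, `MYLIB_BUILDING`, `mylib_answer`) are hypothetical and not taken from GraphicsMagick; `MYLIB_STATIC` would normally be passed by the build system (for example as `-DMYLIB_STATIC`) and is defined inline only so the sketch compiles on its own.

```c
/* Hypothetical switch: set when consuming or building the static library. */
#define MYLIB_STATIC 1

#if defined(_WIN32) && !defined(MYLIB_STATIC)
#  ifdef MYLIB_BUILDING
     /* Building the DLL itself: export the API. */
#    define MYLIB_API __declspec(dllexport)
#  else
     /* Consuming the DLL: reference the API through __imp_ symbols. */
#    define MYLIB_API __declspec(dllimport)
#  endif
#else
   /* Static library, or a non-Windows target: no attribute at all. */
#  define MYLIB_API
#endif

/* Public header part: one declaration serves both link modes. */
MYLIB_API int mylib_answer(void);

/* Library source part (hypothetical implementation). */
int mylib_answer(void) { return 42; }
```

With `MYLIB_STATIC` defined, a caller references `mylib_answer` rather than `__imp_mylib_answer`, so linking against a static archive needs no local import. Defining `MYLIB_API` as empty unconditionally corresponds to the "skip dllimport entirely" option described above.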

> > Now I'm a little curious how people end up hitting this bug so much - is there a build setup where this succeeded with LLD 19.x that now no longer works?
>
> I'll try building GraphicsMagick with Clang 19.

I would expect it to hit the same error. But the more interesting thing is how this behaves if you'd compile it with GCC/GNU ld. Within mingw environments, we try to align the behaviours between those two, so if you have one setup where you successfully can build it with GCC/GNU ld, but can't do it the same way with Clang/lld, we'd want to fix that.

@aganea
Member

aganea commented Apr 9, 2025

> Now in #109082 we did attempt to fix this; if we don't have func yet, but we know that it is available in a static library we haven't pulled in yet, then we do that (and keep on pulling in more object files from that library, and potentially other libraries, until we have all the symbols we need). This would, seemingly, fix the issue. However, if those object files that were pulled in also ended up having dllexport directives in them, then we would explode (which is this bug here), see #131807 for more details. And while discussing how to fix that, we concluded that the least hairy option is probably to just back out this change, as no other linker did that.

I wanted to comment in a bit more detail on that aspect. Besides the crashes, the problem I was having with #109082 is that it makes finding symbols non-deterministic. A given set of inputs will always yield a deterministic outcome; however, changing the order of inputs, or slightly changing a single input, might pull completely different sections from a different set of objects/archives. This could inadvertently lead to bugs such as the one described in #82050, even if we'd fixed that particular instance.

In contrast, MSVC implicitly mandates a strict order for the parsing of directives, through the second paragraph in https://learn.microsoft.com/en-us/cpp/build/reference/link-input-files?view=msvc-170:

> Object files on the command line are processed in the order they appear on the command line. Libraries are searched in command line order as well, with the following caveat: Symbols that are unresolved when bringing in an object file from a library are searched for in that library first, and then the following libraries from the command line and /DEFAULTLIB (Specify default library) directives, and then to any libraries at the beginning of the command line.

The above essentially says that /DEFAULTLIB flags pulled from object files' directives, including those pulled from archives, should be queued and processed in a strict, deterministic order. Implicitly, this means that directives from object files might be processed in that order as well, which might also include /EXPORT. I have verified that assertion with /DEFAULTLIB flags but not with /EXPORT.

I have split #85290 into several smaller patches, which I will send PRs for soon, and one of those patches moves parsing of directives a bit earlier in the process, to accommodate the /DEFAULTLIB order. I think while we do that, we could (later) consider also making "lazy" GC roots out of the /EXPORT flags from directives, to allow them to resolve if a dllimport symbol is searched for, which is what @glandium was attempting to fix. I agree overall that "dllimporting" symbols from yourself should work, with a warning like LNK4217, which we already emit if the object was already pulled from the archive. (The problem here, however, is that /EXPORT symbols are not lazily pulled, and MSVC link.exe does the same thing: it does not pull on /EXPORT symbols if the object wasn't already pulled through other symbols.)

@lb90
Author

lb90 commented Apr 9, 2025

Hi @mstorsjo,

> I would expect it to hit the same error. But the more interesting thing is how this behaves if you'd compile it with GCC/GNU ld. Within mingw environments, we try to align the behaviours between those two, so if you have one setup where you successfully can build it with GCC/GNU ld, but can't do it the same way with Clang/lld, we'd want to fix that.

I confirm that the same project builds successfully with GCC/GNU ld.

Here are the build steps: https://gitlab.com/inkscape/inkscape/-/blob/master/buildtools/msys2installdeps.sh#L149-152

@mstorsjo
Member

mstorsjo commented Apr 9, 2025

> > I would expect it to hit the same error. But the more interesting thing is how this behaves if you'd compile it with GCC/GNU ld. Within mingw environments, we try to align the behaviours between those two, so if you have one setup where you successfully can build it with GCC/GNU ld, but can't do it the same way with Clang/lld, we'd want to fix that.
>
> I confirm that the same project builds successfully with GCC/GNU ld.
>
> Here are the build steps: https://gitlab.com/inkscape/inkscape/-/blob/master/buildtools/msys2installdeps.sh#L149-152

Ok, thanks for checking! And thanks for providing repro instructions. I'll see if I have time to try it out for myself in a couple of days, hopefully.

To continue digging in here: in the link repro example, you had utilities/gm.o, which contained the following symbols:

$ llvm-nm utilities/gm.o
00000000 a @feat.00
         U __imp_GMCommand
         U __main
00000000 T main

And this was linked against magick/.libs/libGraphicsMagick.a, a static library that only provides GMCommand.

In the successful build with GCC and GNU ld, does utilities/gm.o still reference __imp_GMCommand or does it reference GMCommand? Or does it link against a dynamically linked libGraphicsMagick.dll.a rather than a static one?

@lb90
Author

lb90 commented Apr 10, 2025

Here's the output with binutils nm:

lucab@DESKTOP-LOA8DP2 UCRT64 /d/graphicsmagick-ucrt/GraphicsMagick-1.3.43
$ nm utilities/gm.o
0000000000000000 b .bss
0000000000000000 d .data
0000000000000000 N .debug_abbrev
0000000000000000 N .debug_aranges
0000000000000000 N .debug_frame
0000000000000000 N .debug_info
0000000000000000 N .debug_line
0000000000000000 N .debug_line_str
0000000000000000 N .debug_loclists
0000000000000000 N .debug_rnglists
0000000000000000 p .pdata.startup
0000000000000000 r .rdata$zzz
0000000000000000 t .text
0000000000000000 t .text.startup
0000000000000000 r .xdata.startup
                 U __imp_GMCommand
                 U __main
0000000000000000 T main
lucab@DESKTOP-LOA8DP2 UCRT64 /d/graphicsmagick-ucrt/GraphicsMagick-1.3.43
$ nm magick/.libs/libGraphicsMagick.a | grep GMCommand
00000000000207b0 T GMCommand
0000000000007b40 t GMCommandSingle

> Or does it link against a dynamically linked libGraphicsMagick.dll.a rather than a static one?

Good question! There are two gm.exe files: one in utilities/ and one in utilities/.libs

lucab@DESKTOP-LOA8DP2 UCRT64 /d/graphicsmagick-ucrt/GraphicsMagick-1.3.43
$ ls -a utilities/
.  ..  .deps  .dirstamp  .libs  Makefile.am  gm.1  gm.c  gm.exe  gm.o  miff.4  quantize.5  tests

lucab@DESKTOP-LOA8DP2 UCRT64 /d/graphicsmagick-ucrt/GraphicsMagick-1.3.43
$ ls -a utilities/.libs/
.  ..  gm.exe  gm_ltshwrapper  lt-gm.c

Here are their dependencies:

  • utilities/gm.exe
    D:\graphicsmagick-ucrt\GraphicsMagick-1.3.43>dumpbin /DEPENDENTS utilities\gm.exe
    Microsoft (R) COFF/PE Dumper Version 14.43.34810.0
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    
    Dump of file utilities\gm.exe
    
    File Type: EXECUTABLE IMAGE
    
      Image has the following dependencies:
    
        KERNEL32.dll
        api-ms-win-crt-environment-l1-1-0.dll
        api-ms-win-crt-filesystem-l1-1-0.dll
        api-ms-win-crt-heap-l1-1-0.dll
        api-ms-win-crt-math-l1-1-0.dll
        api-ms-win-crt-private-l1-1-0.dll
        api-ms-win-crt-process-l1-1-0.dll
        api-ms-win-crt-runtime-l1-1-0.dll
        api-ms-win-crt-stdio-l1-1-0.dll
        api-ms-win-crt-string-l1-1-0.dll
    
  • utilities/.libs/gm.exe
    D:\graphicsmagick-ucrt\GraphicsMagick-1.3.43>dumpbin /DEPENDENTS utilities\.libs\gm.exe
    Microsoft (R) COFF/PE Dumper Version 14.43.34810.0
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    
    Dump of file utilities\.libs\gm.exe
    
    File Type: EXECUTABLE IMAGE
    
      Image has the following dependencies:
    
        KERNEL32.dll
        api-ms-win-crt-environment-l1-1-0.dll
        api-ms-win-crt-heap-l1-1-0.dll
        api-ms-win-crt-math-l1-1-0.dll
        api-ms-win-crt-private-l1-1-0.dll
        api-ms-win-crt-runtime-l1-1-0.dll
        api-ms-win-crt-stdio-l1-1-0.dll
        api-ms-win-crt-string-l1-1-0.dll
        libGraphicsMagick-3.dll
    

For the second one (utilities/.libs/gm.exe), the only import from libGraphicsMagick-3.dll is GMCommand:

D:\graphicsmagick-ucrt\GraphicsMagick-1.3.43>dumpbin /IMPORTS utilities\.libs\gm.exe
Microsoft (R) COFF/PE Dumper Version 14.43.34810.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file utilities\.libs\gm.exe

File Type: EXECUTABLE IMAGE

  Section contains the following imports:

  (...)

    libGraphicsMagick-3.dll
             1400083C8 Import Address Table
             140008240 Import Name Table
                     0 time date stamp
                     0 Index of first forwarder reference

                         138 GMCommand

@mstorsjo
Member

Thanks for the investigation! I tried it out myself now, and now I see the issue.

Deep down, this is a libtool issue - https://debbugs.gnu.org/cgi/bugreport.cgi?bug=27866. Libtool is not very actively maintained... (That bug report was posted in 2017, but there was no active libtool maintainer for many years. A couple years ago there was some sort of acting maintainer that did respond and try to sort some things out, for a little while, but he also vanished. Now there's another maintainer, who has worked on things again a couple months ago, but this bug hasn't received attention yet.) The bug is worked around in msys2 with patches for libtool, see https://github.com/msys2/MSYS2-packages/blob/master/libtool/0011-Pick-up-clang_rt-static-archives-compiler-internal-l.patch and https://github.com/msys2/MSYS2-packages/blob/master/libtool/0013-Allow-statically-linking-compiler-support-libraries-.patch.

Due to how libtool is bundled with the source packages, users need to upgrade the libtool bundled in each source package to work around the bug. So in an msys2 shell (with the right autotools and libtool packages installed) you can do autoreconf -fiv within the tarball, before attempting to build with Clang. This issue mainly manifests in libraries that use C++, iirc.

Before fixing this, when attempting to build this package with Clang/LLD, I get these messages while building:

*** Since this library must not contain undefined symbols,
*** because either the platform does not support them or
*** it was explicitly requested with -no-undefined,
*** libtool will only create a static version of it.

So the build is set up to build a shared library, but when it comes to the actual linking phase, libtool decides not to create a shared library after all, and only creates a static library. That's why we have the odd inconsistency between the dllimport/dllexport attributes and what we actually link.

By running autoreconf -fiv in the source tree before building, I can build it successfully with ./configure --enable-shared && make -j$(nproc).

@lb90
Author

lb90 commented Apr 10, 2025

Wow, thank you very much! That explains a lot.

Best Regards!
Luca

@lb90 lb90 closed this as completed Apr 10, 2025
@EugeneZelenko EugeneZelenko added the question label Apr 10, 2025