Segfaults when building Julia from scratch with GCC 15 #58466
Could you bisect? |
Is this the same as #58448? |
Probably the same as #58448. Since we both use an Arch-based system, it's probably a non-Julia issue, but I'll keep this issue open until I can verify. |
For what it's worth, I can compile fine on Debian testing, which is also a rolling-release distro, although it lags further behind than Arch Linux (and it entered feature freeze a few days ago). Can you try a debug+assert build with
override JULIA_BUILD_MODE=debug
override LLVM_DEBUG=0
override LLVM_ASSERTIONS=1
override FORCE_ASSERTIONS=1
? If you're lucky, the segfault hits an assertion. |
Sorry, I'm not familiar with these commands. Are these environmental variables? And where does the |
Create the file |
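As a minimal sketch of how the four settings above could be applied: they are make variable overrides rather than environment variables, so they need to live in a file that the build reads. The sketch assumes the usual `Make.user` file at the root of the Julia source tree (the file name is not preserved in the comment above, so treat it as an assumption), and `write_overrides` is a hypothetical helper name.

```shell
# Hypothetical helper: write the debug+assert overrides to a build-override file.
# $1: path to the override file (assumed here to be Make.user in the source root)
write_overrides() {
    cat > "$1" <<'EOF'
override JULIA_BUILD_MODE=debug
override LLVM_DEBUG=0
override LLVM_ASSERTIONS=1
override FORCE_ASSERTIONS=1
EOF
}

# Example usage, with a hypothetical checkout location:
#   write_overrides "$HOME/julia/Make.user"
#   make -C "$HOME/julia"
```

Note that the helper overwrites the target file, so any existing overrides would need to be merged by hand.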
It builds successfully using the four settings above. Very weird. |
And if you remove
override JULIA_BUILD_MODE=debug
override LLVM_DEBUG=0
? That'd be an optimised build (instead of a debug build), with assertions. |
Ok, I found an Arch Linux machine to test where I could reproduce the reported issue: enabling assertions, with or without a debug build, always makes the segfault go away. Similarly, a debug build, with or without assertions, always makes the segfault disappear. This is seriously a heisenbug. |
Any chance you could get an rr trace of the build? |
I can confirm that this is the same issue as #58448. I can confirm that building is successful with the options above. |
Maybe try running the built julia like so
EDIT: Actually BugReporting might depend on distributed so that might be tricky to get working. |
This runs into an error and outputs only:
|
Reproduces in docker.
(lldb) di --start-address $pc-50 -c 25
libjulia-codegen.so.1.13`::jl_emit_native_impl(jl_array_t *, LLVMOrcThreadSafeModuleRef, const jl_cgparams_t *, int) [inlined] llvm::SmallPtrSetImplBase::clear():
0x7f4ca890db57 <+885038>: cmpl %edx, %eax
0x7f4ca890db59 <+885040>: jb 0x7f4ca89105dc ; <+895923> [inlined] llvm::SmallPtrSetImplBase::clear() at SmallPtrSet.h:103:32
0x7f4ca890db5f <+885046>: movq -0x1e0(%rbp), %rdi
0x7f4ca890db66 <+885053>: shlq $0x3, %rdx
0x7f4ca890db6a <+885057>: movl $0xffffffff, %esi ; imm = 0xFFFFFFFF
0x7f4ca890db6f <+885062>: callq 0x7f4ca8835390 ; ___lldb_unnamed_symbol5432 + 16
0x7f4ca890db74 <+885067>: movq $0x0, -0x1d4(%rbp)
0x7f4ca890db7f <+885078>: movq -0xae8(%rbp), %rbx
0x7f4ca890db86 <+885085>: movq (%rbx), %rax
-> 0x7f4ca890db89 <+885088>: movq 0x8(%rax), %rax
0x7f4ca890db8d <+885092>: movq %rax, (%rbx)
0x7f4ca890db90 <+885095>: movq -0x1f0(%rbp), %rbx
0x7f4ca890db97 <+885102>: leaq -0x9d0(%rbp), %rax
0x7f4ca890db9e <+885109>: movq $0x0, -0x9d8(%rbp)
0x7f4ca890dba9 <+885120>: movq %rax, -0xae0(%rbp)
0x7f4ca890dbb0 <+885127>: movq %rax, -0x9e0(%rbp)
0x7f4ca890dbb7 <+885134>: testq %rbx, %rbx
0x7f4ca890dbba <+885137>: je 0x7f4ca890ff42 ; <+894233> [inlined] void llvm::SmallVectorImpl<void*>::resizeImpl<false>(unsigned long) at SmallVector.h:620:5
0x7f4ca890dbc0 <+885143>: movq -0xac0(%rbp), %rdi
0x7f4ca890dbc7 <+885150>: movq %rbx, %rsi
0x7f4ca890dbca <+885153>: callq 0x7f4ca89243c0 ; llvm::SmallVectorTemplateBase<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, false>::grow at SmallVector.h:433:6
0x7f4ca890dbcf <+885158>: movl -0x9d8(%rbp), %eax
0x7f4ca890dbd5 <+885164>: movq %rbx, %rdx
0x7f4ca890dbd8 <+885167>: shlq $0x5, %rdx
0x7f4ca890dbdc <+885171>: shlq $0x5, %rax
Maybe this is the pgcstack being null, not sure if this is the correct way to inspect it?
(lldb) p *(jl_gcframe_t **)jl_get_pgcstack()
(jl_gcframe_t *) nullptr
EDIT: looks like rbx is the pgcstack. I guess |
Build succeeded on gcc 14 |
@fxcoudert I know you are more into gfortran, which is most likely not relevant here, but I'm pinging you on the off chance you have a clue about what change in gcc 15 could have caused something like #58466 (comment) (not too much information at the moment, though). |
I see this is specific to GCC 15, but is it also target-dependent? Linux-only, or aarch64-only, or aarch64-linux-only? That would go a long way toward identifying a possible cause. It sounds like it could be a miscompilation (or an optimization of invalid code), but as you said, there is little information for now. Maybe compare the ASM emitted for the problematic function, with GCC 14 vs GCC 15? |
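One way the suggested GCC 14 vs GCC 15 comparison could be set up is sketched below. It is only a sketch under stated assumptions: `normalize_asm` is a hypothetical helper, and the object file paths in the usage comments are placeholders. The helper strips the per-instruction address column from `objdump -d` output so that a plain `diff` is not dominated by address changes.

```shell
# Hypothetical helper: remove the leading "  <hex address>:" column from
# objdump -d output read on stdin, leaving mnemonics and operands untouched.
# Symbol header lines such as "0000000000401000 <main>:" are left as-is.
normalize_asm() {
    sed -E 's/^[[:space:]]*[0-9a-f]+:[[:space:]]*//'
}

# Example usage, with placeholder paths for the two builds of aotcompile.o:
#   objdump -d --no-show-raw-insn gcc14/aotcompile.o | normalize_asm > gcc14.s
#   objdump -d --no-show-raw-insn gcc15/aotcompile.o | normalize_asm > gcc15.s
#   diff -u gcc14.s gcc15.s | less
```

Absolute jump and call targets inside operands still differ between builds, so the resulting diff remains noisy; narrowing it to the problematic function first keeps it readable.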
Minimal fix to get Julia building on gcc 15 on top of dbc38d6 is to revert the change to
diff --git a/src/llvm-expand-atomic-modify.cpp b/src/llvm-expand-atomic-modify.cpp
index e4152bb45f..7b7b3c8761 100644
--- a/src/llvm-expand-atomic-modify.cpp
+++ b/src/llvm-expand-atomic-modify.cpp
@@ -17,7 +17,6 @@
#include <llvm/IR/InstIterator.h>
#include <llvm/IR/Instructions.h>
#include <llvm/IR/IntrinsicInst.h>
-#include "llvm/IR/MemoryModelRelaxationAnnotations.h"
#include <llvm/IR/Module.h>
#include <llvm/IR/Operator.h>
#include <llvm/IR/PassManager.h>
@@ -142,28 +141,12 @@ std::pair<Value *, Value *> insertRMWCmpXchgLoop(
}
// from AtomicExpandImpl
-// IRBuilder to be used for replacement atomic instructions.
-struct ReplacementIRBuilder
- : IRBuilder<InstSimplifyFolder, IRBuilderCallbackInserter> {
- MDNode *MMRAMD = nullptr;
-
+struct ReplacementIRBuilder : IRBuilder<InstSimplifyFolder> {
// Preserves the DebugLoc from I, and preserves still valid metadata.
- // Enable StrictFP builder mode when appropriate.
explicit ReplacementIRBuilder(Instruction *I, const DataLayout &DL)
- : IRBuilder(I->getContext(), InstSimplifyFolder(DL),
- IRBuilderCallbackInserter(
- [this](Instruction *I) { addMMRAMD(I); })) {
+ : IRBuilder(I->getContext(), DL) {
SetInsertPoint(I);
this->CollectMetadataToCopy(I, {LLVMContext::MD_pcsections});
- if (BB->getParent()->getAttributes().hasFnAttr(Attribute::StrictFP))
- this->setIsFPConstrained(true);
-
- MMRAMD = I->getMetadata(LLVMContext::MD_mmra);
- }
-
- void addMMRAMD(Instruction *I) {
- if (canInstructionHaveMMRAs(*I))
- I->setMetadata(LLVMContext::MD_mmra, MMRAMD);
}
};
@@ -338,6 +321,7 @@ void expandAtomicModifyToCmpXchg(CallInst &Modify,
Type *Ty = Modify.getFunctionType()->getReturnType()->getStructElementType(0);
ReplacementIRBuilder Builder(&Modify, Modify.getModule()->getDataLayout());
+ Builder.setIsFPConstrained(Modify.hasFnAttr(Attribute::StrictFP));
CallInst *ModifyOp;
{
@@ -382,7 +366,7 @@ void expandAtomicModifyToCmpXchg(CallInst &Modify,
ModifyOp = cast<CallInst>(ValOp->getUser());
LoadedOp = ValOp;
assert(LoadedOp->get() == RMW);
- RMW->moveBeforePreserving(ModifyOp->getIterator()); // NewValInst is a user of RMW, and RMW has no other dependants (per patternMatchAtomicRMWOp)
+ RMW->moveBefore(ModifyOp); // NewValInst is a user of RMW, and RMW has no other dependants (per patternMatchAtomicRMWOp)
BinOp = false;
if (++attempts > 3)
break;
@@ -399,7 +383,7 @@ void expandAtomicModifyToCmpXchg(CallInst &Modify,
assert(isa<UndefValue>(RMW->getOperand(1))); // RMW was previously being used as the placeholder for Val
Value *Val;
if (ValOp != nullptr) {
- RMW->moveBeforePreserving(cast<Instruction>(ValOp->getUser())->getIterator()); // ValOp is a user of RMW, and RMW has no other dependants (per patternMatchAtomicRMWOp)
+ RMW->moveBefore(cast<Instruction>(ValOp->getUser())); // ValOp is a user of RMW, and RMW has no other dependants (per patternMatchAtomicRMWOp)
Val = ValOp->get();
} else if (RMWOp == AtomicRMWInst::Xchg) {
Val = NewVal;
@@ -427,7 +411,7 @@ void expandAtomicModifyToCmpXchg(CallInst &Modify,
Builder, Ty, Ptr, *Alignment, Ordering, SSID, Modify,
[&](IRBuilderBase &Builder, Value *Loaded) JL_NOTSAFEPOINT {
LoadedOp->set(Loaded);
- ModifyOp->moveBeforePreserving(*Builder.GetInsertBlock(), Builder.GetInsertPoint());
+ ModifyOp->moveBefore(*Builder.GetInsertBlock(), Builder.GetInsertPoint());
return ModifyOp;
},
CreateWeakCmpXchg); |
The difference in jl_emit_native_impl seems to be solely due to the difference between the LLVM 19 and LLVM 20 headers when compiling aotcompile.o. Interestingly, with LLVM 19 %rsp is used extensively, but with LLVM 20 %rbp is used instead. |
Even more minimally compiling with |
I can reproduce the segfault when precompiling all the stdlibs on both x86_64-linux-gnu and aarch64-linux-gnu, so that doesn't seem to be target-dependent.
So that's the switch to C23 in GCC 15? Note that there's no |
Sure but specifying the cxx seems necessary |
So Dropping just |
Ok, that makes more sense, it's just an accident that |
I tried running creduce but gave up as it was taking way too long, the problem being that compiling aotcompile.cpp is just slow. EDIT: I properly recompiled with the nix gcc 15.1.0 and it's failing with the same issue now. |
Bisected to gcc-mirror/gcc@6ea25c0 which changes
For my future reference, bisected with commands
And the script is
#!/bin/bash
JULIA_ROOT=...
GCCSRC=...
GCCBUILDIR=...
GCCINSTALLDIR=...
rm -f $GCCINSTALLDIR/usr/local/bin/g++
mkdir -p $GCCBUILDIR
pushd $GCCBUILDIR
$GCCSRC/configure --disable-bootstrap --enable-checking=yes --disable-libsanitizer --disable-nls --disable-dependency-tracking --enable-languages=c,c++ --without-isl --disable-cet --disable-libstdcxx-pch --disable-static --disable-multilib CC="ccache gcc" CXX="ccache g++" > /dev/null
nice make -j12 CFLAGS="-O1 -g0" CXXFLAGS="-O1 -g0" > /dev/null
make install DESTDIR=$GCCINSTALLDIR > /dev/null
popd
if [ ! -f $GCCINSTALLDIR/usr/local/bin/g++ ]; then
echo "GCC installation failed, g++ not found in $GCCINSTALLDIR/usr/local/bin/"
exit 125;
fi
rm -f $JULIA_ROOT/src/aotcompile.o $JULIA_ROOT/usr/lib/libjulia-codegen.so.1.13.0
nice make VERBOSE=1 -j12 -C $JULIA_ROOT/src CXX="$GCCINSTALLDIR/usr/local/bin/g++" SHIPFLAGS_COMMON="-O1" libjulia-codegen-release
DEPOT=`mktemp -d`
no_avx_precompilation_output=$(JULIA_DEPOT_PATH=$DEPOT JULIA_LOAD_PATH=@stdlib:$JULIA_ROOT/stdlib JULIA_CPU_TARGET="native" $JULIA_ROOT/usr/bin/julia --startup-file=no -e 'Base.Precompilation.precompilepkgs(["LibUV_jll"])' 2>&1)
no_avx_exit_code=$?
rm -rf $DEPOT
if [[ $no_avx_exit_code -ne 0 ]] || [[ $no_avx_precompilation_output == *"error"* ]] || [[ $no_avx_precompilation_output == *"Error"* ]]; then
echo "Precompilation of LibUV_jll failed, exiting."
exit 125
fi
echo "Precompilation of LibUV_jll succeeded without AVX."
rm $JULIA_ROOT/src/aotcompile.o $JULIA_ROOT/usr/lib/libjulia-codegen.so.1.13.0
nice make -j12 -C $JULIA_ROOT/src CXX="$GCCINSTALLDIR/usr/local/bin/g++ -mavx" libjulia-codegen-release
DEPOT=`mktemp -d`
precompile_output=$(JULIA_DEPOT_PATH=$DEPOT JULIA_LOAD_PATH=@stdlib:$JULIA_ROOT/stdlib JULIA_CPU_TARGET="native" $JULIA_ROOT/usr/bin/julia --startup-file=no -e 'Base.Precompilation.precompilepkgs(["LibUV_jll"])' 2>&1)
rm -rf $DEPOT
echo "$precompile_output"
if [[ $precompile_output == *"Segmentation fault"* ]] && [[ $precompile_output == *"jl_emit_native_impl"* ]]; then
echo "Interesting"
exit 1
fi
exit 0 |
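The exit codes in the script above follow the `git bisect run` convention: 0 marks the commit as good, 125 tells git to skip a commit that cannot be tested, and any other code from 1 to 127 marks it as bad. The decision logic can be sketched in isolation as follows; `bisect_status` is a hypothetical helper that mirrors the checks in the script rather than a part of it.

```shell
# Hypothetical helper: print the exit status a `git bisect run` script should use.
# $1: "yes" if the compiler built and the non-AVX baseline precompiled, else "no"
# $2: captured output of the -mavx precompilation run
bisect_status() {
    if [ "$1" != "yes" ]; then
        echo 125   # untestable commit: ask git bisect to skip it
        return
    fi
    case "$2" in
        *"Segmentation fault"*jl_emit_native_impl*|*jl_emit_native_impl*"Segmentation fault"*)
            echo 1 ;;   # the crash of interest reproduced: bad commit
        *)
            echo 0 ;;   # no matching crash: good commit
    esac
}

# Example usage at the end of a bisect script:
#   exit "$(bisect_status yes "$precompile_output")"
```

Treating unrelated failures as 125 rather than 1 keeps the bisection from converging on a commit that merely broke the build.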
With |
This has occurred within the last few (around 4) days. Currently on commit 864aac0 (edit: has been bisected to dbc38d6).
Building Julia from scratch results in all the stdlibs erroring with an error message like below.
The issue persists after running make cleanall, and even after deleting and re-cloning the repo. This might be an issue with my computer, but I'm stumped as to what it could be.
Julia installed from juliaup runs fine, and returns this versioninfo
Error message: