Description
Currently, in lld/MachO, if multiple external weak symbols are made to point to the same address within a single object file using assembler directives (specifically .set and .weak_definition), they are not treated as independently overridable. If a strong definition is provided for any of these symbol names, all the original weak symbols end up resolving to that strong definition.
This behavior is inconsistent with the Apple linker. It also prevents a code-size optimization pattern in which multiple optional symbols alias a single default weak implementation, and only the symbols that other object files actually define are replaced by their strong definitions.
Reproducing this issue requires dropping to assembly language, as there does not appear to be a way to create symbol aliases using Clang. In particular, any attempt to use the alias attribute from C errors out with:
error: aliases are not supported on darwin
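For reference, the kind of C source that triggers this error looks roughly like the sketch below. It is an illustration rather than code from the repro: the `print_defaults` helper is a hypothetical name, and the declarations are expected to compile only for ELF targets (for example Linux), since Clang rejects them when targeting Darwin.

```c
#include <stdio.h>

/* Default storage shared by every optional symbol. */
const int placeholder_int = 0;

/* ELF-style weak aliases of the placeholder. Clang rejects these two
   declarations when targeting Darwin, which is why the repro below
   falls back to assembly. */
extern const int weak_a __attribute__((weak, alias("placeholder_int")));
extern const int weak_b __attribute__((weak, alias("placeholder_int")));

/* Hypothetical helper that mirrors the printf in main.c. */
void print_defaults(void) {
    printf("a=%d, b=%d\n", weak_a, weak_b);
}
```

On an ELF target, another object file could then supply a strong `weak_a` without affecting `weak_b`, which is the behavior this issue asks for on Mach-O.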
Repro
Create the following files in a directory, then run run.sh.
// main.c
#include <stdio.h>
extern const int weak_a;
extern const int weak_b;
int main() {
printf("a=%d, b=%d\n", weak_a, weak_b);
return 0;
}
# weak.s
.section __TEXT,__const
.globl _placeholder_int
.weak_definition _placeholder_int
.p2align 2, 0x0
_placeholder_int:
.long 0
.globl _weak_a
.set _weak_a, _placeholder_int
.weak_definition _weak_a
.globl _weak_b
.set _weak_b, _placeholder_int
.weak_definition _weak_b
.subsections_via_symbols
# strong_a.s
.section __TEXT,__const
.globl _strong_a
_strong_a:
.long 1
.globl _weak_a
.set _weak_a, _strong_a
# strong_b.s
.section __TEXT,__const
.globl _strong_b
_strong_b:
.long 2
.globl _weak_b
.set _weak_b, _strong_b
# run.sh
# To set which LD is being used.
# LDFLAGS="-fuse-ld=/Users/haberman/code/llvm-project/build/bin/ld64.lld"
# LDFLAGS="-Wl,-ld_classic"
rm -f *.o test_*
set -ex
clang -c -o main.o main.c
clang -c -o weak.o weak.s
clang -c -o strong_a.o strong_a.s
clang -c -o strong_b.o strong_b.s
# Case 1: weak only
clang $LDFLAGS -o test_weak_only main.o weak.o
./test_weak_only
# Case 2: strong a
clang $LDFLAGS -o test_strong_a main.o weak.o strong_a.o
./test_strong_a
# Case 3: strong b
clang $LDFLAGS -o test_strong_b main.o weak.o strong_b.o
./test_strong_b
# Case 4: strong a and b
clang $LDFLAGS -o test_strong_ab main.o weak.o strong_a.o strong_b.o
./test_strong_ab
Test Results
| Test Case | ld64 (-Wl,-ld_classic) | ld64 | ld64.lld (current) | ld64.lld (proposed) |
|---|---|---|---|---|
| Case 1: Weak Only | a=0, b=0 | a=0, b=0 | a=0, b=0 | a=0, b=0 |
| Case 2: Strong A | a=1, b=0 | a=1, b=0 | a=1, b=1 | a=1, b=0 |
| Case 3: Strong B | a=0, b=2 | a=2, b=2 * | a=2, b=2 | a=0, b=2 |
| Case 4: Strong A & B | a=1, b=2 | a=1, b=2 | a=2, b=2 | a=1, b=2 |
The proposed behavior for ld64.lld is the most consistent and useful, adhering to the principle that distinct external symbol names should be resolvable independently.
*: The new Apple Linker ("ld-prime") has almost the same behavior as the classic ld64, but inexplicably differs in case 3. This seems like a bug; I can't think of any principled reason for the diverging behavior.
Analysis and Proposed Fix
The technical cause appears to be in SymbolTable::addDefined calling transplantSymbolsAtOffset, which moves all symbols at the same offset, including other external symbols.
A simple fix would be to exclude external symbols, so that only local symbols are moved.
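To make the difference concrete, here is a toy model of the two policies. It is not lld's code: `Sym`, `override_weak`, `link_case`, and the `move_externals` flag are hypothetical names, and the model assumes that overriding a weak symbol rebinds the symbols sharing its offset and that rebound symbols stay overridable. Under those assumptions it reproduces the "current" and "proposed" columns of the table above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a symbol defined in weak.o. */
typedef struct {
    const char *name;
    int offset;    /* offset within the single modelled section */
    int external;  /* 1 = external (global), 0 = local */
    int value;     /* data the symbol currently resolves to */
} Sym;

/* Models a strong definition replacing the weak symbol `name`.
   move_externals = 1: every symbol at the same offset follows
                       (the behavior described for current lld).
   move_externals = 0: only the overridden symbol and local symbols
                       follow (the proposed behavior). */
static void override_weak(Sym *syms, int n, const char *name,
                          int strong_value, int move_externals) {
    int target = -1;
    for (int i = 0; i < n; i++)
        if (strcmp(syms[i].name, name) == 0)
            target = i;
    if (target < 0)
        return;
    for (int i = 0; i < n; i++) {
        if (syms[i].offset != syms[target].offset)
            continue;
        if (i == target || !syms[i].external || move_externals)
            syms[i].value = strong_value;
    }
}

/* Models one link of main.o and weak.o, optionally followed by
   strong_a.o and strong_b.o in that order. Writes the values that
   weak_a and weak_b end up resolving to. */
void link_case(int with_strong_a, int with_strong_b, int move_externals,
               int *a, int *b) {
    assert(a != NULL && b != NULL);
    Sym syms[] = {
        {"_placeholder_int", 0, 1, 0},
        {"_weak_a", 0, 1, 0},
        {"_weak_b", 0, 1, 0},
    };
    int n = (int)(sizeof syms / sizeof syms[0]);
    if (with_strong_a)
        override_weak(syms, n, "_weak_a", 1, move_externals);
    if (with_strong_b)
        override_weak(syms, n, "_weak_b", 2, move_externals);
    *a = syms[1].value;
    *b = syms[2].value;
}
```

For example, `link_case(1, 0, 1, &a, &b)` models Case 2 with all symbols moved and yields a=1, b=1, while `link_case(1, 0, 0, &a, &b)` leaves `_weak_b` at its default and yields a=1, b=0.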