
[LLD][COFF] Add basic ARM64X dynamic relocations support #118035

Merged
1 commit merged on Dec 5, 2024
2 changes: 2 additions & 0 deletions lld/COFF/COFFLinkerContext.h
@@ -88,6 +88,8 @@ class COFFLinkerContext : public CommonLinkerContext {
Timer diskCommitTimer;

Configuration config;

DynamicRelocsChunk *dynamicRelocs = nullptr;
};

} // namespace lld::coff
78 changes: 78 additions & 0 deletions lld/COFF/Chunks.cpp
@@ -25,6 +25,7 @@

using namespace llvm;
using namespace llvm::object;
using namespace llvm::support;
using namespace llvm::support::endian;
using namespace llvm::COFF;
using llvm::support::ulittle32_t;
@@ -1147,4 +1148,81 @@ uint32_t ImportThunkChunkARM64EC::extendRanges() {
return sizeof(arm64Thunk) - sizeof(uint32_t);
}

size_t Arm64XDynamicRelocEntry::getSize() const {
switch (type) {
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE:
return sizeof(uint16_t) + size; // A header and a payload.
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_DELTA:
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_ZEROFILL:
llvm_unreachable("unsupported type");
}
}
Member:
This causes warnings when building with GCC:

../../lld/COFF/Chunks.cpp: In member function ‘size_t lld::coff::Arm64XDynamicRelocEntry::getSize() const’:
../../lld/COFF/Chunks.cpp:1177:1: warning: control reaches end of non-void function [-Wreturn-type]
 1177 | }
      | ^

This case is mentioned in the LLVM coding style guide, see https://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations - the default action to silence this warning with GCC is to add an llvm_unreachable("") after the switch; see e.g. b21de9b. (In this case, it feels a bit repetitive when some cases already end with llvm_unreachable() too, but they're different cases.)

Contributor Author:
I created #122382 for a fix, sorry about that. The current llvm_unreachable() will be removed once ARM64X support is more complete. Those two reloc types are needed for the shared IAT.


void Arm64XDynamicRelocEntry::writeTo(uint8_t *buf) const {
auto out = reinterpret_cast<ulittle16_t *>(buf);
*out = (offset & 0xfff) | (type << 12);

switch (type) {
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE:
*out |= ((bit_width(size) - 1) << 14); // Encode the size.
switch (size) {
case 2:
out[1] = value;
break;
case 4:
*reinterpret_cast<ulittle32_t *>(out + 1) = value;
break;
case 8:
*reinterpret_cast<ulittle64_t *>(out + 1) = value;
break;
default:
llvm_unreachable("invalid size");
}
break;
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_DELTA:
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_ZEROFILL:
llvm_unreachable("unsupported type");
}
}

void DynamicRelocsChunk::finalize() {
llvm::stable_sort(arm64xRelocs, [=](const Arm64XDynamicRelocEntry &a,
const Arm64XDynamicRelocEntry &b) {
return a.offset < b.offset;
Member:

This finalize() method gets called multiple times, if we need to add range extension thunks - while I don't think anything at least at this point within the function really needs to be recalculated after the layout gets redone. But perhaps some future dynamic relocations may need to be re-sorted? (Technically, if we're inserting thunks and have relocations based on symbols, the relative ordering of relocs probably still is the same, but we may need to recalculate offsets. At that point, it's probably fine to keep this in the part of the code that gets rerun, for clarity.)

Contributor Author:

The sorting itself won’t change, but the size might once we allow offsets to chunks. Adding thunks could shift offsets between pages, which might add or remove coff_base_reloc_block_headers. While we could avoid re-sorting in such cases, keeping it together felt cleaner.

Member:

Ok, sounds reasonable - no need to move the sorting out in that case; keeping it together probably makes more sense than micro-optimizing such things at this stage.

});

size = sizeof(coff_dynamic_reloc_table) + sizeof(coff_dynamic_relocation64) +
sizeof(coff_base_reloc_block_header);

for (const Arm64XDynamicRelocEntry &entry : arm64xRelocs) {
assert(!(entry.offset & ~0xfff)); // Not yet supported.
Member:

So essentially, at the point of this patch, we can only do dynamic relocations within the first 4 KB of the linked image, i.e. in the basics of the image headers?

Contributor Author:

Yes, adding support for other offsets is straightforward, but it would remain dead code until we actually emit them. Since it’s only a few lines of code, I can add it if you think it’s better. In general, testing is tricky without separate namespace support, and namespace support depends on ARM64X relocs. So, I’m trying to find a way to keep things incremental.

Member:

Ok, good, then I'm understanding it correctly. No need to add support for other offsets yet if it's not easy to test.

size += entry.getSize();
}

size = alignTo(size, sizeof(uint32_t));
}

void DynamicRelocsChunk::writeTo(uint8_t *buf) const {
auto table = reinterpret_cast<coff_dynamic_reloc_table *>(buf);
table->Version = 1;
table->Size = sizeof(coff_dynamic_relocation64);
buf += sizeof(*table);

auto header = reinterpret_cast<coff_dynamic_relocation64 *>(buf);
header->Symbol = IMAGE_DYNAMIC_RELOCATION_ARM64X;
buf += sizeof(*header);

auto pageHeader = reinterpret_cast<coff_base_reloc_block_header *>(buf);
pageHeader->BlockSize = sizeof(*pageHeader);
for (const Arm64XDynamicRelocEntry &entry : arm64xRelocs) {
entry.writeTo(buf + pageHeader->BlockSize);
pageHeader->BlockSize += entry.getSize();
}
pageHeader->BlockSize = alignTo(pageHeader->BlockSize, sizeof(uint32_t));

header->BaseRelocSize = pageHeader->BlockSize;
table->Size += header->BaseRelocSize;
assert(size == sizeof(*table) + sizeof(*header) + header->BaseRelocSize);
}

} // namespace lld::coff
36 changes: 36 additions & 0 deletions lld/COFF/Chunks.h
@@ -835,6 +835,42 @@ class ECExportThunkChunk : public NonSectionCodeChunk {
Defined *target;
};

// ARM64X entry for dynamic relocations.
class Arm64XDynamicRelocEntry {
public:
Arm64XDynamicRelocEntry(llvm::COFF::Arm64XFixupType type, uint8_t size,
uint32_t offset, uint64_t value)
: offset(offset), value(value), type(type), size(size) {}

size_t getSize() const;
void writeTo(uint8_t *buf) const;

uint32_t offset;
uint64_t value;

private:
llvm::COFF::Arm64XFixupType type;
uint8_t size;
};

// Dynamic relocation chunk containing ARM64X relocations for the hybrid image.
class DynamicRelocsChunk : public NonSectionChunk {
public:
DynamicRelocsChunk() {}
size_t getSize() const override { return size; }
void writeTo(uint8_t *buf) const override;
void finalize();

void add(llvm::COFF::Arm64XFixupType type, uint8_t size, uint32_t offset,
uint64_t value) {
arm64xRelocs.emplace_back(type, size, offset, value);
}

private:
std::vector<Arm64XDynamicRelocEntry> arm64xRelocs;
size_t size;
};

// MinGW specific, for the "automatic import of variables from DLLs" feature.
// This provides the table of runtime pseudo relocations, for variable
// references that turned out to need to be imported from a DLL even though
64 changes: 60 additions & 4 deletions lld/COFF/Writer.cpp
@@ -79,6 +79,11 @@ static_assert(sizeof(dosProgram) % 8 == 0,

static const int dosStubSize = sizeof(dos_header) + sizeof(dosProgram);
static_assert(dosStubSize % 8 == 0, "DOSStub size must be multiple of 8");
static const uint32_t coffHeaderOffset = dosStubSize + sizeof(PEMagic);
static const uint32_t peHeaderOffset =
coffHeaderOffset + sizeof(coff_file_header);
static const uint32_t dataDirOffset64 =
peHeaderOffset + sizeof(pe32plus_header);

static const int numberOfDataDirectory = 16;

@@ -272,6 +277,7 @@ class Writer {
OutputSection *findSection(StringRef name);
void addBaserels();
void addBaserelBlocks(std::vector<Baserel> &v);
void createDynamicRelocs();

uint32_t getSizeOfInitializedData();

@@ -754,6 +760,8 @@ void Writer::run() {
llvm::TimeTraceScope timeScope("Write PE");
ScopedTimer t1(ctx.codeLayoutTimer);

if (ctx.config.machine == ARM64X)
ctx.dynamicRelocs = make<DynamicRelocsChunk>();
createImportTables();
createSections();
appendImportThunks();
@@ -764,6 +772,7 @@
mergeSections();
sortECChunks();
appendECImportTables();
createDynamicRelocs();
removeUnusedSections();
finalizeAddresses();
removeEmptySections();
@@ -1597,8 +1606,14 @@ void Writer::assignAddresses() {

for (OutputSection *sec : ctx.outputSections) {
llvm::TimeTraceScope timeScope("Section: ", sec->name);
if (sec == relocSec)
if (sec == relocSec) {
sec->chunks.clear();
addBaserels();
if (ctx.dynamicRelocs) {
ctx.dynamicRelocs->finalize();
relocSec->addChunk(ctx.dynamicRelocs);
Member:

The added code here, ->finalize() and ->addChunk, how does this behave if assignAddresses() gets called multiple times? If we need to add range extension thunks, we'll end up calling this method multiple times.

Ok, I see that addBaserels() starts out with relocSec->chunks.clear(); so I guess the ->addChunk() part is fine at least.

I'm pondering if there are other ways of making this clearer to the reader - either moving the newly added code here into addBaserels(), but that's probably not entirely right either, or moving the relocSec->chunks.clear() out here, to make it more visible that first we clear the section, then we add one kind of relocations, then we add another kind of relocations.

Contributor Author:

Agreed, that would be cleaner and more correct with respect to ctx.config.relocatable. I’ll move it.

}
}
uint64_t rawSize = 0, virtualSize = 0;
sec->header.VirtualAddress = rva;

@@ -1673,6 +1688,7 @@ template <typename PEHeaderTy> void Writer::writeHeader() {
buf += sizeof(PEMagic);

// Write COFF header
assert(coffHeaderOffset == buf - buffer->getBufferStart());
auto *coff = reinterpret_cast<coff_file_header *>(buf);
buf += sizeof(*coff);
switch (config->machine) {
@@ -1705,6 +1721,7 @@
sizeof(PEHeaderTy) + sizeof(data_directory) * numberOfDataDirectory;

// Write PE header
assert(peHeaderOffset == buf - buffer->getBufferStart());
auto *pe = reinterpret_cast<PEHeaderTy *>(buf);
buf += sizeof(*pe);
pe->Magic = config->is64() ? PE32Header::PE32_PLUS : PE32Header::PE32;
@@ -1770,6 +1787,8 @@ template <typename PEHeaderTy> void Writer::writeHeader() {
pe->SizeOfInitializedData = getSizeOfInitializedData();

// Write data directory
assert(!ctx.config.is64() ||
dataDirOffset64 == buf - buffer->getBufferStart());
auto *dir = reinterpret_cast<data_directory *>(buf);
buf += sizeof(*dir) * numberOfDataDirectory;
if (edataStart) {
@@ -1799,9 +1818,12 @@
exceptionTable.last->getSize() -
exceptionTable.first->getRVA();
}
if (relocSec->getVirtualSize()) {
size_t relocSize = relocSec->getVirtualSize();
if (ctx.dynamicRelocs)
relocSize -= ctx.dynamicRelocs->getSize();
if (relocSize) {
dir[BASE_RELOCATION_TABLE].RelativeVirtualAddress = relocSec->getRVA();
dir[BASE_RELOCATION_TABLE].Size = relocSec->getVirtualSize();
dir[BASE_RELOCATION_TABLE].Size = relocSize;
}
if (Symbol *sym = ctx.symtab.findUnderscore("_tls_used")) {
if (Defined *b = dyn_cast<Defined>(sym)) {
@@ -2523,7 +2545,6 @@ uint32_t Writer::getSizeOfInitializedData() {
void Writer::addBaserels() {
if (!ctx.config.relocatable)
return;
relocSec->chunks.clear();
std::vector<Baserel> v;
for (OutputSection *sec : ctx.outputSections) {
if (sec->header.Characteristics & IMAGE_SCN_MEM_DISCARDABLE)
@@ -2557,6 +2578,29 @@ void Writer::addBaserelBlocks(std::vector<Baserel> &v) {
relocSec->addChunk(make<BaserelChunk>(page, &v[i], &v[0] + j));
}

void Writer::createDynamicRelocs() {
if (!ctx.dynamicRelocs)
return;

// Adjust the Machine field in the COFF header to AMD64.
ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE, sizeof(uint16_t),
coffHeaderOffset + offsetof(coff_file_header, Machine),
AMD64);

// Clear the load config directory.
// FIXME: Use the hybrid load config value instead.
ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE, sizeof(uint32_t),
dataDirOffset64 +
LOAD_CONFIG_TABLE * sizeof(data_directory) +
offsetof(data_directory, RelativeVirtualAddress),
0);
ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE, sizeof(uint32_t),
dataDirOffset64 +
LOAD_CONFIG_TABLE * sizeof(data_directory) +
offsetof(data_directory, Size),
0);
}

PartialSection *Writer::createPartialSection(StringRef name,
uint32_t outChars) {
PartialSection *&pSec = partialSections[{name, outChars}];
@@ -2660,6 +2704,18 @@ template <typename T> void Writer::prepareLoadConfig(T *loadConfig) {
loadConfig->DependentLoadFlags = ctx.config.dependentLoadFlags;
}

if (ctx.dynamicRelocs) {
IF_CONTAINS(DynamicValueRelocTableSection) {
loadConfig->DynamicValueRelocTableSection = relocSec->sectionIndex;
Member:

I see that you're merging the checkLoadConfigGuardData function into the caller, since you're reusing this context here, to not only check member fields but also use the same IF_CONTAINS() to set fields - is that a correct observation?

Are there other fields that we currently set in load config, which could benefit from IF_CONTAINS() in the same way? Such a refactoring, either moving the IF_CONTAINS() macro elsewhere, or merging checkLoadConfigGuardData into the caller to ease reusing these, feels like it's somewhat orthogonal to this change. Or if it's not possible to neatly separate out that step, I guess that aspect at least should be mentioned and explained in the commit message.

Contributor Author:

Yes, that's a correct observation. Using the existing helpers felt appropriate here.

For ARM64X, we will also need to modify the CHPEMetadataPointer field (although that's earlier than dynamic relocs in the load config, so we may use the existing check and warning). Unlike the ARM64EC load config, the native ARM64 load config doesn’t reference it in the CRT. This allows the same load config to work for the native target. On ARM64X, CHPE metadata exists only in EC code, so it wouldn’t be possible to reference it from the native load config anyway. Instead, the linker reads it from the EC load config and sets it in the native variant.

We should technically have such a check for the existing dependentLoadFlags too; I'll split that part off separately.

loadConfig->DynamicValueRelocTableOffset =
ctx.dynamicRelocs->getRVA() - relocSec->getRVA();
} else {
warn("'_load_config_used' structure too small to include dynamic "
"relocations");
}
}

if (ctx.config.guardCF == GuardCFLevel::Off)
return;
RETURN_IF_NOT_CONTAINS(GuardFlags)
68 changes: 68 additions & 0 deletions lld/test/COFF/arm64x-loadconfig.s
@@ -0,0 +1,68 @@
// REQUIRES: aarch64
// RUN: split-file %s %t.dir && cd %t.dir

// RUN: llvm-mc -filetype=obj -triple=aarch64-windows test.s -o test.obj
// RUN: llvm-mc -filetype=obj -triple=aarch64-windows loadconfig.s -o loadconfig.obj
// RUN: llvm-mc -filetype=obj -triple=aarch64-windows loadconfig-short.s -o loadconfig-short.obj

// RUN: lld-link -machine:arm64x -out:out.dll -dll -noentry loadconfig.obj test.obj
Member:

So we're testing writing an arm64x image, but where we only have plain aarch64 chunks? And if changing to the arm64ec form at runtime, it changes the machine field and clears the load config, but nothing else?

Contributor Author:

Yes, I think that's the simplest possible ARM64X relocs test (and testing more complex cases will require namespaces).


// RUN: llvm-readobj --coff-load-config out.dll | FileCheck -check-prefix=DYNRELOCS %s
// DYNRELOCS: DynamicValueRelocTableOffset: 0xC
// DYNRELOCS-NEXT: DynamicValueRelocTableSection: 4
// DYNRELOCS: DynamicRelocations [
// DYNRELOCS-NEXT: Version: 0x1
// DYNRELOCS-NEXT: Arm64X [
// DYNRELOCS-NEXT: Entry [
// DYNRELOCS-NEXT: RVA: 0x7C
// DYNRELOCS-NEXT: Type: VALUE
// DYNRELOCS-NEXT: Size: 0x2
// DYNRELOCS-NEXT: Value: 0x8664
// DYNRELOCS-NEXT: ]
// DYNRELOCS-NEXT: Entry [
// DYNRELOCS-NEXT: RVA: 0x150
// DYNRELOCS-NEXT: Type: VALUE
// DYNRELOCS-NEXT: Size: 0x4
// DYNRELOCS-NEXT: Value: 0x0
// DYNRELOCS-NEXT: ]
// DYNRELOCS-NEXT: Entry [
// DYNRELOCS-NEXT: RVA: 0x154
// DYNRELOCS-NEXT: Type: VALUE
// DYNRELOCS-NEXT: Size: 0x4
// DYNRELOCS-NEXT: Value: 0x0
// DYNRELOCS-NEXT: ]
// DYNRELOCS-NEXT: ]
// DYNRELOCS-NEXT: ]

// RUN: llvm-readobj --headers out.dll | FileCheck -check-prefix=HEADERS %s
// HEADERS: BaseRelocationTableRVA: 0x4000
// HEADERS-NEXT: BaseRelocationTableSize: 0xC
// HEADERS: LoadConfigTableRVA: 0x1000
// HEADERS-NEXT: LoadConfigTableSize: 0x140
// HEADERS: Name: .reloc (2E 72 65 6C 6F 63 00 00)
// HEADERS-NEXT: VirtualSize: 0x38

// RUN: lld-link -machine:arm64x -out:out-short.dll -dll -noentry loadconfig-short.obj 2>&1 | FileCheck --check-prefix=WARN-RELOC-SIZE %s
// WARN-RELOC-SIZE: lld-link: warning: '_load_config_used' structure too small to include dynamic relocations

#--- test.s
.data
sym:
// Emit a basereloc to make the loadconfig test more meaningful.
.xword sym

#--- loadconfig.s
.section .rdata,"dr"
.globl _load_config_used
.p2align 3, 0
_load_config_used:
.word 0x140
.fill 0x13c,1,0

#--- loadconfig-short.s
.section .rdata,"dr"
.globl _load_config_used
.p2align 3, 0
_load_config_used:
.word 0xe4
.fill 0xe0,1,0