As part of the loading process, the kernel supplies an AT_PHDR value to userspace, which is meant to contain the virtual address of the main executable's program headers. The e_phoff field of the ELF header contains the file offset of the main executable's program headers, which the kernel needs to translate into a virtual address.
As a reminder, there are three relevant address spaces in ELF files:
1. File offsets (e.g. e_phoff in the ELF header, p_offset in program headers).
2. Ideal virtual addresses (e.g. e_entry in the ELF header, p_vaddr in program headers).
3. Actual virtual addresses.
The mapping between (1) and (2) is potentially non-linear; the PT_LOAD program headers can set up an arbitrarily complex mapping if so desired. In contrast, the mapping between (2) and (3) is very simple, and consists of a single offset value (randomly) chosen by the kernel at load-time.
To translate e_phoff from (1) to (2), the correct approach is to find the particular PT_LOAD header whose p_offset / p_filesz range covers e_phoff, and then compute e_phoff + p_vaddr - p_offset using that PT_LOAD header. Unfortunately, before kernel commit 0da1d50027 in March 2022 (released in 5.17.2, backported to 5.16.19 / 5.15.33 / 5.10.110), the kernel took the first PT_LOAD header and used that for doing e_phoff + p_vaddr - p_offset. This bug is benign if the first PT_LOAD header has a p_offset / p_filesz range which covers e_phoff, and this happens to be the case for most ELF files produced by most compilers. This bug is also benign if p_vaddr - p_offset as computed for the first PT_LOAD equals p_vaddr - p_offset as computed for the PT_LOAD whose file range covers e_phoff.
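The difference between the two translations can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the kernel's actual code: the Load tuple, both function names, and the example segment values are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Load(NamedTuple):
    """Hypothetical stand-in for the relevant fields of one PT_LOAD header."""
    p_offset: int
    p_vaddr: int
    p_filesz: int


def phdr_vaddr_correct(e_phoff: int, loads: Sequence[Load]) -> Optional[int]:
    """Translate e_phoff using the PT_LOAD whose file range covers it."""
    for seg in loads:
        if seg.p_offset <= e_phoff < seg.p_offset + seg.p_filesz:
            return e_phoff + seg.p_vaddr - seg.p_offset
    return None  # no PT_LOAD covers the program headers


def phdr_vaddr_first_load(e_phoff: int, loads: Sequence[Load]) -> int:
    """Translate e_phoff using the first PT_LOAD, as older kernels did."""
    first = loads[0]
    return e_phoff + first.p_vaddr - first.p_offset


# Made-up layout: the program headers live in the second segment.
loads = [Load(p_offset=0x0, p_vaddr=0x0, p_filesz=0x1000),
         Load(p_offset=0x1000, p_vaddr=0x200000, p_filesz=0x800)]

print(hex(phdr_vaddr_correct(0x1400, loads)))     # 0x200400
print(hex(phdr_vaddr_first_load(0x1400, loads)))  # 0x1400
```

With e_phoff inside the first segment (say 0x40), both functions return the same value, which is the common benign case.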
In cases where the bug isn't benign, its consequences are bad: an incorrect AT_PHDR value will cause the dynamic linker to either segfault or fail to properly perform dynamic linking of the target executable (which in turn will likely cause a segfault fairly quickly).
In cases where polyfill-glibc needs to change e_phoff, this bug presents a problem. It can't re-order PT_LOAD headers to put the PT_LOAD covering e_phoff at the start of the list, as PT_LOAD headers need to be in ascending p_vaddr order (per the ELF specification, "Loadable segment entries in the program header table appear in ascending order, sorted on the p_vaddr member", and most loaders rely on this). If the first PT_LOAD header specified a non-zero p_vaddr, then polyfill-glibc could carve out some new virtual address space before the first PT_LOAD header. Unfortunately, it is common for dynamic libraries and position-independent executables to have p_vaddr of their first PT_LOAD be equal to zero, which leaves no space before it. Instead, polyfill-glibc will insert a new PT_LOAD at the start of the list, with p_vaddr equal to that of whatever was previously first (to maintain the sorted order), p_filesz equal to zero (so that p_offset is not used for anything ‡), and p_offset set such that p_vaddr - p_offset of this new PT_LOAD equals p_vaddr - p_offset of the PT_LOAD covering the program headers.
(‡) Except that glibc's dynamic loader always passes p_offset of the first PT_LOAD as the offset value to mmap when establishing the base address of dynamic libraries and position-independent executables, even if p_filesz of that PT_LOAD is zero. Accordingly, polyfill-glibc needs to keep the p_offset value within the range of allowable values for an mmap offset.
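The construction of the extra PT_LOAD can be sketched as follows. This is a minimal illustration under stated assumptions rather than polyfill-glibc's implementation: the Load tuple, make_shim_load, and the example values are hypothetical, and the non-negativity check is only a simplified stand-in for the mmap offset constraint mentioned in the footnote.

```python
from typing import NamedTuple


class Load(NamedTuple):
    """Hypothetical stand-in for the relevant fields of one PT_LOAD header."""
    p_offset: int
    p_vaddr: int
    p_filesz: int


def make_shim_load(first: Load, covering: Load) -> Load:
    """Build a zero-sized PT_LOAD to place ahead of `first`.

    Its p_vaddr - p_offset matches that of `covering`, the PT_LOAD which
    covers the (moved) program headers.
    """
    delta = covering.p_vaddr - covering.p_offset
    p_vaddr = first.p_vaddr          # keeps the list sorted by p_vaddr
    p_offset = p_vaddr - delta       # so that p_vaddr - p_offset == delta
    if p_offset < 0:
        # Assumption: a negative value stands in for "not a usable mmap offset".
        raise ValueError("p_offset would not be a usable mmap offset")
    return Load(p_offset=p_offset, p_vaddr=p_vaddr, p_filesz=0)


# Made-up layout: program headers moved to file offset 0x5040.
first = Load(p_offset=0x0, p_vaddr=0x0, p_filesz=0x1000)
covering = Load(p_offset=0x5000, p_vaddr=0x3000, p_filesz=0x400)
shim = make_shim_load(first, covering)
e_phoff = 0x5040

# The first-PT_LOAD formula now agrees with the covering-PT_LOAD formula.
via_first = e_phoff + shim.p_vaddr - shim.p_offset
via_covering = e_phoff + covering.p_vaddr - covering.p_offset
print(hex(shim.p_offset), hex(via_first), hex(via_covering))
```

In this example the new header gets p_offset 0x2000, and both formulas yield 0x3040, so a kernel that only looks at the first PT_LOAD still computes the intended address.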
A similar bug exists in ldconfig in glibc versions prior to 2.31 (released February 2020). There, the translation is of d_un.d_val of DT_STRTAB from address space (2) to address space (1), rather than of e_phoff from address space (1) to address space (2), but the summary is the same: it uses p_vaddr - p_offset of the first PT_LOAD rather than using p_vaddr - p_offset of the PT_LOAD covering DT_STRTAB. To work around this, if polyfill-glibc needs to move either the program headers or the dynamic string table, then it'll move both, ensuring that the same new PT_LOAD header covers both, and then the workaround for the kernel also fixes things for ldconfig.
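The reverse translation can be sketched in the same style. This is an illustration of the arithmetic only, not ldconfig's code: the Load tuple, the function names, and the example layout (a zero-sized first PT_LOAD followed by the original first segment and a new segment holding both the program headers and the string table) are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Load(NamedTuple):
    """Hypothetical stand-in for the relevant fields of one PT_LOAD header."""
    p_offset: int
    p_vaddr: int
    p_filesz: int


def strtab_offset_correct(addr: int, loads: Sequence[Load]) -> Optional[int]:
    """Translate a virtual address to a file offset via the covering PT_LOAD."""
    for seg in loads:
        if seg.p_vaddr <= addr < seg.p_vaddr + seg.p_filesz:
            return addr - (seg.p_vaddr - seg.p_offset)
    return None  # address is not backed by file contents


def strtab_offset_first_load(addr: int, loads: Sequence[Load]) -> int:
    """Translate using the first PT_LOAD only, as older ldconfig did."""
    first = loads[0]
    return addr - (first.p_vaddr - first.p_offset)


# Made-up layout with the zero-sized PT_LOAD inserted at the front.
loads = [Load(p_offset=0x2000, p_vaddr=0x0, p_filesz=0x0),
         Load(p_offset=0x0, p_vaddr=0x0, p_filesz=0x1000),
         Load(p_offset=0x5000, p_vaddr=0x3000, p_filesz=0x400)]
dt_strtab = 0x3100  # assumed DT_STRTAB value inside the last segment

print(hex(strtab_offset_correct(dt_strtab, loads)))
print(hex(strtab_offset_first_load(dt_strtab, loads)))
```

Both calls give 0x5100 here. Dropping the zero-sized entry (loads[1:]) makes the first-PT_LOAD variant return 0x3100 instead, which is the failure the workaround avoids.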