Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llext: Support non-paired RISC-V PCREL Relocation #82799

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

WorldofJARcraft
Copy link
Contributor

Currently, RISC-V's architecture-specific relocations assume that all relocations of type R_RISCV_PCREL_LO12_I and _S are processed immediately after the R_RISCV_PCREL_HI20 relocation that they share a relocation target with. While this is the case most of the time, the RISC-V PSABI specification does not guarantee that. This commit corrects this by determining the R_RISCV_PCREL_HI20 relocation based on the symbol value of the R_RISCV_PCREL_LO12 relocation, as specified in the PSABI.

@WorldofJARcraft WorldofJARcraft force-pushed the riscv-relocations-fix-pcrel-lo12 branch from f18fd43 to 60c3614 Compare December 10, 2024 17:40
@WorldofJARcraft WorldofJARcraft marked this pull request as ready for review December 10, 2024 17:40
@zephyrbot zephyrbot added area: RISCV RISCV Architecture (32-bit & 64-bit) area: llext Linkable Loadable Extensions labels Dec 10, 2024
@WorldofJARcraft WorldofJARcraft force-pushed the riscv-relocations-fix-pcrel-lo12 branch from 60c3614 to d642674 Compare December 11, 2024 08:11
lyakh
lyakh previously requested changes Dec 11, 2024
Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some clean up around function return codes and arguments is needed, some further comments are optional

*link_addr = 0;

if (ELF_R_SYM(rel->r_info) == 0) {
/* no symbol ex: R_ARM_V4BX relocation, R_ARM_RELATIVE */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this is the original text just moved by you, but maybe we can make the guess that "ex" stands for "example" to help future readers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment

*/
static int llext_lookup_symbol(uintptr_t *link_addr, const elf_rela_t *rel, const elf_sym_t *sym,
struct llext *ext, struct llext_loader *ldr, const char *name,
const elf_shdr_t *shdr, int section_idx, int entry_idx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by convention LLEXT functions have ldr and ext as their first parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it accordingly in both new functions

elf_word rel_cnt, uintptr_t sect_base, int section_idx,
int entry_idx, struct llext *ext, const elf_sym_t *sym,
struct llext_loader *ldr, uintptr_t *link_addr_out,
int *last_rel_idx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same about argument order and removing section_idx and entry_idx. In general this looks like quite a large number of arguments, it often means, that some can be eliminated relatively easily

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the parameters that can be computed efficiently.

return -ENODATA;
}
ret = llext_lookup_symbol(&link_addr, &rel, &sym, ext, ldr, name, shdr, i,
j);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it fit within 100 characters if you join the two lines? You need to check the error code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the error check and reformatted the line

} else {
LOG_ERR("rela section %d, entry %d: cannot apply relocation: "
"target symbol has unexpected section index %d (0x%X)",
section_idx, entry_idx, sym->st_shndx, sym->st_shndx);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

section_idx and entry_idx are only used in this print. Move it out of the function and remove the function arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

struct llext *dep;

*link_addr = (uintptr_t)llext_find_extension_sym(name, &dep);
if (link_addr) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing dereference asterisk *

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added *

struct llext *ext, struct llext_loader *ldr, const char *name,
const elf_shdr_t *shdr, int section_idx, int entry_idx)
{
*link_addr = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't in the original code and is superfluous - as long as you add an error check in the caller

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Copy link
Collaborator

@pillo79 pillo79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with all comments from @lyakh.
Also, for easier review, can you split the refactor that creates llext_lookup_symbol in a separate commit? Thanks!

@WorldofJARcraft WorldofJARcraft force-pushed the riscv-relocations-fix-pcrel-lo12 branch from d642674 to 3647645 Compare December 11, 2024 10:26
@WorldofJARcraft
Copy link
Contributor Author

Agree with all comments from @lyakh. Also, for easier review, can you split the refactor that creates llext_lookup_symbol in a separate commit? Thanks!

I have split the refactor into a separate commit.

Copy link
Collaborator

@pillo79 pillo79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactor!
This PR adds a lot of RISCV-specific stuff to the common LLEXT code; I think it's a pity and I'm not yet convinced this is necessary.

If the llext_lookup_symbol() function was exported in include/llext/llext_internal.h, it should be possible to move the whole llext_riscv_find_sym_pcrel() function to the arch-specific arch/riscv/core/elf.c and only invoke it when those relocations are encountered.
But I am probably missing something big! 🙂

@WorldofJARcraft
Copy link
Contributor Author

WorldofJARcraft commented Dec 11, 2024

I can move the new llext_riscv_find_sym_pcrel function into the architecture-specific elf.c, that would not be a problem. You are right, the function is architecture-specific.
However, the function needs access to the struct llext_loader and the struct llext which are currently not provided via the arch_elf_relocate api, which is why it needs to be called from llext_link. I can add these additional parameters to arch_elf_relocate and remove the RISCV-specific code from llext_link.c if you prefer.

@WorldofJARcraft
Copy link
Contributor Author

WorldofJARcraft commented Dec 11, 2024

I believe the RISC-V PSABI breaks one of the fundamental assumptions that were made when originally designing the API: the PCREL_LO12 relocations cannot be relocated with just the information in the ELF relocation entry and symbol. The symbol value for these relocations is actually an instruction offset from the beginning of the current segment. The instruction at this offset is of type PCREL_HI20. The immediate for the PCREL_LO12 instruction is the lower 12 bits of the relocation target computed for the PCREL_HI20 relocation; together, these two types of relocation allow accesses within a 32-bit distance from the PCREL_HI20 relocation.
This causes the following problem: when a PCREL_LO12 relocation is encountered, I need to find the PCREL_HI20 relocation and compute the relocation target from here, using the lower 12 bits for the PCREL_LO12 relocation. To be able to do that, I need the additional parameters mentioned above that arch_elf_relocate does not currently provide.

@WorldofJARcraft
Copy link
Contributor Author

My original code used the assumption that the PCREL_LO12 is always processed immediately after the matching PCREL_HI20. This allowed me to store the jump target in a variable.
However, I realized this is not guaranteed by the specification. I also managed to break this with one of my llext extensions.

pillo79
pillo79 previously approved these changes Feb 18, 2025
Copy link
Collaborator

@pillo79 pillo79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of the new llext_text_start but otherwise LGTM, thanks for your efforts!

@WorldofJARcraft
Copy link
Contributor Author

I will try to build a regression test for this feature, similar to the one in #83198. I have not yet understood how to trigger the edge case from C reliably, so I will use another assembly listing for this.

Copy link
Collaborator

@teburd teburd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now have the symbol table accessible to us from the loader, perhaps giving an index into the symbol table is enough rather than the elf_sym_t, sym_name, and sym_base_addr?

Those things could be obtained via accessors/helper functions.

@WorldofJARcraft
Copy link
Contributor Author

WorldofJARcraft commented Feb 18, 2025

I will try to build a regression test for this feature, similar to the one in #83198. I have not yet understood how to trigger the edge case from C reliably, so I will use another assembly listing for this.

I have added a commit with this test. I have verified manually that it fails with the changes from this PR not applied and that that it passes when the changes are applied.
Currently, RISC-V tests do not run in the CI with twister (#85960), so you have to execute the test manually to check my work. Also, there appears to be an assertion failure in an unrelated test on RISC-V (test_inspect, probably due to linker flags).

@WorldofJARcraft WorldofJARcraft force-pushed the riscv-relocations-fix-pcrel-lo12 branch from 9e03dd1 to 3f6c93f Compare February 19, 2025 07:13
@WorldofJARcraft
Copy link
Contributor Author

WorldofJARcraft commented Feb 19, 2025

Since we now have the symbol table accessible to us from the loader, perhaps giving an index into the symbol table is enough rather than the elf_sym_t, sym_name, and sym_base_addr?

Those things could be obtained via accessors/helper functions.

I will eliminate the loc, sym, sym_base_addr, sym_name arguments and replace them with helper functions, as they can be computed from data that is already in memory.

This commit refactors logic in llext_link.c that resolves
a symbol to be imported by an llext into a utility function.
Thereby, the logic can be re-used in other functions if
needed, e.g., when resolving symbols for indirect relocations.

Signed-off-by: Eric Ackermann <[email protected]>
Tests an edge case in the RISC-V PSABI: In the medany and the medlow code
models, the compiler emits auipc/lui (U-type) and ld/sw (I-type/S-type)
instruction pairs for accessing a non-local symbol.
The U-type instruction sets the upper 20 bits, the I/S-type the lower 12.
The U-type and I-type/S-type instruction pairs are often adjacent in code.
This is also what the current llext architecture-specific relocations
expect.
However, this need not be the case - compilers can re-use the upper 20
bits set by the U-type instruction with multiple I/S-type instructions,
which is a useful optimization for multiple loads/stores of or within
the same symbol.
This commit adds a unit test for this behavior, which currently fails
for RISC-V.

Signed-off-by: Eric Ackermann <[email protected]>
@WorldofJARcraft WorldofJARcraft force-pushed the riscv-relocations-fix-pcrel-lo12 branch from 3f6c93f to 1d23f53 Compare February 19, 2025 09:57
@pillo79
Copy link
Collaborator

pillo79 commented Feb 19, 2025

[...] there appears to be an assertion failure in an unrelated test on RISC-V (test_inspect, probably due to linker flags).

Thanks for noticing, did not realize RISC-V was disabled in Twister! (why?)
Submitted a fix for this in #85989.

@WorldofJARcraft
Copy link
Contributor Author

RISC-V tests currently do not run because of the min_ram and min_flash attributes in testcase.yaml. The qemu "boards" for RISC-V do not specify RAM and flash, so twister deems them unsuitable.

Copy link
Collaborator

@pillo79 pillo79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments on the new functions in llext_internal.h should be addressed IMO.

Not blocking since the latest changes were requested by Tom, but I feel that this will bring LLEXT to a more fragmented situation and I would like to avoid that if possible.

Comment on lines -399 to -404
LOG_DBG("relocation %d:%d info 0x%zx (type %zd, sym %zd) offset %zd "
"sym_name %s sym_type %d sym_bind %d sym_ndx %d",
i, j, (size_t)rel.r_info, (size_t)ELF_R_TYPE(rel.r_info),
(size_t)ELF_R_SYM(rel.r_info), (size_t)rel.r_offset,
name, ELF_ST_TYPE(sym.st_info),
ELF_ST_BIND(sym.st_info), sym.st_shndx);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the following "writing relocation symbol" LOG_DBGs have provided very useful information when trying to debug linker issues, and are now lost to arch-specific code.

IMHO losing this common part will make LLEXT more fragmented and difficult to debug in the future.

Copy link
Collaborator

@teburd teburd Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can still be done but would require calling the accessors (cheap). The LOG_DBG and lookups can be wrapped in an ifdef CONFIG_LLEXT_LOG_LEVEL_DBG to avoid doing stuff we don't need to when its not debug logging.

I agree having these log messages can be invaluable in debugging relocation/linking issues. Please do try and maintain common log messages, if needed as noted here wrap a block in an ifdef.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have kept these log messages (at the expense of extra load calls when LOG_INF/LOG_DBG is enabled).

teburd
teburd previously approved these changes Feb 19, 2025
Copy link
Collaborator

@teburd teburd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patch set looks really nice now, thank you for the hard work!

teburd
teburd previously approved these changes Feb 19, 2025
pillo79
pillo79 previously approved these changes Feb 19, 2025
Copy link
Collaborator

@pillo79 pillo79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome. Thank you for your efforts, much appreciated! 🙇

The RISC-V port of llext requires additional parameters for
handling non-adjacent HI20/LO12 relocations in arch_elf_relocate():
the current extension (struct llext), the current extension loader
(struct llext_loader), the current section header (elf_shdr_t) and
the current symbol (elf_sym_t).
This changes the signature of arch_elf_relocate accordingly.

Signed-off-by: Eric Ackermann <[email protected]>
Currently, RISC-V's architecture-specific relocations assume that
all relocations of type R_RISCV_PCREL_LO12_I and _S are processed
immediately after the R_RISCV_PCREL_HI20 relocation that they
share a relocation target with. While this is the case most of
the time, the RISC-V PSABI specification does not guarantee that.
This commit corrects this by determining the R_RISCV_PCREL_HI20
relocation based on the symbol value of the R_RISCV_PCREL_LO12
relocation, as specified in the PSABI.

Signed-off-by: Eric Ackermann <[email protected]>
@WorldofJARcraft WorldofJARcraft dismissed stale reviews from pillo79 and teburd via 69baa40 February 19, 2025 17:21
@WorldofJARcraft WorldofJARcraft force-pushed the riscv-relocations-fix-pcrel-lo12 branch from 55b0f45 to 69baa40 Compare February 19, 2025 17:21
@WorldofJARcraft
Copy link
Contributor Author

I added an ifdef guard to cover cases where CONFIG_LOG is disabled - this should fix the checks.

@fabiobaltieri fabiobaltieri added this to the v4.2.0 milestone Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARC ARC Architecture area: ARM ARM (32-bit) Architecture area: ARM64 ARM (64-bit) Architecture area: llext Linkable Loadable Extensions area: RISCV RISCV Architecture (32-bit & 64-bit)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants