-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-128605: Add branch protections for x86_64 in asm_trampoline.S #128606
base: main
Are you sure you want to change the base?
Conversation
stratakis
commented
Jan 8, 2025
•
edited by bedevere-app
bot
Loading
edited by bedevere-app
bot
- Issue: asm_trampoline.S misses branch protection flags for x86_64 and aarch64 #128605
cc @fweimer-rh |
In the current implementation the cf-protection test for x86_64 passes but not the branch-protection test for aarch64. |
You should add PAC support at the same time, using |
d97e78d
to
6d86e0f
Compare
Rebased. The conditionals are a bit of a mess right now so I'll do some macros later on. However annobin still reports:
Btw the arm blog proposes a different styled note than what the annobin docs mention: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/p2-enabling-pac-and-bti-on-aarch64 Also regarding x86_64. Shouldn't the note be conditionalized in the case CET is active? |
6d86e0f
to
549b894
Compare
cc'ing also @pablogsal as the original author of the file. |
🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 549b894 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break the DWARF mode as the debug information we are adding matches exactly with the current assembly instructions and positions:
cpython/Python/perf_jit_trampoline.c
Lines 452 to 489 in d0ecbdd
/* Emit DWARF EH CIE. */ | |
DWRF_SECTION(CIE, DWRF_U32(0); /* Offset to CIE itself. */ | |
DWRF_U8(DWRF_CIE_VERSION); | |
DWRF_STR("zR"); /* Augmentation. */ | |
DWRF_UV(1); /* Code alignment factor. */ | |
DWRF_SV(-(int64_t)sizeof(uintptr_t)); /* Data alignment factor. */ | |
DWRF_U8(DWRF_REG_RA); /* Return address register. */ | |
DWRF_UV(1); | |
DWRF_U8(DWRF_EH_PE_pcrel | DWRF_EH_PE_sdata4); /* Augmentation data. */ | |
DWRF_U8(DWRF_CFA_def_cfa); DWRF_UV(DWRF_REG_SP); DWRF_UV(sizeof(uintptr_t)); | |
DWRF_U8(DWRF_CFA_offset|DWRF_REG_RA); DWRF_UV(1); | |
DWRF_ALIGNNOP(sizeof(uintptr_t)); | |
) | |
ctx->eh_frame_p = p; | |
/* Emit DWARF EH FDE. */ | |
DWRF_SECTION(FDE, DWRF_U32((uint32_t)(p - framep)); /* Offset to CIE. */ | |
DWRF_U32(-0x30); /* Machine code offset relative to .text. */ | |
DWRF_U32(ctx->code_size); /* Machine code length. */ | |
DWRF_U8(0); /* Augmentation data. */ | |
/* Registers saved in CFRAME. */ | |
#ifdef __x86_64__ | |
DWRF_U8(DWRF_CFA_advance_loc | 4); | |
DWRF_U8(DWRF_CFA_def_cfa_offset); DWRF_UV(16); | |
DWRF_U8(DWRF_CFA_advance_loc | 6); | |
DWRF_U8(DWRF_CFA_def_cfa_offset); DWRF_UV(8); | |
/* Extra registers saved for JIT-compiled code. */ | |
#elif defined(__aarch64__) && defined(__AARCH64EL__) && !defined(__ILP32__) | |
DWRF_U8(DWRF_CFA_advance_loc | 1); | |
DWRF_U8(DWRF_CFA_def_cfa_offset); DWRF_UV(16); | |
DWRF_U8(DWRF_CFA_offset | 29); DWRF_UV(2); | |
DWRF_U8(DWRF_CFA_offset | 30); DWRF_UV(1); | |
DWRF_U8(DWRF_CFA_advance_loc | 3); | |
DWRF_U8(DWRF_CFA_offset | -(64 - 29)); | |
DWRF_U8(DWRF_CFA_offset | -(64 - 30)); | |
DWRF_U8(DWRF_CFA_def_cfa_offset); | |
DWRF_UV(0); |
I am also not sure why this is needed other than make the annocheck
tool happy so I am not sure we want to add this extra complexity.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
The justification for this change is provided at the linked issue. |
I don't think that justification is enough. The only thing the issue mentioned is that we are not using a feature and that the In any case, we need to check if the DWARF needs to be updated to account for the new instructions and if GDB can still unwind across them. |
Of course. Both ubuntu and Fedora (and by extension RHEL) and I assume other distros, enable by default the Both are implemented to mitigate return-oriented programming (ROP) and Call or Jump Oriented Programming (COP/JOP) attacks. Minimum supported CPU models are Intel's Tiger lake, AMD's Zen 3, Arm 8.3 (PAC) and Arm 8.5 (BTI). With pure C code everything is fine by just using the compiler flag(s), however when an assembly file is added to the mix the instructions need to be added manually. This feature is an all-or-nothing for the hardware functionality to work. All object files are checked for the note and if it's missing on any one, the linker does not place the note to the final executable and the hardening feature is disabled in the hardware. That essentially means that for Python 3.12+ these flags have no effect anymore. |
a7408db
to
6d2d4f7
Compare
Ok makes sense. Thanks a lot for the extra context. We need to complement the DWARF information to ensure that the Perf JIT integration still works and we need to ensure gdb can still unwind through the trampolines when this information is active. |
Are JIT compilers expected to implement this as well? Or no, since they aren't statically present in the executable? |
Technically this code is also not present in the executable: we copy it from memory but the region itself where we copy it from is never executed. So if the answer is "no" then we shouldn't need this either |
Providing also a more practical reproducer. On Python 3.11 on x86_64:
Both of those flags indicate that CET protection is enabled. In Python 3.12+ they are not present. Of course when building with I'll familiarize myself with the debug information and try to figure it out. |
The answer here would be possibly yes, assembly instructions emitted by JITs should implement that, however I am not familiar myself with JITs in general or the way CPython implements it to provide a definite answer. As JIT is still being developed I'd treat this as an afterthought and revisit it in the future. |
@pablogsal As the DWARF needs to be updated anyway, it may make sense to add the missing frame pointer to the x86-64 assembler (if Python's policy is to have frame pointers). Or is the goal to elide the calling frame from profiling backtraces? If the latter, there should really be a comment to this effect. |
@brandtbucher For JIT compilers to keep working unchanged, they need to be marked as incompatible for now. On x86-64, SHSTK is generally automatically compatible with JIT compilation unless stack unwinding code is generated. However, x86-64 IBT requires marker instructions at the target of indirect branches. On AArch64, BTI is similar (marker instructions), but PAC is not process-switched on Linux: it's determined at boot whether it is turned on or not. The presence of the extra instructions merely ensures that the JIT code does not contain any easy-to-use JOP/ROP gadgets. Whether any of this matters is of course debatable, assuming that Python does not write-protect any of its bytecode in the process image. |
One of the problems is that if you add the frame pointer to the x86-64 assembler gdb cannot unwind through this when Python is compiled without frame pointers. The current version allows unwinding with and without frame pointers. |
I'm afraid this is a bit too jargon-heavy for me to follow without doing a bunch of self-guided research on the topic. What does it mean for the JIT code to be "marked incompatible"? I take it that if the process/kernel has these hardware features enabled, JIT code that doesn't use it will probably crash? What's the typical performance overhead of enabling this, and who exactly is using it (Ubuntu and Fedora were mentioned above... is it all users of these distros who would be unable to use our JIT)? |
6d2d4f7
to
0070169
Compare
Took a shot at figuring this out, at least for x86_64. This is what I followed:
Compiled with cpython/Python/perf_jit_trampoline.c Lines 452 to 489 in d0ecbdd
And the eh_frame:
And then recompiling with
Still need to verify that it actually works of course. |
And I believe it should make things easier if I split this PR for x86_64 and aarch64. |
0070169
to
dcc714a
Compare
dcc714a
to
4b78b22
Compare
The PR is ready. I've split it, so this is only for x86_64. Not entirely sure that a news entry is necessary here but I can add one if required. I've also verified that gdb and perf work the same as before with and without -fcf-protection and annobin correctly verified that CET protection is present. python3.12:
python3.13:
python3.14:
|
4b78b22
to
f2dcd8c
Compare
Required for mitigation against return-oriented programming (ROP) and Call or Jump Oriented Programming (COP/JOP) attacks Manual application is required for the assembly files See also: https://sourceware.org/annobin/annobin.html/Test-cf-protection.html
f2dcd8c
to
f6dcc96
Compare
@pablogsal: @stratakis updated the DWARF information and explained why this change is needed. Would you mind to review it again? |
🤖 New build scheduled with the buildbot fleet by @pablogsal for commit f6dcc96 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F128606%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
Will run the buildbots and I will do a pass and merge |
Btw this should be fine to backport to 3.13 and, without the perf_jit_trampoline.c changes, to 3.12. |
A single buildbot worker failed:
But it's an unrelated failure:
There is the same error in the main branch: https://buildbot.python.org/#/builders/484/builds/6705 |
It's a regression introduced by commit fd545d7: issue #127705. |