[Intel-SIG] 5.15-ClearWater add FRED feature support #85
Open
Kunkka1988
wants to merge
102
commits into
openvelinux:5.15-velinux
from
Kunkka1988:5.15-velinux-fred-backport-20251106
commit 5fc77b9 upstream. Create and use EX_TYPE_ZERO_REG to clear the register and retry the segment load on exception. Intel-SIG: commit 5fc77b9 x86/segment: Remove .fixup usage. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 6605694 upstream. Add the CPU feature bit for LKGS (Load "Kernel" GS). LKGS instruction is introduced with Intel FRED (flexible return and event delivery) specification. Search for the latest FRED spec in most search engines with this search pattern: site:intel.com FRED (flexible return and event delivery) specification LKGS behaves like the MOV to GS instruction except that it loads the base address into the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor cache, which is exactly what Linux kernel does to load a user level GS base. Thus, with LKGS, there is no need to SWAPGS away from the kernel GS base. [ mingo: Minor tweaks to the description. ] Intel-SIG: commit 6605694 x86/cpufeature: Add the CPU feature bit for LKGS. New Intel x86 cpu feature bit for LKGS definition. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/include/asm/cpufeatures.h [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 5a91f12 upstream. Add the instruction opcode used by LKGS to x86-opcode-map. Opcode number is per public FRED draft spec v3.0. Intel-SIG: commit 5a91f12 Add the instruction opcode used by LKGS to x86-opcode-map. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit df729fb upstream. Let GCC know that only the low 16 bits of load_gs_index() argument actually matter. It might allow it to create slightly better code. However, do not propagate this into the prototypes of functions that end up being paravirtualized, to avoid unnecessary changes. Intel-SIG: commit df729fb x86/gsseg: Make asm_load_gs_index() take an u16. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit ae53fa1 upstream. GS is a special segment on x86_64, move load_gs_index() to its own new header file to simplify header inclusion. No change in functionality. Intel-SIG: commit ae53fa1 x86/gsseg: Move load_gs_index() to its own new header file. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/include/asm/gsseg.h arch/x86/include/asm/special_insns.h [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 92cbbad upstream. The LKGS instruction atomically loads a segment descriptor into the %gs descriptor registers, *except* that %gs.base is unchanged, and the base is instead loaded into MSR_IA32_KERNEL_GS_BASE, which is exactly what we want this function to do. Intel-SIG: commit 92cbbad x86/gsseg: Use the LKGS instruction if available for load_gs_index(). Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit e12ad46 upstream. Module build needs to be able to pick up the C prototype: WARNING: modpost: EXPORT symbol "asm_load_gs_index" [vmlinux] version generation failed, symbol will not be versioned. Is "asm_load_gs_index" prototyped in <asm/asm-prototypes.h>? Intel-SIG: commit e12ad46 x86/gsseg: Add the new <asm/gsseg.h> header to <asm/asm-prototypes.h>. Fixes: ae53fa1 ("x86/gsseg: Move load_gs_index() to its own new header file") Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: [email protected] Conflicts: arch/x86/include/asm/asm-prototypes.h [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit a4cb5ec upstream. WRMSRNS is an instruction that behaves exactly like WRMSR, with the only difference being that it is not a serializing instruction by default. Under certain conditions, WRMSRNS may replace WRMSR to improve performance. Add its CPU feature bit, opcode to the x86 opcode map, and an always inline API __wrmsrns() to embed WRMSRNS into the code. Intel-SIG: commit a4cb5ec x86/cpufeatures,opcode,msr: Add the WRMSRNS instruction support. Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Shan Kang <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/include/asm/cpufeatures.h tools/arch/x86/include/asm/cpufeatures.h [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 3167b37 upstream. idtentry_sysvec is really just DECLARE_IDTENTRY defined in <asm/idtentry.h>, no need to define it separately. Intel-SIG: commit 3167b37 x86/entry: Remove idtentry_sysvec from entry_{32,64}.S. Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 8df7193 upstream. Intel VT-x classifies events into eight different types, which is inherited by FRED for event identification. As such, event types become a common x86 concept, and should be defined in a common x86 header. Add event type macros to <asm/trapnr.h>, and use them in <asm/vmx.h>. Intel-SIG: commit 8df7193 x86/trapnr: Add event type macros to <asm/trapnr.h>. Suggested-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/include/asm/vmx.h [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 51383e7 upstream. Briefly introduce FRED, and its advantages compared to IDT. Intel-SIG: commit 51383e7 Documentation/x86/64: Add documentation for FRED. Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Bagas Sanjaya <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: Documentation/arch/ [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 2cce959 upstream. Add the configuration option CONFIG_X86_FRED to enable FRED. Intel-SIG: commit 2cce959 x86/fred: Add Kconfig option for FRED (CONFIG_X86_FRED). Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 51c158f upstream. Any FRED enabled CPU will always have the following features as its baseline: 1) LKGS, load attributes of the GS segment but the base address into the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor cache. 2) WRMSRNS, non-serializing WRMSR for faster MSR writes. Intel-SIG: commit 51c158f x86/cpufeatures: Add the CPU feature bit for FRED. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/include/asm/cpufeatures.h arch/x86/kernel/cpu/cpuid-deps.c tools/arch/x86/include/asm/cpufeatures.h [ Xingjiang Lu: resolved context conflict ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit e554a8c upstream. Add CONFIG_X86_FRED to <asm/disabled-features.h> to make cpu_feature_enabled() work correctly with FRED. Intel-SIG: commit e554a8c x86/fred: Disable FRED support if CONFIG_X86_FRED is disabled. Originally-by: Megha Dey <[email protected]> Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/include/asm/disabled-features.h tools/arch/x86/include/asm/disabled-features.h [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 3810da1 upstream. Let command line option "fred" accept multiple options to make it easier to tweak its behavior. Currently, two options 'on' and 'off' are allowed, and the default behavior is to disable FRED. To enable FRED, append "fred=on" to the kernel command line. [ bp: Use cpu_feature_enabled(), touch ups. ] Intel-SIG: commit 3810da1 x86/fred: Add a fred= cmdline param. Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
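The option handling described in this commit log can be sketched in plain user-space C. This is a minimal illustration, not the kernel's parser: `enum fred_mode`, `parse_fred_option()` and `fred_mode_from_cmdline()` are hypothetical names, and the sketch assumes a space-separated command line in which a later `fred=` overrides an earlier one.

```c
#include <string.h>

/* Hypothetical stand-in for the feature state; not a kernel type. */
enum fred_mode { FRED_OFF = 0, FRED_ON = 1 };

/*
 * Parse the value part of "fred=<value>". Returns 0 and stores the mode
 * on success, -1 for a missing or unrecognized value (mode is untouched).
 */
static int parse_fred_option(const char *arg, enum fred_mode *mode)
{
	if (!arg)
		return -1;
	if (strcmp(arg, "on") == 0) {
		*mode = FRED_ON;
		return 0;
	}
	if (strcmp(arg, "off") == 0) {
		*mode = FRED_OFF;
		return 0;
	}
	return -1;
}

/* Scan a space-separated command line; the default is "off". */
static enum fred_mode fred_mode_from_cmdline(const char *cmdline)
{
	enum fred_mode mode = FRED_OFF;	/* default stated in the commit log */
	const char *p = cmdline;

	while (p && *p) {
		size_t len = strcspn(p, " ");	/* length of this token */

		if (len > 5 && strncmp(p, "fred=", 5) == 0) {
			char value[8] = {0};
			size_t vlen = len - 5;

			/* Overlong values cannot be "on" or "off"; ignore them. */
			if (vlen < sizeof(value)) {
				memcpy(value, p + 5, vlen);
				parse_fred_option(value, &mode);
			}
		}
		p += len;
		p += strspn(p, " ");	/* skip separators */
	}
	return mode;
}
```

With this model, a command line without `fred=` or with an unknown value such as `fred=maybe` leaves FRED off, and only an explicit `fred=on` enables it.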
commit 0115f8b upstream. ERETU returns from an event handler while making a transition to ring 3, and ERETS returns from an event handler while staying in ring 0. Add instruction opcodes used by ERET[US] to the x86 opcode map; opcode numbers are per FRED spec v5.0. Intel-SIG: commit 0115f8b x86/opcode: Add ERET[US] instructions to the x86 opcode map. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit cd19bab upstream. Update the objtool decoder to know about the ERET[US] instructions (type INSN_CONTEXT_SWITCH). Intel-SIG: commit cd19bab x86/objtool: Teach objtool about ERET[US]. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: tools/objtool/arch/x86/decode.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit ff45746 upstream. Add X86_CR4_FRED macro for the FRED bit in %cr4. This bit must not be changed after initialization, so add it to the pinned CR4 bits. Intel-SIG: commit ff45746 x86/cpu: Add X86_CR4_FRED macro. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/include/uapi/asm/processor-flags.h arch/x86/kernel/cpu/common.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit cd6df3f upstream. Add MSR numbers for the FRED configuration registers per FRED spec 5.0. Intel-SIG: commit cd6df3f x86/cpu: Add MSR numbers for FRED configuration. Originally-by: Megha Dey <[email protected]> Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h [Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit ee63291 upstream. struct pt_regs is hard to read because the member or section related comments are not aligned with the members. The 'cs' and 'ss' members of pt_regs are of type 'unsigned long' while in reality they are only 16-bit wide. This works so far as the remaining space is unused, but FRED will use the remaining bits for other purposes. To prepare for FRED: - Cleanup the formatting - Convert 'cs' and 'ss' to u16 and embed them into a union with a u64 - Fixup the related printk() format strings Intel-SIG: commit ee63291 x86/ptrace: Cleanup the definition of the pt_regs structure. Suggested-by: Thomas Gleixner <[email protected]> Originally-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
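The "u16 embedded into a union with a u64" layout can be illustrated with a small stand-alone sketch. It only mirrors the idea: `struct regs_tail` and the member names `csx`/`ssx` are illustrative assumptions here, the sketch uses C11 anonymous unions, and the overlay behaves as shown only on a little-endian machine such as x86.

```c
#include <stdint.h>

/*
 * Tail of a register frame: each selector is 16 bits wide but still
 * occupies a full 64-bit slot, so the slot size and offsets are unchanged.
 */
struct regs_tail {
	uint64_t ip;
	union {
		uint16_t cs;	/* the selector proper */
		uint64_t csx;	/* the whole slot, upper 48 bits included */
	};
	uint64_t flags;
	uint64_t sp;
	union {
		uint16_t ss;
		uint64_t ssx;
	};
};

/* The frame is still five 64-bit slots. */
_Static_assert(sizeof(struct regs_tail) == 5 * sizeof(uint64_t),
	       "selector unions must not change the frame size");

/* Return only the selector, ignoring whatever sits in the upper bits. */
static uint16_t code_selector(const struct regs_tail *regs)
{
	return regs->cs;
}
```

Because `cs` and `ss` become 16-bit members, a format string that previously printed them as `unsigned long` has to change as well, which is the printk() fixup the commit log mentions.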
commit 3c77bf0 upstream. FRED defines additional information in the upper 48 bits of cs/ss fields. Therefore add the information definitions into the pt_regs structure. Specifically introduce a new structure fred_ss to denote the FRED flags above SS selector, which avoids FRED_SSX_ macros and makes the code simpler and easier to read. Intel-SIG: commit 3c77bf0 x86/ptrace: Add FRED additional information to the pt_regs structure. Suggested-by: Thomas Gleixner <[email protected]> Originally-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 32b09c2 upstream. Add a header file for FRED prototypes and definitions. Intel-SIG: commit 32b09c2 x86/fred: Add a new header file for FRED definitions. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 65c9cc9 upstream. When using FRED, reserve space at the top of the stack frame, just like i386 does. Intel-SIG: commit 65c9cc9 x86/fred: Reserve space for the FRED stack frame. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 9356c4b upstream. MSR_IA32_FRED_RSP0 is used during ring 3 event delivery, and needs to be updated to point to the top of next task stack during task switch. Intel-SIG: commit 9356c4b x86/fred: Update MSR_IA32_FRED_RSP0 during task switch. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/include/asm/switch_to.h [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 09794f6 upstream. SWAPGS is no longer needed thus NOT allowed with FRED because FRED transitions ensure that an operating system can _always_ operate with its own GS base address: - For events that occur in ring 3, FRED event delivery swaps the GS base address with the IA32_KERNEL_GS_BASE MSR. - ERETU (the FRED transition that returns to ring 3) also swaps the GS base address with the IA32_KERNEL_GS_BASE MSR. And the operating system can still setup the GS segment for a user thread without the need of loading a user thread GS with: - Using LKGS, available with FRED, to modify other attributes of the GS segment without compromising its ability always to operate with its own GS base address. - Accessing the GS segment base address for a user thread as before using RDMSR or WRMSR on the IA32_KERNEL_GS_BASE MSR. Note, LKGS loads the GS base address into the IA32_KERNEL_GS_BASE MSR instead of the GS segment's descriptor cache. As such, the operating system never changes its runtime GS base address. Intel-SIG: commit 09794f6 x86/fred: Disallow the swapgs instruction when FRED is enabled. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/kernel/process_64.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
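The GS base bookkeeping described above can be followed with a toy model. This is a sketch of the stated semantics only, in ordinary C with no privileged instructions: `struct gs_state`, `fred_swap_gs()` and `lkgs_model()` are hypothetical names, and the model ignores every GS attribute other than the base.

```c
#include <stdint.h>

/* Toy model of the two GS-related values; not real CPU state. */
struct gs_state {
	uint64_t gs_base;		/* base currently used for %gs accesses */
	uint64_t kernel_gs_base;	/* IA32_KERNEL_GS_BASE MSR */
};

/*
 * Models what the commit log says both ring 3 event delivery and ERETU
 * do: swap the active GS base with IA32_KERNEL_GS_BASE.
 */
static void fred_swap_gs(struct gs_state *s)
{
	uint64_t tmp = s->gs_base;

	s->gs_base = s->kernel_gs_base;
	s->kernel_gs_base = tmp;
}

/*
 * Models LKGS as described: the new base lands in the MSR, and the
 * base the kernel is currently running with is left alone.
 */
static void lkgs_model(struct gs_state *s, uint64_t new_user_base)
{
	s->kernel_gs_base = new_user_base;
}
```

Walking through one round trip: after the entry swap the kernel runs on its own base, `lkgs_model()` changes only the value parked in the MSR, and the return swap makes that value the user's active base. At no point inside the kernel does the active base change, which is why SWAPGS has no remaining role.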
commit df88387 upstream. Because FRED always restores the full value of %rsp, ESPFIX is no longer needed when it's enabled. Intel-SIG: commit df88387 x86/fred: No ESPFIX needed when FRED is enabled. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit ad41a14 upstream. Entering a new task is logically speaking a return from a system call (exec, fork, clone, etc.). As such, if ptrace enables single stepping a single step exception should be allowed to trigger immediately upon entering user space. This is not optional. NMI should *never* be disabled in user space. As such, this is an optional, opportunistic way to catch errors. Allow single-step trap and NMI when starting a new task, thus once the new task enters user space, single-step trap and NMI are both enabled immediately. Intel-SIG: commit ad41a14 x86/fred: Allow single-step trap and NMI when starting a new task. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 58c80cc upstream. On a FRED system, the faulting address (CR2) is passed on the stack, to avoid the problem of transient state. Thus the page fault address is read from the FRED stack frame instead of CR2 when FRED is enabled. Intel-SIG: commit 58c80cc x86/fred: Make exc_page_fault() work for FRED. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 90f3572 upstream. FRED and IDT can share most of the definitions and declarations so that in the majority of cases the actual handler implementation is the same. The differences are the exceptions where FRED stores exception related information on the stack and the sysvec implementations as FRED can handle irqentry/exit() in the dispatcher instead of having it in each handler. Also add stub defines for vectors which are not used due to Kconfig decisions to spare the ifdeffery in the actual FRED dispatch code. Intel-SIG: commit 90f3572 x86/idtentry: Incorporate definitions/ declarations of the FRED entries. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 99fcc96 upstream. When a debug exception occurs at a different ring level, i.e., from user or kernel context, #DB needs to be handled on a different stack: user #DB on the current task stack, while kernel #DB on a dedicated stack. This is exactly how FRED event delivery invokes an exception handler: ring 3 event on level 0 stack, i.e., current task stack; ring 0 event on the #DB dedicated stack specified in the IA32_FRED_STKLVLS MSR. So unlike IDT, the FRED debug exception entry stub doesn't do stack switch. On a FRED system, the debug trap status information (DR6) is passed on the stack, to avoid the problem of transient state. Furthermore, FRED transitions avoid a lot of ugly corner cases the handling of which can, and should be, skipped. The FRED debug trap status information saved on the stack differs from DR6 in both stickiness and polarity; it is exactly in the format which debug_read_clear_dr6() returns for the IDT entry points. Intel-SIG: commit 99fcc96 x86/fred: Add a debug fault entry stub for FRED. Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 4f76e86 upstream. Split the asm subroutines for handling NMIs versus IRQs that occur in the guest so that the NMI handler can be called from a noinstr section. As a bonus, the NMI path doesn't need an indirect branch. Intel-SIG: commit 4f76e86 KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ handlers. Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Conflicts: arch/x86/kvm/vmx/vmenter.S arch/x86/kvm/vmx/vmx.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 70d0fe5 upstream. When FRED is enabled, call fred_entry_from_kvm() to handle IRQ/NMI in IRQ/NMI induced VM exits. Intel-SIG: commit 70d0fe5 KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling. Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/kvm/vmx/vmx.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 530dce2 upstream. Because FRED uses the ring 3 FRED entrypoint for SYSCALL and SYSENTER and ERETU is the only legit instruction to return to ring 3, there is NO need to setup SYSCALL and SYSENTER MSRs for FRED, except the IA32_STAR MSR. Split IDT syscall setup code into idt_syscall_init() to make it easy to skip syscall setup code when FRED is enabled. Intel-SIG: commit 530dce2 x86/syscall: Split IDT syscall setup code into idt_syscall_init(). Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/kernel/cpu/common.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit cdd99dd upstream. Add cpu_init_fred_exceptions() to: - Set FRED entrypoints for events happening in ring 0 and 3. - Specify the stack level for IRQs that occur in ring 0. - Specify dedicated event stacks for #DB/NMI/#MCE/#DF. - Enable FRED and invalidate the IDT. - Force 32-bit system calls to use "int $0x80" only. Add fred_complete_exception_setup() to: - Initialize system_vectors as done for IDT systems. - Set unused sysvec_table entries to fred_handle_spurious_interrupt(). Intel-SIG: commit cdd99dd x86/fred: Add FRED initialization functions. Co-developed-by: Xin Li <[email protected]> Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/kernel/Makefile [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
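The last step of that list, pointing every unused table entry at a spurious-interrupt handler, follows a common dispatch-table pattern. The sketch below shows the pattern in isolation and is not the kernel code: the table size, the handler signature, and the names `complete_setup()`, `dispatch()`, `handle_spurious()` and `handle_timer()` are all assumptions made for the example.

```c
#include <stddef.h>

#define NR_SYSVECS_MODEL 8	/* hypothetical table size for the sketch */

typedef int (*sysvec_handler_t)(unsigned int vector);

/* Fallback for vectors nobody registered: report and ignore. */
static int handle_spurious(unsigned int vector)
{
	(void)vector;
	return -1;
}

/* Example of a real handler; returns the vector it served. */
static int handle_timer(unsigned int vector)
{
	return (int)vector;
}

static sysvec_handler_t sysvec_table_model[NR_SYSVECS_MODEL];

/* Fill every slot that is still empty, so dispatch never sees NULL. */
static void complete_setup(void)
{
	for (size_t i = 0; i < NR_SYSVECS_MODEL; i++)
		if (!sysvec_table_model[i])
			sysvec_table_model[i] = handle_spurious;
}

/* Dispatch without a NULL check; out-of-range vectors are spurious too. */
static int dispatch(unsigned int vector)
{
	if (vector >= NR_SYSVECS_MODEL)
		return handle_spurious(vector);
	return sysvec_table_model[vector](vector);
}
```

The design point is that the hot dispatch path needs no per-call "is a handler installed?" test once setup has completed.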
commit 208d8c7 upstream. Let cpu_init_exception_handling() call cpu_init_fred_exceptions() to initialize FRED. However if FRED is unavailable or disabled, it falls back to set up TSS IST and initialize IDT. Intel-SIG: commit 208d8c7 x86/fred: Invoke FRED initialization code to enable FRED. Co-developed-by: Xin Li <[email protected]> Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/kernel/cpu/common.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit cba9ff3 upstream. Change array_index_mask_nospec() to __always_inline because "inline" is broken, as explained at https://www.kernel.org/doc/local/inline.html. Intel-SIG: commit cba9ff3 x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly. Fixes: 6786137bf8fd ("x86/fred: FRED entry/exit and dispatch code") Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
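For readers unfamiliar with the helper, the sketch below shows a portable branch-free index mask together with a forced-inline attribute. It is not the x86 implementation touched by this commit (that one uses inline assembly); it follows the generic bit-twiddling form instead, under the hypothetical name `index_mask_nospec()`. It assumes GCC or Clang (for the attribute and for arithmetic right shift of negative values) and that both arguments are at most `LONG_MAX`.

```c
#include <limits.h>

#define BITS_PER_LONG_MODEL ((int)(sizeof(long) * CHAR_BIT))

/*
 * Plain "inline" is only a hint; the always_inline attribute tells
 * GCC/Clang to inline the function regardless of their heuristics.
 *
 * Returns ~0UL when index < size and 0 otherwise, without a branch:
 * if index >= size, (size - 1 - index) wraps and sets the top bit,
 * so the negated value shifts down to all zeroes.
 */
static inline __attribute__((always_inline))
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG_MODEL - 1);
}
```

A caller would AND the index with the returned mask before using it, so an out-of-bounds index collapses to 0 even under speculation.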
commit e138419 upstream. Add H. Peter Anvin and myself as FRED maintainers. Intel-SIG: commit e138419 MAINTAINERS: Add a maintainer entry for FRED. Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: H. Peter Anvin (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] [Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit e57ef2e upstream. The layout of per-cpu variables is at the mercy of the compiler. This can lead to random performance fluctuations from build to build. Create a structure to hold some of the hottest per-cpu variables, starting with current_task. Intel-SIG: commit e57ef2e x86: Put hot per CPU variables into a struct. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/kernel/process_32.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 96e8fc5 upstream. Instead of clearing the bss area in assembly code, use the clear_bss() function. This requires to pass the start_info address as parameter to xen_start_kernel() in order to avoid the xen_start_info being zeroed again. Intel-SIG: commit 96e8fc5 x86/xen: Use clear_bss() for Xen PV guests. Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/xen/enlighten_pv.c arch/x86/xen/xen-head.S [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit e94cd15 upstream. The synchronization of the AP with the control CPU is a SMP boot problem and has nothing to do with cpu_init(). Open code cpu_init_secondary() in start_secondary() and move wait_for_master_cpu() into the SMP boot code. No functional change. Intel-SIG: commit e94cd15 x86/smpboot: Get rid of cpu_init_secondary(). Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Oleksandr Natalenko <[email protected]> Tested-by: Helge Deller <[email protected]> # parisc Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/include/asm/processor.h arch/x86/kernel/cpu/common.c arch/x86/kernel/smpboot.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 3adee77 upstream. In order to facilitate parallel startup, start to eliminate some of the global variables passing information to CPUs in the startup path. However, start by introducing one more: smpboot_control. For now this merely holds the CPU# of the CPU which is coming up. Each CPU can then find its own per-cpu data, and everything else it needs can be found from there, allowing the other global variables to be removed. First to be removed is initial_stack. Each CPU can load %rsp from its current_task->thread.sp instead. That is already set up with the correct idle thread for APs. Set up the .sp field in INIT_THREAD on x86 so that the BSP also finds a suitable stack pointer in the static per-cpu data when coming up on first boot. On resume from S3, the CPU needs a temporary stack because its idle task is already active. Instead of setting initial_stack, the sleep code can simply set its own current->thread.sp to point to the temporary stack. Nobody else cares about ->thread.sp for a thread which is currently on a CPU, because the true value is actually in the %rsp register. Which is restored with the rest of the CPU context in do_suspend_lowlevel(). Intel-SIG: commit 3adee77 x86/smpboot: Remove initial_stack on 64-bit. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Usama Arif <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Usama Arif <[email protected]> Tested-by: Guilherme G. Piccoli <[email protected]> Reviewed-by: David Woodhouse <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/kernel/asm-offsets.c arch/x86/kernel/head_64.S arch/x86/xen/xen-head.S [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit c416b5b upstream. As TOP_OF_KERNEL_STACK_PADDING was defined as 0 on x86_64, it went unnoticed that the initialization of the .sp field in INIT_THREAD and some calculations in the low level startup code do not take the padding into account. FRED enabled kernels require a 16 byte padding, which means that the init task initialization and the low level startup code use the wrong stack offset. Subtract TOP_OF_KERNEL_STACK_PADDING in all affected places to adjust for this. Intel-SIG: commit c416b5b x86/fred: Fix init_task thread stack pointer initialization. Fixes: 65c9cc9 ("x86/fred: Reserve space for the FRED stack frame") Fixes: 3adee77 ("x86/smpboot: Remove initial_stack on 64-bit") Reported-by: kernel test robot <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/kernel/head_64.S arch/x86/xen/xen-head.S [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
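The offset arithmetic behind this fix can be sketched in user-space C. The constants and the helper name `initial_sp()` are illustrative assumptions, not the kernel's definitions; the sketch only shows that the padding has to be subtracted before the register frame is carved out of the top of the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only (assumptions, not taken from the patch). */
#define SKETCH_THREAD_SIZE   16384u  /* size of one kernel stack */
#define SKETCH_PT_REGS_SIZE  168u    /* size of the saved register frame */

/*
 * Hypothetical helper: compute the initial stack pointer for a task whose
 * stack starts at stack_base, leaving `padding` bytes unused at the very top.
 */
static uintptr_t initial_sp(uintptr_t stack_base, unsigned int padding)
{
    /* The stack must be large enough to hold the padding plus the frame. */
    assert(padding + SKETCH_PT_REGS_SIZE < SKETCH_THREAD_SIZE);

    uintptr_t top = stack_base + SKETCH_THREAD_SIZE;
    return top - padding - SKETCH_PT_REGS_SIZE;
}
```

With a padding of 0 the omission is invisible, which matches the commit's remark that the problem went unnoticed; with a 16-byte padding every caller that skips the subtraction is off by 16 bytes.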
commit 989b5cf upstream. Depending on whether FRED is enabled, sysvec_install() installs a system interrupt handler either into FRED's system vector dispatch table or into the IDT. However, FRED can be disabled later in trap_init(), after sysvec_install() has already been invoked; e.g., the HYPERVISOR_CALLBACK_VECTOR handler is registered with sysvec_install() in kvm_guest_init(), which is called in setup_arch() but way before trap_init(). IOW, there is a window between the point where FRED is seen as available and the point where it is marked as available but disabled. As a result, when FRED is available but disabled, early sysvec_install() invocations fail to install the IDT handler, resulting in spurious interrupts. Fix it by parsing cmdline param "fred=" in cpu_parse_early_param() to ensure that FRED is disabled before the first sysvec_install() invocation. Intel-SIG: commit 989b5cf x86/fred: Parse cmdline param "fred=" in cpu_parse_early_param(). Fixes: 3810da1 ("x86/fred: Add a fred= cmdline param") Reported-by: Hou Wenlong <[email protected]> Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] Conflicts: arch/x86/kernel/cpu/common.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 73270c1 upstream. To enable FRED earlier, move the RSP initialization out of cpu_init_fred_exceptions() into cpu_init_fred_rsps(). This is required as the FRED RSP initialization depends on the availability of the CPU entry areas which are set up late in trap_init(). No functional change intended. Marked with Fixes as it's a dependency for the real fix. Intel-SIG: commit 73270c1 x86/fred: Move FRED RSP initialization into a separate function. Fixes: 14619d9 ("x86/fred: FRED entry/exit and dispatch code") Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] Conflicts: arch/x86/kernel/cpu/common.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit a97756c upstream. On 64-bit init_mem_mapping() relies on the minimal page fault handler provided by the early IDT mechanism. The real page fault handler is installed right afterwards into the IDT. This is problematic on CPUs which have X86_FEATURE_FRED set because the real page fault handler retrieves the faulting address from the FRED exception stack frame and not from CR2, but that does obviously not work when FRED is not yet enabled in the CPU. To prevent this enable FRED right after init_mem_mapping() without interrupt stacks. Those are enabled later in trap_init() after the CPU entry area is set up. [ tglx: Encapsulate the FRED details ] Intel-SIG: commit a97756c x86/fred: Enable FRED right after init_mem_mapping(). Fixes: 14619d9 ("x86/fred: FRED entry/exit and dispatch code") Reported-by: Hou Wenlong <[email protected]> Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] Conflicts: arch/x86/include/asm/processor.h arch/x86/kernel/cpu/common.c arch/x86/kernel/smpboot.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 723edbd upstream. SS is initialized to NULL during boot time and not explicitly set to __KERNEL_DS. With FRED enabled, if a kernel event is delivered before a CPU goes to user level for the first time, its SS is NULL thus NULL is pushed into the SS field of the FRED stack frame. But before ERETS is executed, the CPU may context switch to another task and go to user level. Then when the CPU comes back to kernel mode, SS is changed to __KERNEL_DS. Later when ERETS is executed to return from the kernel event handler, a #GP fault is generated because SS doesn't match the SS saved in the FRED stack frame. Initialize SS to __KERNEL_DS when enabling FRED to prevent that. Note, IRET doesn't check if SS matches the SS saved in its stack frame, thus IDT doesn't have this problem. For IDT it doesn't matter whether SS is set to __KERNEL_DS or not, because it's set to NULL upon interrupt or exception delivery and __KERNEL_DS upon SYSCALL. Thus it's pointless to initialize SS for IDT. Intel-SIG: commit 723edbd x86/fred: Set SS to __KERNEL_DS when enabling FRED. Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 0dfac6f upstream. In most cases, ti_work values passed to arch_exit_to_user_mode_prepare() are zeros, e.g., 99% in kernel build tests. So an obvious optimization is to test ti_work for zero before processing individual bits in it. Omit the optimization when FPU debugging is enabled, otherwise the FPU consistency check is never executed. Intel 0day tests did not find a performance regression with this change. Intel-SIG: commit 0dfac6f x86/entry: Test ti_work for zero before processing individual bits. Suggested-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
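The shape of this optimization can be sketched in plain C. The work bits, the `fpu_debug` parameter and the function name are hypothetical stand-ins chosen for illustration; the real kernel code uses its own flags and a compile-time debug option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical work bits for illustration; not the kernel's real flag values. */
#define WORK_NOTIFY   (1u << 0)
#define WORK_IO_BMAP  (1u << 1)
#define WORK_FPU_LOAD (1u << 2)
#define WORK_ALL      (WORK_NOTIFY | WORK_IO_BMAP | WORK_FPU_LOAD)

static unsigned int fpu_checks_run;  /* how often the debug check executed */

/* Returns the number of work items handled on the way out to user mode. */
static unsigned int exit_to_user_prepare(uint32_t ti_work, bool fpu_debug)
{
    unsigned int handled = 0;

    /* This sketch only knows the three bits defined above. */
    assert((ti_work & ~WORK_ALL) == 0);

    /* Fast path: nothing pending and no debug check required. */
    if (!fpu_debug && ti_work == 0)
        return 0;

    /* With debugging on, the consistency check must run on every exit. */
    if (fpu_debug)
        fpu_checks_run++;

    if (ti_work & WORK_NOTIFY)
        handled++;
    if (ti_work & WORK_IO_BMAP)
        handled++;
    if (ti_work & WORK_FPU_LOAD)
        handled++;

    return handled;
}
```

The early return is skipped when the debug flag is set, which mirrors the commit's note that the optimization is omitted so the FPU consistency check still executes.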
…nism commit efe5088 upstream. Per the discussion about FRED MSR writes with WRMSRNS instruction [1], use the alternatives mechanism to choose WRMSRNS when it's available, otherwise fallback to WRMSR. Remove the dependency on X86_FEATURE_WRMSRNS as WRMSRNS is no longer dependent on FRED. [1] https://lore.kernel.org/lkml/[email protected]/ Use DS prefix to pad WRMSR instead of a NOP. The prefix is ignored. At least that's the current information from the hardware folks. Intel-SIG: commit efe5088 x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanism. Signed-off-by: Andrew Cooper <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] Conflicts: arch/x86/include/asm/switch_to.h arch/x86/kernel/cpu/cpuid-deps.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
…itch commit fe85ee3 upstream. The FRED RSP0 MSR points to the top of the kernel stack for user level event delivery. As this is the task stack it needs to be updated when a task is scheduled in. The update is done at context switch. That means it's also done when switching to kernel threads, which is pointless as those never go out to user space. For KVM threads this means there are two writes to FRED_RSP0 as KVM has to switch to the guest value before VMENTER. Defer the update to the exit to user space path and cache the per CPU FRED_RSP0 value, so redundant writes can be avoided. Provide fred_sync_rsp0() for KVM to keep the cache in sync with the actual MSR value after returning from guest to host mode. [ tglx: Massage change log ] Intel-SIG: commit fe85ee3 x86/entry: Set FRED RSP0 on return to userspace instead of context switch. Suggested-by: Sean Christopherson <[email protected]> Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] Conflicts: arch/x86/include/asm/switch_to.h [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
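The deferred, cached update described above can be modelled in user-space C. Everything here is a simulation under stated assumptions: `sim_msr_rsp0` stands in for the hardware MSR, and the function names are hypothetical rather than the kernel's.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t sim_msr_rsp0;     /* stand-in for the hardware MSR */
static uint64_t cached_rsp0;      /* stand-in for the per-CPU cached value */
static unsigned int msr_writes;   /* counts simulated MSR writes */

static void sim_wrmsr_rsp0(uint64_t value)
{
    sim_msr_rsp0 = value;
    msr_writes++;
}

/* Called on the exit-to-user path instead of at context switch. */
static void rsp0_on_return_to_user(uint64_t task_stack_top)
{
    /* A task stack top is never zero in this sketch. */
    assert(task_stack_top != 0);

    /* Write the MSR only when the value actually changed. */
    if (cached_rsp0 != task_stack_top) {
        sim_wrmsr_rsp0(task_stack_top);
        cached_rsp0 = task_stack_top;
    }
}

/* After a guest ran with its own value, record what the MSR now holds. */
static void rsp0_sync_cache(void)
{
    cached_rsp0 = sim_msr_rsp0;
}
```

Kernel threads never reach the exit-to-user path, so in this model they cause no writes at all, and a task that returns to user space repeatedly on the same CPU writes the MSR once.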
commit 991625f upstream. The bits required to make the hardware go.. Of note is that, provided the syscall entry points are covered with ENDBR, #CP doesn't need to be an IST because we'll never hit the syscall gap. Intel-SIG: commit 991625f x86/ibt: Add IBT feature, MSR and #CP handling. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/include/asm/cpufeatures.h arch/x86/include/uapi/asm/processor-flags.h arch/x86/kernel/cpu/common.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 18e66b6 upstream. Shadow stack provides protection for applications against function return address corruption. It is active when the processor supports it, the kernel has CONFIG_X86_SHADOW_STACK enabled, and the application is built for the feature. This is only implemented for the 64-bit kernel. When it is enabled, legacy non-shadow stack applications continue to work, but without protection. Since there is another feature that utilizes CET (Kernel IBT) that will share implementation with shadow stacks, create CONFIG_CET to signify that at least one CET feature is configured. Intel-SIG: commit 18e66b6 x86/shstk: Add Kconfig option for shadow stack. Co-developed-by: Yu-cheng Yu <[email protected]> Signed-off-by: Yu-cheng Yu <[email protected]> Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Tested-by: Pengfei Xu <[email protected]> Tested-by: John Allen <[email protected]> Tested-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/all/20230613001108.3040476-7-rick.p.edgecombe%40intel.com Conflicts: arch/x86/Kconfig.assembler arch/x86/Kconfig [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 2da5b91 upstream. Today the control protection handler is defined in traps.c and used only for the kernel IBT feature. To reduce ifdeffery, move it to its own file. In future patches, functionality will be added to make this handler also handle user shadow stack faults. So name the file cet.c. No functional change. Intel-SIG: commit 2da5b91 x86/traps: Move control protection handler to separate file. Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Tested-by: Pengfei Xu <[email protected]> Tested-by: John Allen <[email protected]> Tested-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/all/20230613001108.3040476-8-rick.p.edgecombe%40intel.com Conflicts: arch/x86/kernel/Makefile arch/x86/kernel/cet.c arch/x86/kernel/traps.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 5b2fc51 upstream. Even though Xen currently doesn't advertise IBT, prepare for when it will eventually do so and sprinkle the ENDBR dust accordingly. Even though most of the entry points are IRET like, the CPL0 Hypervisor can set WAIT-FOR-ENDBR and demand ENDBR at these sites. Intel-SIG: commit 5b2fc51 x86/ibt,xen: Sprinkle the ENDBR. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: arch/x86/entry/entry_64.S arch/x86/kernel/head_64.S arch/x86/xen/xen-asm.S arch/x86/xen/xen-head.S [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit 701fb66 upstream. The Control-Flow Enforcement Technology contains two related features, one of which is Shadow Stacks. Future patches will utilize this feature for shadow stack support in KVM, so add a CPU feature flag for Shadow Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]). To protect shadow stack state from malicious modification, the registers are only accessible in supervisor mode. This implementation context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend on XSAVES. The shadow stack feature, enumerated by the CPUID bit described above, encompasses both supervisor and userspace support for shadow stack. In near future patches, only userspace shadow stack will be enabled. In expectation of future supervisor shadow stack support, create a software CPU capability to enumerate kernel utilization of userspace shadow stack support. This user shadow stack bit should depend on the HW "shstk" capability and that logic will be implemented in future patches. Intel-SIG: commit 701fb66 x86/cpufeatures: Add CPU feature flags for shadow stacks. Co-developed-by: Yu-cheng Yu <[email protected]> Signed-off-by: Yu-cheng Yu <[email protected]> Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Tested-by: Pengfei Xu <[email protected]> Tested-by: John Allen <[email protected]> Tested-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/all/20230613001108.3040476-9-rick.p.edgecombe%40intel.com Conflicts: arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/disabled-features.h arch/x86/kernel/cpu/cpuid-deps.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit a5f6c2a upstream. A control-protection fault is triggered when a control-flow transfer attempt violates Shadow Stack or Indirect Branch Tracking constraints. For example, the return address for a RET instruction differs from the copy on the shadow stack. There already exists a control-protection fault handler for handling kernel IBT faults. Refactor this fault handler into separate user and kernel handlers, like the page fault handler. Add a control-protection handler for usermode. To avoid ifdeffery, put them both in a new file cet.c, which is compiled in the case of either of the two CET features supported in the kernel: kernel IBT or user mode shadow stack. Move some static inline functions from traps.c into a header so they can be used in cet.c. Opportunistically fix a comment in the kernel IBT part of the fault handler that is on the end of the line instead of preceding it. Keep the same behavior for the kernel side of the fault handler, except for converting a BUG to a WARN in the case of a #CP happening when the feature is missing. This unifies the behavior with the new shadow stack code, and also prevents the kernel from crashing under this situation which is potentially recoverable. The control-protection fault handler works in a similar way as the general protection fault handler. It provides the si_code SEGV_CPERR to the signal handler. Intel-SIG: commit a5f6c2a x86/shstk: Add user control-protection fault handler. 
Co-developed-by: Yu-cheng Yu <[email protected]> Signed-off-by: Yu-cheng Yu <[email protected]> Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Tested-by: Pengfei Xu <[email protected]> Tested-by: John Allen <[email protected]> Tested-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/all/20230613001108.3040476-28-rick.p.edgecombe%40intel.com Conflicts: arch/x86/include/asm/disabled-features.h arch/x86/kernel/signal_compat.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit c6cfcbd upstream. The following warning is reported when frame pointers and kernel IBT are enabled: vmlinux.o: warning: objtool: ibt_selftest+0x11: sibling call from callable instruction with modified stack frame The problem is that objtool interprets the indirect branch in ibt_selftest() as a sibling call, and GCC inserts a (partial) frame pointer prologue before it: 0000 000000000003f550 <ibt_selftest>: 0000 3f550: f3 0f 1e fa endbr64 0004 3f554: e8 00 00 00 00 call 3f559 <ibt_selftest+0x9> 3f555: R_X86_64_PLT32 __fentry__-0x4 0009 3f559: 55 push %rbp 000a 3f55a: 48 8d 05 02 00 00 00 lea 0x2(%rip),%rax # 3f563 <ibt_selftest_ip> 0011 3f561: ff e0 jmp *%rax Note the inline asm is missing ASM_CALL_CONSTRAINT, so the 'push %rbp' happens before the indirect branch and the 'mov %rsp, %rbp' happens afterwards. Simplify the generated code and make it easier to understand for both tools and humans by moving the selftest to proper asm. Intel-SIG: commit c6cfcbd x86/ibt: Convert IBT selftest to asm. Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/99a7e16b97bda97bf0a04aa141d6241cd8a839a2.1680912949.git.jpoimboe@kernel.org Conflicts: arch/x86/kernel/Makefile arch/x86/kernel/cpu/common.c arch/x86/kernel/ibt_selftest.S [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit dc81e556f2a017d681251ace21bf06c126d5a192 upstream. An indirect branch instruction sets the CPU indirect branch tracker (IBT) into WAIT_FOR_ENDBRANCH (WFE) state and WFE stays asserted across the instruction boundary. When the decoder finds an instruction other than ENDBR while WFE is set, the CPU raises a #CP fault. For the "kernel IBT no ENDBR" selftest where #CPs are deliberately triggered, the WFE state of the interrupted context needs to be cleared to let execution continue. Otherwise when the CPU resumes from the instruction that just caused the previous #CP, another missing-ENDBRANCH #CP is raised and the CPU enters a dead loop. This is not a problem with IDT because it doesn't preserve WFE and IRET doesn't set WFE. But FRED provides space on the entry stack (in an expanded CS area) to save and restore the WFE state, thus the WFE state is no longer clobbered, so software must clear it. Clear WFE to avoid dead looping in ibt_clear_fred_wfe() and the !ibt_fatal code path when execution is allowed to continue. Clobbering WFE in any other circumstance is a security-relevant bug. [ dhansen: changelog rewording ] Intel-SIG: commit dc81e556f2a0 x86/fred: Clear WFE in missing-ENDBRANCH Fixes: a5f6c2a ("x86/shstk: Add user control-protection fault handler") Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Dave Hansen <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/20241113175934.3897541-1-xin%40zytor.com [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
commit de31b3cd706347044e1a57d68c3a683d58e8cca4 upstream. The FRED RSP0 MSR is only used for delivering events when running userspace. Linux leverages this property to reduce expensive MSR writes and optimize context switches. The kernel only writes the MSR when about to run userspace *and* when the MSR has actually changed since the last time userspace ran. This optimization is implemented by maintaining a per-CPU cache of FRED RSP0 and then checking that against the value for the top of current task stack before running userspace. However cpu_init_fred_exceptions() writes the MSR without updating the per-CPU cache. This means that the kernel might return to userspace with MSR_IA32_FRED_RSP0==0 when it needed to point to the top of current task stack. This would induce a double fault (#DF), which is bad. A context switch after cpu_init_fred_exceptions() can paper over the issue since it updates the cached value. That evidently happens most of the time explaining how this bug got through. Fix the bug through resynchronizing the FRED RSP0 MSR with its per-CPU cache in cpu_init_fred_exceptions(). Intel-SIG: commit de31b3cd7063 x86/fred: Fix the FRED RSP0 MSR out of sync with its per-CPU cache. Fixes: fe85ee3 ("x86/entry: Set FRED RSP0 on return to userspace instead of context switch") Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Dave Hansen <[email protected]> Cc:[email protected] Link: https://lore.kernel.org/all/20250110174639.1250829-1-xin%40zytor.com [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
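The failure mode can be reproduced in a small user-space model. This is a sketch under assumptions: the names are hypothetical, the MSR is a plain variable, and the model assumes for illustration that the cache already holds the stack top when the init routine runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t sim_msr_rsp0;  /* stand-in for the hardware MSR */
static uint64_t cached_rsp0;   /* stand-in for the per-CPU cache */

/* Hypothetical init routine; `resync` selects the fixed behaviour. */
static void sim_init_fred(bool resync)
{
    sim_msr_rsp0 = 0;          /* init programs the MSR directly */
    if (resync)
        cached_rsp0 = 0;       /* the fix: keep the cache in step with the MSR */
}

/* Exit-to-user path: write the MSR only if the cache says it changed. */
static void sim_return_to_user(uint64_t stack_top)
{
    assert(stack_top != 0);

    if (cached_rsp0 != stack_top) {
        sim_msr_rsp0 = stack_top;
        cached_rsp0 = stack_top;
    }
}
```

Without the resynchronization, the comparison on the exit path matches the stale cache and the simulated MSR stays at 0; with it, the next return to user space rewrites the MSR.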
commit e5f1e8af9c9e151ecd665f6d2e36fb25fec3b110 upstream. Upon a wakeup from S4, the restore kernel starts and initializes the FRED MSRs as needed from its perspective. It then loads a hibernation image, including the image kernel, and attempts to load image pages directly into their original page frames used before hibernation unless those frames are currently in use. Once all pages are moved to their original locations, it jumps to a "trampoline" page in the image kernel. At this point, the image kernel takes control, but the FRED MSRs still contain values set by the restore kernel, which may differ from those set by the image kernel before hibernation. Therefore, the image kernel must ensure the FRED MSRs have the same values as before hibernation. Since these values depend only on the location of the kernel text and data, they can be recomputed from scratch. Intel-SIG: commit e5f1e8af9c9e x86/fred: Fix system hang during S4 resume with FRED enabled. Reported-by: Xi Pardee <[email protected]> Reported-by: Todd Brandt <[email protected]> Tested-by: Todd Brandt <[email protected]> Suggested-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Rafael J. Wysocki <[email protected]> Reviewed-by: H. Peter Anvin (Intel) <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Xingjiang Lu: amend commit log ] Signed-off-by: Lu, Xingjiang <[email protected]>
…rn from SIGTRAP handler commit e34dbbc85d64af59176fe59fad7b4122f4330fe2 upstream. Clear the software event flag in the augmented SS to prevent immediate repeat of single step trap on return from SIGTRAP handler if the trap flag (TF) is set without an external debugger attached. Following is a typical single-stepping flow for a user process: 1) The user process is prepared for single-stepping by setting RFLAGS.TF = 1. 2) When any instruction in user space completes, a #DB is triggered. 3) The kernel handles the #DB and returns to user space, invoking the SIGTRAP handler with RFLAGS.TF = 0. 4) After the SIGTRAP handler finishes, the user process performs a sigreturn syscall, restoring the original state, including RFLAGS.TF = 1. 5) Goto step 2. According to the FRED specification: A) Bit 17 in the augmented SS is designated as the software event flag, which is set to 1 for FRED event delivery of SYSCALL, SYSENTER, or INT n. B) If bit 17 of the augmented SS is 1 and ERETU would result in RFLAGS.TF = 1, a single-step trap will be pending upon completion of ERETU. In step 4) above, the software event flag is set upon the sigreturn syscall, and its corresponding ERETU would restore RFLAGS.TF = 1. This combination causes a pending single-step trap upon completion of ERETU. Therefore, another #DB is triggered before any user space instruction is executed, which leads to an infinite loop in which the SIGTRAP handler keeps being invoked on the same user space IP. Intel-SIG: commit e34dbbc85d64 x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler. Fixes: 14619d9 ("x86/fred: FRED entry/exit and dispatch code") Suggested-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Tested-by: Sohil Mehta <[email protected]> Cc:[email protected] Link: https://lore.kernel.org/all/20250609084054.2083189-2-xin%40zytor.com Conflicts: arch/x86/include/asm/sighandling.h arch/x86/kernel/signal.c [ Xingjiang Lu: resolve context conflicts ] Signed-off-by: Lu, Xingjiang <[email protected]>
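Rules A) and B) quoted from the FRED specification in this commit message reduce to a two-bit condition, sketched below in C. The macro and function names are hypothetical; only bit 17 of the augmented SS (from the text above) and RFLAGS.TF at bit 8 are taken as given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SS_SW_EVENT (1ull << 17)  /* software event flag in augmented SS */
#define SKETCH_RFLAGS_TF   (1ull << 8)   /* trap flag in RFLAGS */

/* Rule B: is a single-step trap pending once ERETU completes? */
static bool trap_pending_after_eretu(uint64_t aug_ss, uint64_t rflags)
{
    return (aug_ss & SKETCH_SS_SW_EVENT) && (rflags & SKETCH_RFLAGS_TF);
}

/* The fix in miniature: drop the software event flag before returning. */
static uint64_t clear_sw_event(uint64_t aug_ss)
{
    return aug_ss & ~SKETCH_SS_SW_EVENT;
}
```

With the flag set by the sigreturn syscall and TF restored to 1, the predicate is true, which is the repeat trap the commit describes; after clearing bit 17 it is false and user space can execute an instruction before the next #DB.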
This PR contains 102 patches that backport FRED support from the upstream stable kernel.
It is based on the following commit:
"c73db7ebb024 (origin/5.15-velinux, 5.15-velinux) bytedance: config: enable BYTEDANCE_X86_MCE_STAT".
All FRED LKVS test cases pass:
$ sudo ./fred_test.sh -t cmdline
|1105_231855.993|TRACE|do_cmd() is called by fred_test.sh:35:cmdline_test()|
|1105_231855.995|TRACE|CMD=grep -q 'fred=on' '/proc/cmdline'|