@Kunkka1988

This PR back-ports FRED support from the upstream stable kernel as a single series of 102 patches.
It is based on commit:
"c73db7ebb024 (origin/5.15-velinux, 5.15-velinux) bytedance: config: enable BYTEDANCE_X86_MCE_STAT".

All FRED LKVS test cases pass:
$ sudo ./fred_test.sh -t cmdline
|1105_231855.993|TRACE|do_cmd() is called by fred_test.sh:35:cmdline_test()|
|1105_231855.995|TRACE|CMD=grep -q 'fred=on' '/proc/cmdline'|

$ sudo ./fred_test.sh -t dmesg
|1105_231901.331|TRACE|do_cmd() is called by fred_test.sh:40:dmesg_test()|
|1105_231901.334|TRACE|CMD=dmesg | grep 'Initialize FRED on CPU'|
[    0.010132] Initialize FRED on CPU0
[    0.046660] Initialize FRED on CPU0
[    4.024358] Initialize FRED on CPU1
[    4.024358] Initialize FRED on CPU2
[    4.024358] Initialize FRED on CPU3
[    4.024358] Initialize FRED on CPU4
[    4.024358] Initialize FRED on CPU5
[    4.024358] Initialize FRED on CPU6
[    4.024358] Initialize FRED on CPU7
[    4.024358] Initialize FRED on CPU8
[    4.024358] Initialize FRED on CPU9
[    4.024358] Initialize FRED on CPU10
[    4.024358] Initialize FRED on CPU11
[    4.024358] Initialize FRED on CPU12
[    4.024358] Initialize FRED on CPU13
[    4.024358] Initialize FRED on CPU14
[    6.263841] Initialize FRED on CPU15
[    6.263842] Initialize FRED on CPU16
[    6.263842] Initialize FRED on CPU17
[    6.263842] Initialize FRED on CPU18
[    6.263842] Initialize FRED on CPU19
[    6.263843] Initialize FRED on CPU20
[    6.263843] Initialize FRED on CPU21
[    6.263843] Initialize FRED on CPU22
[    6.263843] Initialize FRED on CPU23
[    6.263844] Initialize FRED on CPU24
[    6.263844] Initialize FRED on CPU25
[    6.263844] Initialize FRED on CPU26
[    7.799837] Initialize FRED on CPU27
[    7.799837] Initialize FRED on CPU28
[    7.799838] Initialize FRED on CPU29
[    7.799838] Initialize FRED on CPU30
[    7.799838] Initialize FRED on CPU31
[    7.799838] Initialize FRED on CPU32
[    7.799839] Initialize FRED on CPU33
[    7.799839] Initialize FRED on CPU34
[    7.799839] Initialize FRED on CPU35
[    7.799839] Initialize FRED on CPU36
[    7.799840] Initialize FRED on CPU37
[    7.799840] Initialize FRED on CPU38
[    7.799840] Initialize FRED on CPU39
[    7.799840] Initialize FRED on CPU40
[    7.799841] Initialize FRED on CPU41
[    7.799841] Initialize FRED on CPU42
[    9.783846] Initialize FRED on CPU43
[    9.783846] Initialize FRED on CPU44
[    9.783846] Initialize FRED on CPU45
[    9.783847] Initialize FRED on CPU46
[    9.783847] Initialize FRED on CPU47
[    9.783847] Initialize FRED on CPU48
[    9.783847] Initialize FRED on CPU49
[    9.783848] Initialize FRED on CPU50
[    9.783848] Initialize FRED on CPU51
[    9.783848] Initialize FRED on CPU52
[    9.783849] Initialize FRED on CPU53
[    9.783849] Initialize FRED on CPU54
[    9.783849] Initialize FRED on CPU55
[    9.783849] Initialize FRED on CPU56
[    9.783849] Initialize FRED on CPU57
[   11.831847] Initialize FRED on CPU58
[   11.831847] Initialize FRED on CPU59
[   11.831847] Initialize FRED on CPU60
[   11.831848] Initialize FRED on CPU61
[   11.831848] Initialize FRED on CPU62
[   11.831848] Initialize FRED on CPU63
[   11.831848] Initialize FRED on CPU64
[   11.831849] Initialize FRED on CPU65
[   11.831849] Initialize FRED on CPU66
[   11.831849] Initialize FRED on CPU67
[   11.831849] Initialize FRED on CPU68
[   11.831850] Initialize FRED on CPU69
[   11.831850] Initialize FRED on CPU70
[   11.831850] Initialize FRED on CPU71
[   11.831850] Initialize FRED on CPU72
[   11.831851] Initialize FRED on CPU73
[   13.815850] Initialize FRED on CPU74
[   13.815850] Initialize FRED on CPU75
[   13.815851] Initialize FRED on CPU76
[   13.815851] Initialize FRED on CPU77
[   13.815851] Initialize FRED on CPU78
[   13.815852] Initialize FRED on CPU79
[   13.815852] Initialize FRED on CPU80
[   13.815852] Initialize FRED on CPU81
[   13.815852] Initialize FRED on CPU82
[   13.815853] Initialize FRED on CPU83
[   13.815853] Initialize FRED on CPU84
[   13.815853] Initialize FRED on CPU85
[   13.815853] Initialize FRED on CPU86
[   13.815854] Initialize FRED on CPU87
[   13.815854] Initialize FRED on CPU88
[   13.815854] Initialize FRED on CPU89
[   13.815854] Initialize FRED on CPU90
[   15.799853] Initialize FRED on CPU91
[   15.799853] Initialize FRED on CPU92
[   15.799854] Initialize FRED on CPU93
[   15.799854] Initialize FRED on CPU94
[   15.799854] Initialize FRED on CPU95
[   15.799855] Initialize FRED on CPU96
[   15.799855] Initialize FRED on CPU97
[   15.799855] Initialize FRED on CPU98
[   15.799855] Initialize FRED on CPU99
[   15.799855] Initialize FRED on CPU100
[   15.799856] Initialize FRED on CPU101
[   15.799856] Initialize FRED on CPU102
[   15.799856] Initialize FRED on CPU103
[   15.799856] Initialize FRED on CPU104
[   15.799857] Initialize FRED on CPU105
[   15.799857] Initialize FRED on CPU106
[   15.799857] Initialize FRED on CPU107
[   17.783863] Initialize FRED on CPU108
[   17.783863] Initialize FRED on CPU109
[   17.783864] Initialize FRED on CPU110
[   17.783864] Initialize FRED on CPU111
[   17.783864] Initialize FRED on CPU112
[   17.783865] Initialize FRED on CPU113
[   17.783865] Initialize FRED on CPU114
[   17.783865] Initialize FRED on CPU115
[   17.783865] Initialize FRED on CPU116
[   17.783866] Initialize FRED on CPU117
[   17.783866] Initialize FRED on CPU118
[   17.783866] Initialize FRED on CPU119
[   17.783866] Initialize FRED on CPU120
[   17.783867] Initialize FRED on CPU121
[   17.783867] Initialize FRED on CPU122
[   17.783867] Initialize FRED on CPU123
[   17.783867] Initialize FRED on CPU124
[   17.783868] Initialize FRED on CPU125
[   19.767864] Initialize FRED on CPU126
[   19.767865] Initialize FRED on CPU127
[   19.767865] Initialize FRED on CPU128
[   19.767865] Initialize FRED on CPU129
[   19.767865] Initialize FRED on CPU130
[   19.767865] Initialize FRED on CPU131
[   19.767866] Initialize FRED on CPU132
[   19.767866] Initialize FRED on CPU133
[   19.767866] Initialize FRED on CPU134
[   19.767866] Initialize FRED on CPU135
[   19.767867] Initialize FRED on CPU136
[   19.767867] Initialize FRED on CPU137
[   19.767867] Initialize FRED on CPU138
[   19.767867] Initialize FRED on CPU139
[   19.767868] Initialize FRED on CPU140
[   19.767868] Initialize FRED on CPU141
[   19.767868] Initialize FRED on CPU142
[   19.767868] Initialize FRED on CPU143
[   21.815869] Initialize FRED on CPU144
[   21.815869] Initialize FRED on CPU145
[   21.815870] Initialize FRED on CPU146
[   21.815870] Initialize FRED on CPU147
[   21.815870] Initialize FRED on CPU148
[   21.815870] Initialize FRED on CPU149
[   21.815871] Initialize FRED on CPU150
[   21.815871] Initialize FRED on CPU151
[   21.815871] Initialize FRED on CPU152
[   21.815871] Initialize FRED on CPU153
[   21.815872] Initialize FRED on CPU154
[   21.815872] Initialize FRED on CPU155
[   21.815872] Initialize FRED on CPU156
[   21.815873] Initialize FRED on CPU157
[   21.815873] Initialize FRED on CPU158
[   21.815873] Initialize FRED on CPU159
[   21.815873] Initialize FRED on CPU160
[   21.815874] Initialize FRED on CPU161
[   23.799873] Initialize FRED on CPU162
[   23.799874] Initialize FRED on CPU163
[   23.799874] Initialize FRED on CPU164
[   23.799875] Initialize FRED on CPU165
[   23.799875] Initialize FRED on CPU166
[   23.799875] Initialize FRED on CPU167
[   23.799875] Initialize FRED on CPU168
[   23.799875] Initialize FRED on CPU169
[   23.799876] Initialize FRED on CPU170
[   23.799876] Initialize FRED on CPU171
[   23.799876] Initialize FRED on CPU172
[   23.799877] Initialize FRED on CPU173
[   23.799877] Initialize FRED on CPU174
[   23.799877] Initialize FRED on CPU175
[   23.799877] Initialize FRED on CPU176
[   25.783877] Initialize FRED on CPU177
[   25.783878] Initialize FRED on CPU178
[   25.783878] Initialize FRED on CPU179
[   25.783878] Initialize FRED on CPU180
[   25.783878] Initialize FRED on CPU181
[   25.783879] Initialize FRED on CPU182
[   25.783879] Initialize FRED on CPU183
[   25.783879] Initialize FRED on CPU184
[   25.783879] Initialize FRED on CPU185
[   25.783880] Initialize FRED on CPU186
[   25.783880] Initialize FRED on CPU187
[   25.783880] Initialize FRED on CPU188
[   25.783880] Initialize FRED on CPU189
[   25.783881] Initialize FRED on CPU190
[   25.783881] Initialize FRED on CPU191
[   27.767883] Initialize FRED on CPU192
[   27.767884] Initialize FRED on CPU193
[   27.767884] Initialize FRED on CPU194
[   27.767884] Initialize FRED on CPU195
[   27.767884] Initialize FRED on CPU196
[   27.767885] Initialize FRED on CPU197
[   27.767885] Initialize FRED on CPU198
[   27.767885] Initialize FRED on CPU199
[   27.767885] Initialize FRED on CPU200
[   27.767886] Initialize FRED on CPU201
[   27.767886] Initialize FRED on CPU202
[   27.767886] Initialize FRED on CPU203
[   27.767886] Initialize FRED on CPU204
[   27.767887] Initialize FRED on CPU205
[   27.767887] Initialize FRED on CPU206
[   27.767887] Initialize FRED on CPU207
[   27.767887] Initialize FRED on CPU208
[   29.751890] Initialize FRED on CPU209
[   29.751890] Initialize FRED on CPU210
[   29.751890] Initialize FRED on CPU211
[   29.751890] Initialize FRED on CPU212
[   29.751891] Initialize FRED on CPU213
[   29.751891] Initialize FRED on CPU214
[   29.751891] Initialize FRED on CPU215
[   29.751892] Initialize FRED on CPU216
[   29.751892] Initialize FRED on CPU217
[   29.751892] Initialize FRED on CPU218
[   29.751892] Initialize FRED on CPU219
[   29.751893] Initialize FRED on CPU220
[   29.751893] Initialize FRED on CPU221
[   29.751893] Initialize FRED on CPU222
[   29.751893] Initialize FRED on CPU223
[   29.751894] Initialize FRED on CPU224
[   31.799890] Initialize FRED on CPU225
[   31.799891] Initialize FRED on CPU226
[   31.799891] Initialize FRED on CPU227
[   31.799891] Initialize FRED on CPU228
[   31.799892] Initialize FRED on CPU229
[   31.799892] Initialize FRED on CPU230
[   31.799892] Initialize FRED on CPU231
[   31.799892] Initialize FRED on CPU232
[   31.799892] Initialize FRED on CPU233
[   31.799893] Initialize FRED on CPU234
[   31.799893] Initialize FRED on CPU235
[   31.799893] Initialize FRED on CPU236
[   31.799894] Initialize FRED on CPU237
[   31.799894] Initialize FRED on CPU238
[   31.799894] Initialize FRED on CPU239

$ sudo ./fred_test.sh -t cpuid
|1105_231905.982|TRACE|do_cmd() is called by fred_test.sh:46:cpuid_test()|
|1105_231905.984|TRACE|CMD=cpuid_check 7 0 1 0 a 17|
6 parameters, eax=7
cpuid(eax=00000007, ebx=00000000, ecx=00000001, edx=00000000)
cpuid(&eax=0x7ffe2458d348, &ebx=0x7ffe2458d34c, &ecx=0x7ffe2458d350, &edx=0x7ffe2458d354)
After native_cpuid:
out:  eax=4c9e09d7, ebx=00000001, ecx=00000023,  edx=0184d430
cpuid(&eax=0x7ffe2458d348, &ebx=0x7ffe2458d34c, &ecx=0x7ffe2458d350, &edx=0x7ffe2458d354)
output:
  eax=4c9e09d7    || Binary: 0100 1100 1001 1110 0000 1001 1101 0111
  ebx=00000001    || Binary: 0000 0000 0000 0000 0000 0000 0000 0001
  ecx=00000023    || Binary: 0000 0000 0000 0000 0000 0000 0010 0011
  edx=0184d430    || Binary: 0000 0001 1000 0100 1101 0100 0011 0000
Now check cpuid eax, bit 17
Start with 0, pass: bit set 1, fail: bit set 0
Order bit:14, invert order:17, bit:1, pass!
Done! Return:0.

$ sudo ./fred_test.sh -t cpuinfo
|1105_231913.533|TRACE|do_cmd() is called by fred_test.sh:57:cpuinfo_test()|
|1105_231913.536|TRACE|CMD=cpu_info_check fred|
|1105_231913.553|TRACE|/proc/cpuinfo contain 'fred'|

$ sudo ./fred_test.sh -t lkgs
|1105_231920.535|TRACE|do_cmd() is called by fred_test.sh:52:lkgs_cpuid_test()|
|1105_231920.538|TRACE|CMD=cpuid_check 7 0 1 0 a 18|
6 parameters, eax=7
cpuid(eax=00000007, ebx=00000000, ecx=00000001, edx=00000000)
cpuid(&eax=0x7ffe4bfc2548, &ebx=0x7ffe4bfc254c, &ecx=0x7ffe4bfc2550, &edx=0x7ffe4bfc2554)
After native_cpuid:
out:  eax=4c9e09d7, ebx=00000001, ecx=00000023,  edx=0184d430
cpuid(&eax=0x7ffe4bfc2548, &ebx=0x7ffe4bfc254c, &ecx=0x7ffe4bfc2550, &edx=0x7ffe4bfc2554)
output:
  eax=4c9e09d7    || Binary: 0100 1100 1001 1110 0000 1001 1101 0111
  ebx=00000001    || Binary: 0000 0000 0000 0000 0000 0000 0000 0001
  ecx=00000023    || Binary: 0000 0000 0000 0000 0000 0000 0010 0011
  edx=0184d430    || Binary: 0000 0001 1000 0100 1101 0100 0011 0000
Now check cpuid eax, bit 18
Start with 0, pass: bit set 1, fail: bit set 0
Order bit:13, invert order:18, bit:1, pass!
Done! Return:0.
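
The two cpuid_check runs above test bit 17 (FRED) and bit 18 (LKGS) of EAX from CPUID leaf 7, subleaf 1. As a minimal sketch of that same bit test, the helper below decodes a raw EAX value such as the 0x4c9e09d7 reported in the log. The function and constant names are hypothetical and are not taken from the LKVS sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID.(EAX=7,ECX=1):EAX, matching the two tests above. */
enum {
    CPUID_7_1_EAX_FRED_BIT = 17,
    CPUID_7_1_EAX_LKGS_BIT = 18,
};

/* Return true if bit `bit` (0 = least significant) is set in `reg`. */
static inline bool cpuid_bit_set(uint32_t reg, unsigned int bit)
{
    return bit < 32 && ((reg >> bit) & 1u) != 0;
}

/* True when both the FRED and the LKGS feature bits are enumerated. */
static inline bool fred_and_lkgs_enumerated(uint32_t eax)
{
    return cpuid_bit_set(eax, CPUID_7_1_EAX_FRED_BIT) &&
           cpuid_bit_set(eax, CPUID_7_1_EAX_LKGS_BIT);
}
```

Passing the logged value 0x4c9e09d7 to fred_and_lkgs_enumerated() reproduces the "pass" verdict of both tests, since bits 19:16 of that value are 1110b.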

Peter Zijlstra and others added 30 commits November 6, 2025 17:41
commit 5fc77b9 upstream.

Create and use EX_TYPE_ZERO_REG to clear the register and retry the
segment load on exception.

Intel-SIG: commit 5fc77b9 x86/segment: Remove .fixup usage.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 6605694 upstream.

Add the CPU feature bit for LKGS (Load "Kernel" GS).

LKGS instruction is introduced with Intel FRED (flexible return and
event delivery) specification. Search for the latest FRED spec in most
search engines with this search pattern:

  site:intel.com FRED (flexible return and event delivery) specification

LKGS behaves like the MOV to GS instruction except that it loads
the base address into the IA32_KERNEL_GS_BASE MSR instead of the
GS segment’s descriptor cache, which is exactly what Linux kernel
does to load a user level GS base.  Thus, with LKGS, there is no
need to SWAPGS away from the kernel GS base.
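
To make the difference concrete, here is a toy model (not kernel code) of the two base registers involved. It assumes the legacy way of loading a user GS is the SWAPGS, MOV to GS, SWAPGS sequence, and shows that a single LKGS leaves the modeled state identical. All names are hypothetical, and only the base addresses are modeled, not the selector or other descriptor attributes.

```c
#include <stdint.h>

/* Toy model of the two GS-related base registers; not real hardware state. */
struct gs_state {
    uint64_t gs_base;        /* active GS base (descriptor cache) */
    uint64_t kernel_gs_base; /* IA32_KERNEL_GS_BASE MSR */
};

/* MOV to GS: the new base lands in the active GS descriptor cache. */
static inline void model_mov_to_gs(struct gs_state *s, uint64_t new_base)
{
    s->gs_base = new_base;
}

/* LKGS: the new base lands in IA32_KERNEL_GS_BASE; the active base is untouched. */
static inline void model_lkgs(struct gs_state *s, uint64_t new_base)
{
    s->kernel_gs_base = new_base;
}

/* SWAPGS: exchange the active base with IA32_KERNEL_GS_BASE. */
static inline void model_swapgs(struct gs_state *s)
{
    uint64_t tmp = s->gs_base;
    s->gs_base = s->kernel_gs_base;
    s->kernel_gs_base = tmp;
}
```

In this model, calling model_swapgs(), model_mov_to_gs(user_base), model_swapgs() produces the same final state as one call to model_lkgs(user_base), and the LKGS path never moves the kernel base out of the active slot.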

[ mingo: Minor tweaks to the description. ]

Intel-SIG: commit 6605694 x86/cpufeature: Add the CPU feature bit
for LKGS.
New Intel x86 cpu feature bit for LKGS definition.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/include/asm/cpufeatures.h
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 5a91f12 upstream.

Add the instruction opcode used by LKGS to x86-opcode-map.

Opcode number is per public FRED draft spec v3.0.

Intel-SIG: commit 5a91f12 Add the instruction opcode used by LKGS
to x86-opcode-map.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit df729fb upstream.

Let GCC know that only the low 16 bits of load_gs_index() argument
actually matter. It might allow it to create slightly better
code. However, do not propagate this into the prototypes of functions
that end up being paravirtualized, to avoid unnecessary changes.

Intel-SIG: commit df729fb x86/gsseg: Make asm_load_gs_index()
take an u16.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit ae53fa1 upstream.

GS is a special segment on x86_64, move load_gs_index() to its own new
header file to simplify header inclusion.

No change in functionality.

Intel-SIG: commit ae53fa1 x86/gsseg: Move load_gs_index() to its
own new header file.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/include/asm/gsseg.h
    arch/x86/include/asm/special_insns.h
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 92cbbad upstream.

The LKGS instruction atomically loads a segment descriptor into the
%gs descriptor registers, *except* that %gs.base is unchanged, and the
base is instead loaded into MSR_IA32_KERNEL_GS_BASE, which is exactly
what we want this function to do.

Intel-SIG: commit 92cbbad x86/gsseg: Use the LKGS instruction if
available for load_gs_index().

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Linus Torvalds <[email protected]>
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit e12ad46 upstream.

Module build needs to be able to pick up the C prototype:

  WARNING: modpost: EXPORT symbol "asm_load_gs_index" [vmlinux] version generation failed, symbol will not be versioned.
  Is "asm_load_gs_index" prototyped in <asm/asm-prototypes.h>?

Intel-SIG: commit e12ad46 x86/gsseg: Add the new <asm/gsseg.h>
header to <asm/asm-prototypes.h>.

Fixes: ae53fa1 ("x86/gsseg: Move load_gs_index() to its own new header file")
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: [email protected]

Conflicts:
    arch/x86/include/asm/asm-prototypes.h
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit a4cb5ec upstream.

WRMSRNS is an instruction that behaves exactly like WRMSR, with
the only difference being that it is not a serializing instruction
by default. Under certain conditions, WRMSRNS may replace WRMSR to
improve performance.

Add its CPU feature bit, opcode to the x86 opcode map, and an
always inline API __wrmsrns() to embed WRMSRNS into the code.
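
Like WRMSR, WRMSRNS takes the MSR index in ECX and the 64-bit value split across EDX:EAX. The sketch below shows only that split, which a wrapper would perform before issuing the instruction. The names msr_halves, msr_split and msr_join are hypothetical helpers for illustration and are not the kernel's __wrmsrns().

```c
#include <stdint.h>

/* The two 32-bit halves an MSR write passes in EAX (low) and EDX (high). */
struct msr_halves {
    uint32_t low;
    uint32_t high;
};

/* Split a 64-bit MSR value into its EAX/EDX halves. */
static inline struct msr_halves msr_split(uint64_t value)
{
    struct msr_halves h = {
        .low = (uint32_t)value,
        .high = (uint32_t)(value >> 32),
    };
    return h;
}

/* Rebuild the 64-bit value from the two halves. */
static inline uint64_t msr_join(struct msr_halves h)
{
    return ((uint64_t)h.high << 32) | h.low;
}
```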

Intel-SIG: commit a4cb5ec x86/cpufeatures,opcode,msr: Add the
WRMSRNS instruction support.

Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Shan Kang <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/include/asm/cpufeatures.h
    tools/arch/x86/include/asm/cpufeatures.h
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 3167b37 upstream.

idtentry_sysvec is really just DECLARE_IDTENTRY defined in
<asm/idtentry.h>, no need to define it separately.

Intel-SIG: commit 3167b37 x86/entry: Remove idtentry_sysvec from
entry_{32,64}.S.

Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 8df7193 upstream.

Intel VT-x classifies events into eight different types, which are inherited
by FRED for event identification. As such, event types become a common x86
concept, and should be defined in a common x86 header.

Add event type macros to <asm/trapnr.h>, and use them in <asm/vmx.h>.
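
As an illustration of the eight types, the sketch below lists them with the numeric encoding VT-x uses for interruption types. The identifier names follow my recollection of the upstream macros and should be checked against <asm/trapnr.h>. event_type_name() is a hypothetical helper.

```c
#include <stddef.h>

/* Event types shared by VT-x and FRED; the values are the VT-x encoding. */
enum event_type {
    EVENT_TYPE_EXTINT     = 0, /* external interrupt */
    EVENT_TYPE_RESERVED   = 1,
    EVENT_TYPE_NMI        = 2, /* non-maskable interrupt */
    EVENT_TYPE_HWEXC      = 3, /* hardware exception */
    EVENT_TYPE_SWINT      = 4, /* software interrupt (INT n) */
    EVENT_TYPE_PRIV_SWEXC = 5, /* privileged software exception (INT1) */
    EVENT_TYPE_SWEXC      = 6, /* software exception (INT3, INTO) */
    EVENT_TYPE_OTHER      = 7, /* other events */
};

/* Map an event type to a printable name; out-of-range values give "invalid". */
static inline const char *event_type_name(unsigned int type)
{
    static const char *const names[] = {
        "external interrupt",
        "reserved",
        "NMI",
        "hardware exception",
        "software interrupt",
        "privileged software exception",
        "software exception",
        "other",
    };

    return type < sizeof(names) / sizeof(names[0]) ? names[type] : "invalid";
}
```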

Intel-SIG: commit 8df7193 x86/trapnr: Add event type macros to
<asm/trapnr.h>.

Suggested-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/include/asm/vmx.h
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 51383e7 upstream.

Briefly introduce FRED, and its advantages compared to IDT.

Intel-SIG: commit 51383e7 Documentation/x86/64: Add documentation
for FRED.

Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Bagas Sanjaya <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    Documentation/arch/
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 2cce959 upstream.

Add the configuration option CONFIG_X86_FRED to enable FRED.

Intel-SIG: commit 2cce959 x86/fred: Add Kconfig option for FRED
(CONFIG_X86_FRED).

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 51c158f upstream.

Any FRED enabled CPU will always have the following features as its
baseline:

  1) LKGS, which loads the attributes of the GS segment but puts the
     base address into the IA32_KERNEL_GS_BASE MSR instead of the GS
     segment’s descriptor cache.

  2) WRMSRNS, non-serializing WRMSR for faster MSR writes.

Intel-SIG: commit 51c158f x86/cpufeatures: Add the CPU feature
bit for FRED.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/include/asm/cpufeatures.h
    arch/x86/kernel/cpu/cpuid-deps.c
    tools/arch/x86/include/asm/cpufeatures.h
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit e554a8c upstream.

Add CONFIG_X86_FRED to <asm/disabled-features.h> to make
cpu_feature_enabled() work correctly with FRED.

Intel-SIG: commit e554a8c x86/fred: Disable FRED support if
CONFIG_X86_FRED is disabled.

Originally-by: Megha Dey <[email protected]>
Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/include/asm/disabled-features.h
    tools/arch/x86/include/asm/disabled-features.h
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 3810da1 upstream.

Let command line option "fred" accept multiple options to make it
easier to tweak its behavior.

Currently, two options 'on' and 'off' are allowed, and the default
behavior is to disable FRED. To enable FRED, append "fred=on" to the
kernel command line.
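
The behavior described above (default off, enabled by "fred=on") can be sketched in user space as follows. This is not the kernel's parameter parser. The function name is hypothetical, and letting the last "fred=" token win is an assumption of this sketch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true only if the command line enables FRED with "fred=on".
 * A missing or unrecognized "fred=" value keeps the default (disabled).
 * Assumption of this sketch: when the option appears twice, the last one wins.
 */
static inline bool fred_enabled_by_cmdline(const char *cmdline)
{
    bool enabled = false; /* default: FRED disabled */
    const char *p = cmdline;

    while (*p) {
        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;

        /* Length of the current whitespace-delimited token. */
        size_t len = strcspn(p, " \t\n");

        if (len == 7 && strncmp(p, "fred=on", 7) == 0)
            enabled = true;
        else if (len == 8 && strncmp(p, "fred=off", 8) == 0)
            enabled = false;

        p += len;
    }
    return enabled;
}
```

Matching whole tokens rather than substrings keeps a parameter such as "notfred=on" from being mistaken for the option, which a plain substring search like the grep in the cmdline test would not distinguish.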

  [ bp: Use cpu_feature_enabled(), touch ups. ]

Intel-SIG: commit 3810da1 x86/fred: Add a fred= cmdline param.

Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 0115f8b upstream.

ERETU returns from an event handler while making a transition to ring 3,
and ERETS returns from an event handler while staying in ring 0.

Add instruction opcodes used by ERET[US] to the x86 opcode map; opcode
numbers are per FRED spec v5.0.

Intel-SIG: commit 0115f8b x86/opcode: Add ERET[US] instructions
to the x86 opcode map.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit cd19bab upstream.

Update the objtool decoder to know about the ERET[US] instructions
(type INSN_CONTEXT_SWITCH).

Intel-SIG: commit cd19bab x86/objtool: Teach objtool about
ERET[US].

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    tools/objtool/arch/x86/decode.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit ff45746 upstream.

Add X86_CR4_FRED macro for the FRED bit in %cr4. This bit must not be
changed after initialization, so add it to the pinned CR4 bits.
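
A minimal sketch of what pinning means for this bit: once a bit is in the pinned mask, any later CR4 value has that bit forced back to its pinned state. The FRED bit position (bit 32) is stated here from the FRED specification as I understand it. cr4_apply_pinning() is a hypothetical helper that models the idea and is not the kernel's CR4 write path.

```c
#include <stdint.h>

/* CR4.FRED bit position (bit 32), an assumption to verify against the spec. */
#define X86_CR4_FRED_BIT 32
#define X86_CR4_FRED     (UINT64_C(1) << X86_CR4_FRED_BIT)

/*
 * Force every bit covered by `pinned_mask` to the value recorded in
 * `pinned_bits`, leaving all other bits of `requested` unchanged.
 */
static inline uint64_t cr4_apply_pinning(uint64_t requested,
                                         uint64_t pinned_mask,
                                         uint64_t pinned_bits)
{
    return (requested & ~pinned_mask) | (pinned_bits & pinned_mask);
}
```

With X86_CR4_FRED in both the mask and the pinned bits, a requested value that clears the bit comes back with the bit set again.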

Intel-SIG: commit ff45746 x86/cpu: Add X86_CR4_FRED macro.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/include/uapi/asm/processor-flags.h
    arch/x86/kernel/cpu/common.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit cd6df3f upstream.

Add MSR numbers for the FRED configuration registers per FRED spec 5.0.

Intel-SIG: commit cd6df3f x86/cpu: Add MSR numbers for FRED
configuration.

Originally-by: Megha Dey <[email protected]>
Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/include/asm/msr-index.h
    tools/arch/x86/include/asm/msr-index.h
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit ee63291 upstream.

struct pt_regs is hard to read because the member or section related
comments are not aligned with the members.

The 'cs' and 'ss' members of pt_regs are of type 'unsigned long', while
in reality they are only 16 bits wide. This works so far as the
remaining space is unused, but FRED will use the remaining bits for
other purposes.

To prepare for FRED:

  - Clean up the formatting
  - Convert 'cs' and 'ss' to u16 and embed them into a union
    with a u64
  - Fix up the related printk() format strings
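
The union described in the list above can be sketched as follows. The type and helper names are hypothetical and this is not the actual pt_regs definition. The member names cs and csx are illustrative, and reading the selector through the u16 member assumes a little-endian target, which x86 is.

```c
#include <stdint.h>

/*
 * One 64-bit stack slot viewed two ways: the 16-bit selector, or the whole
 * 64-bit word whose upper bits FRED can use for additional information.
 */
union cs_slot {
    uint16_t cs;  /* selector: low 16 bits on a little-endian target */
    uint64_t csx; /* the full slot */
};

/* The union must not change the size of the slot. */
_Static_assert(sizeof(union cs_slot) == sizeof(uint64_t),
               "slot must stay 64 bits wide");

/* Extract the selector from a raw slot value (little-endian assumption). */
static inline uint16_t cs_from_slot(uint64_t raw)
{
    union cs_slot s = { .csx = raw };
    return s.cs;
}
```

Because the selector is now a u16, format strings that printed it as an unsigned long need a matching conversion, which is why the printk() strings are touched.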

Intel-SIG: commit ee63291 x86/ptrace: Cleanup the definition of
the pt_regs structure.

Suggested-by: Thomas Gleixner <[email protected]>
Originally-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 3c77bf0 upstream.

FRED defines additional information in the upper 48 bits of cs/ss
fields. Therefore add the information definitions into the pt_regs
structure.

Specifically introduce a new structure fred_ss to denote the FRED flags
above SS selector, which avoids FRED_SSX_ macros and makes the code
simpler and easier to read.
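
For illustration, a few of the fields FRED stores above the SS selector can be pulled out of the 64-bit slot with shifts and masks. The bit positions used here (NMI at bit 18, vector in bits 39:32, event type in bits 51:48) are my reading of the FRED stack frame layout and should be verified against the specification. The helper names are hypothetical and stand in for what the fred_ss structure expresses with named members.

```c
#include <stdint.h>

/* Bits 15:0: the SS selector itself. */
static inline uint16_t fred_ss_selector(uint64_t ssx)
{
    return (uint16_t)(ssx & 0xffff);
}

/* Bit 18 (assumed position): NMI flag. */
static inline unsigned int fred_ss_nmi(uint64_t ssx)
{
    return (unsigned int)((ssx >> 18) & 0x1);
}

/* Bits 39:32 (assumed position): event vector. */
static inline uint8_t fred_ss_vector(uint64_t ssx)
{
    return (uint8_t)((ssx >> 32) & 0xff);
}

/* Bits 51:48 (assumed position): event type. */
static inline uint8_t fred_ss_type(uint64_t ssx)
{
    return (uint8_t)((ssx >> 48) & 0xf);
}
```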

Intel-SIG: commit 3c77bf0 x86/ptrace: Add FRED additional
information to the pt_regs structure.

Suggested-by: Thomas Gleixner <[email protected]>
Originally-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 32b09c2 upstream.

Add a header file for FRED prototypes and definitions.

Intel-SIG: commit 32b09c2 x86/fred: Add a new header file for
FRED definitions.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 65c9cc9 upstream.

When using FRED, reserve space at the top of the stack frame, just
like i386 does.

Intel-SIG: commit 65c9cc9 x86/fred: Reserve space for the FRED
stack frame.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 9356c4b upstream.

MSR_IA32_FRED_RSP0 is used during ring 3 event delivery, and needs to
be updated to point to the top of next task stack during task switch.

Intel-SIG: commit 9356c4b x86/fred: Update MSR_IA32_FRED_RSP0
during task switch.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/include/asm/switch_to.h
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 09794f6 upstream.

SWAPGS is no longer needed thus NOT allowed with FRED because FRED
transitions ensure that an operating system can _always_ operate
with its own GS base address:

  - For events that occur in ring 3, FRED event delivery swaps the GS
    base address with the IA32_KERNEL_GS_BASE MSR.

  - ERETU (the FRED transition that returns to ring 3) also swaps the
    GS base address with the IA32_KERNEL_GS_BASE MSR.

The operating system can still set up the GS segment for a user
thread, without loading the user thread's GS itself, by:

  - Using LKGS, available with FRED, to modify other attributes of the
    GS segment without compromising its ability always to operate with
    its own GS base address.

  - Accessing the GS segment base address for a user thread as before
    using RDMSR or WRMSR on the IA32_KERNEL_GS_BASE MSR.

Note, LKGS loads the GS base address into the IA32_KERNEL_GS_BASE MSR
instead of the GS segment's descriptor cache. As such, the operating
system never changes its runtime GS base address.

Intel-SIG: commit 09794f6 x86/fred: Disallow the swapgs
instruction when FRED is enabled.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/kernel/process_64.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit df88387 upstream.

Because FRED always restores the full value of %rsp, ESPFIX is
no longer needed when it's enabled.

Intel-SIG: commit df88387 x86/fred: No ESPFIX needed when FRED is
enabled.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit ad41a14 upstream.

Entering a new task is logically speaking a return from a system call
(exec, fork, clone, etc.). As such, if ptrace enables single stepping
a single step exception should be allowed to trigger immediately upon
entering user space. This is not optional.

NMI should *never* be disabled in user space. As such, this is an
optional, opportunistic way to catch errors.

Allow single-step trap and NMI when starting a new task, thus once
the new task enters user space, single-step trap and NMI are both
enabled immediately.

Intel-SIG: commit ad41a14 x86/fred: Allow single-step trap and
NMI when starting a new task.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 58c80cc upstream.

On a FRED system, the faulting address (CR2) is passed on the stack,
to avoid the problem of transient state.  Thus the page fault address
is read from the FRED stack frame instead of CR2 when FRED is enabled.
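
The selection described above can be sketched in plain userspace C. This is only an illustration: `struct fred_frame_sketch`, its `event_data` field, and `fault_address()` are hypothetical names invented for this sketch, and the CR2 value is passed in as an ordinary argument rather than read from the register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the FRED stack frame. */
struct fred_frame_sketch {
	unsigned long event_data;	/* faulting address saved at event delivery */
};

/*
 * Pick the page fault address. With FRED the value captured on the stack
 * is used, so a later fault that overwrites CR2 cannot change it.
 */
static unsigned long fault_address(bool fred_enabled,
				   const struct fred_frame_sketch *frame,
				   unsigned long cr2_now)
{
	assert(frame != NULL);
	return fred_enabled ? frame->event_data : cr2_now;
}
```

With FRED enabled the function ignores `cr2_now` entirely, which is the property the commit relies on.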

Intel-SIG: commit 58c80cc x86/fred: Make exc_page_fault() work
for FRED.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 90f3572 upstream.

FRED and IDT can share most of the definitions and declarations so
that in the majority of cases the actual handler implementation is the
same.

The differences are the exceptions where FRED stores exception related
information on the stack and the sysvec implementations as FRED can
handle irqentry/exit() in the dispatcher instead of having it in each
handler.

Also add stub defines for vectors which are not used due to Kconfig
decisions to spare the ifdeffery in the actual FRED dispatch code.

Intel-SIG: commit 90f3572 x86/idtentry: Incorporate definitions/
declarations of the FRED entries.

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 99fcc96 upstream.

When it occurs on a different ring level, i.e., from user or kernel context,
#DB needs to be handled on a different stack: user #DB on the current task
stack, while kernel #DB on a dedicated stack. This is exactly how FRED
event delivery invokes an exception handler: ring 3 event on level 0
stack, i.e., current task stack; ring 0 event on the #DB dedicated stack
specified in the IA32_FRED_STKLVLS MSR. So unlike IDT, the FRED debug
exception entry stub doesn't do stack switch.

On a FRED system, the debug trap status information (DR6) is passed on
the stack, to avoid the problem of transient state. Furthermore, FRED
transitions avoid a lot of ugly corner cases the handling of which can,
and should be, skipped.

The FRED debug trap status information saved on the stack differs from
DR6 in both stickiness and polarity; it is exactly in the format which
debug_read_clear_dr6() returns for the IDT entry points.
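
The polarity difference can be illustrated with a short userspace sketch. It assumes that the raw DR6 value is converted to positive polarity by XOR-ing with a reserved-bits mask, and that the mask is 0xFFFF0FF0 with the single-step bit at bit 14; both values are assumptions of this sketch and should be checked against the kernel headers. The helper name `dr6_to_positive()` is hypothetical.

```c
#include <assert.h>

/* Assumed values, for illustration only. */
#define DR6_RESERVED_SKETCH	0xFFFF0FF0ul	/* bits that read as 1 when idle */
#define DR6_STEP_SKETCH		0x00004000ul	/* single-step status, bit 14 */

/*
 * Convert a raw DR6 value into the positive-polarity form, where every
 * set bit means "this condition was reported".
 */
static unsigned long dr6_to_positive(unsigned long raw_dr6)
{
	return raw_dr6 ^ DR6_RESERVED_SKETCH;
}
```

Under these assumptions an idle raw DR6 maps to zero, which is the format a FRED handler would receive directly from the stack without any conversion.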

Intel-SIG: commit 99fcc96 x86/fred: Add a debug fault entry stub
for FRED.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
sean-jc and others added 30 commits November 6, 2025 17:41
commit 4f76e86 upstream.

Split the asm subroutines for handling NMIs versus IRQs that occur in the
guest so that the NMI handler can be called from a noinstr section.  As a
bonus, the NMI path doesn't need an indirect branch.

Intel-SIG: commit 4f76e86 KVM: VMX: Provide separate subroutines
for invoking NMI vs. IRQ handlers.

Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>

Conflicts:
    arch/x86/kvm/vmx/vmenter.S
    arch/x86/kvm/vmx/vmx.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 70d0fe5 upstream.

When FRED is enabled, call fred_entry_from_kvm() to handle IRQ/NMI in
IRQ/NMI induced VM exits.

Intel-SIG: commit 70d0fe5 KVM: VMX: Call fred_entry_from_kvm() for
IRQ/NMI handling.

Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/kvm/vmx/vmx.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 530dce2 upstream.

Because FRED uses the ring 3 FRED entrypoint for SYSCALL and SYSENTER and
ERETU is the only legit instruction to return to ring 3, there is NO need
to setup SYSCALL and SYSENTER MSRs for FRED, except the IA32_STAR MSR.

Split IDT syscall setup code into idt_syscall_init() to make it easy to
skip syscall setup code when FRED is enabled.

Intel-SIG: commit 530dce2 x86/syscall: Split IDT syscall setup code
into idt_syscall_init().

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/kernel/cpu/common.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit cdd99dd upstream.

Add cpu_init_fred_exceptions() to:
  - Set FRED entrypoints for events happening in ring 0 and 3.
  - Specify the stack level for IRQs that occur in ring 0.
  - Specify dedicated event stacks for #DB/NMI/#MCE/#DF.
  - Enable FRED and invalidate the IDT.
  - Force 32-bit system calls to use "int $0x80" only.

Add fred_complete_exception_setup() to:
  - Initialize system_vectors as done for IDT systems.
  - Set unused sysvec_table entries to fred_handle_spurious_interrupt().
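
The second step, filling unused table entries with a spurious-interrupt handler, can be sketched as follows. The table size, the handler signatures, and the names `vec_table`, `complete_setup()` and `dispatch()` are all hypothetical; the real table and handlers differ.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VECTORS_SKETCH 8	/* illustrative size, not the real one */

typedef int (*handler_fn)(void);

static int spurious_handler(void) { return -1; }
static int timer_handler(void)    { return 1; }

/* Zero-initialized: every slot starts out unused. */
static handler_fn vec_table[NR_VECTORS_SKETCH];

/* Point every slot that nobody registered at the spurious handler. */
static void complete_setup(void)
{
	for (size_t i = 0; i < NR_VECTORS_SKETCH; i++)
		if (!vec_table[i])
			vec_table[i] = spurious_handler;
}

/* After setup no slot is NULL, so dispatch needs no per-call check. */
static int dispatch(size_t vector)
{
	assert(vector < NR_VECTORS_SKETCH);
	return vec_table[vector]();
}
```

Filling the gaps once at setup time keeps the dispatch path free of NULL checks.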

Intel-SIG: commit cdd99dd x86/fred: Add FRED initialization
functions.

Co-developed-by: Xin Li <[email protected]>
Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/kernel/Makefile
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 208d8c7 upstream.

Let cpu_init_exception_handling() call cpu_init_fred_exceptions() to
initialize FRED. However, if FRED is unavailable or disabled, it falls
back to setting up the TSS IST and initializing the IDT.

Intel-SIG: commit 208d8c7 x86/fred: Invoke FRED initialization code
to enable FRED.

Co-developed-by: Xin Li <[email protected]>
Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/kernel/cpu/common.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
…ng to inline properly

commit cba9ff3 upstream.

Change array_index_mask_nospec() to __always_inline because "inline" is
broken, as explained at https://www.kernel.org/doc/local/inline.html.

Intel-SIG: commit cba9ff3 x86/fred: Fix a build warning with
allmodconfig due to 'inline' failing to inline properly.

Fixes: 6786137bf8fd ("x86/fred: FRED entry/exit and dispatch code")
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit e138419 upstream.

Add H. Peter Anvin and myself as FRED maintainers.

Intel-SIG: commit e138419 MAINTAINERS: Add a maintainer entry for
FRED.

Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: H. Peter Anvin (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit e57ef2e upstream.

The layout of per-cpu variables is at the mercy of the compiler. This
can lead to random performance fluctuations from build to build.

Create a structure to hold some of the hottest per-cpu variables,
starting with current_task.
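
A minimal sketch of the idea, assuming a 64-byte cache line: group the hot variables in one explicitly laid-out, cache-line-aligned structure so their relative placement no longer depends on the compiler. Only `current_task` comes from the text above; the struct name and the other two members are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct task_sketch;	/* opaque stand-in for the task structure */

/*
 * Hot per-CPU data grouped into a single cache line. The _Alignas on the
 * first member raises the alignment of the whole structure to 64 bytes.
 */
struct hot_percpu_sketch {
	_Alignas(64) struct task_sketch *current_task;
	int preempt_count;	/* hypothetical additional hot member */
	int cpu_number;		/* hypothetical additional hot member */
};
```

Because the layout is fixed by the structure definition, it stays the same from build to build.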

Intel-SIG: commit e57ef2e x86: Put hot per CPU variables into a
struct.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/kernel/process_32.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 96e8fc5 upstream.

Instead of clearing the bss area in assembly code, use the clear_bss()
function.

This requires passing the start_info address as a parameter to
xen_start_kernel() in order to avoid the xen_start_info being zeroed
again.

Intel-SIG: commit 96e8fc5 x86/xen: Use clear_bss() for Xen PV
guests.

Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/xen/enlighten_pv.c
    arch/x86/xen/xen-head.S
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit e94cd15 upstream.

The synchronization of the AP with the control CPU is a SMP boot problem
and has nothing to do with cpu_init().

Open code cpu_init_secondary() in start_secondary() and move
wait_for_master_cpu() into the SMP boot code.

No functional change.

Intel-SIG: commit e94cd15 x86/smpboot: Get rid of
cpu_init_secondary().

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/include/asm/processor.h
    arch/x86/kernel/cpu/common.c
    arch/x86/kernel/smpboot.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 3adee77 upstream.

In order to facilitate parallel startup, start to eliminate some of the
global variables passing information to CPUs in the startup path.

However, start by introducing one more: smpboot_control. For now this
merely holds the CPU# of the CPU which is coming up. Each CPU can then
find its own per-cpu data, and everything else it needs can be found
from there, allowing the other global variables to be removed.

First to be removed is initial_stack. Each CPU can load %rsp from its
current_task->thread.sp instead. That is already set up with the correct
idle thread for APs. Set up the .sp field in INIT_THREAD on x86 so that
the BSP also finds a suitable stack pointer in the static per-cpu data
when coming up on first boot.

On resume from S3, the CPU needs a temporary stack because its idle task
is already active. Instead of setting initial_stack, the sleep code can
simply set its own current->thread.sp to point to the temporary stack.
Nobody else cares about ->thread.sp for a thread which is currently on
a CPU, because the true value is actually in the %rsp register. Which
is restored with the rest of the CPU context in do_suspend_lowlevel().

Intel-SIG: commit 3adee77 x86/smpboot: Remove initial_stack on
64-bit.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Usama Arif <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Usama Arif <[email protected]>
Tested-by: Guilherme G. Piccoli <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/kernel/asm-offsets.c
    arch/x86/kernel/head_64.S
    arch/x86/xen/xen-head.S
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit c416b5b upstream.

As TOP_OF_KERNEL_STACK_PADDING was defined as 0 on x86_64, it went
unnoticed that the initialization of the .sp field in INIT_THREAD and some
calculations in the low level startup code do not take the padding into
account.

FRED enabled kernels require a 16 byte padding, which means that the init
task initialization and the low level startup code use the wrong stack
offset.

Subtract TOP_OF_KERNEL_STACK_PADDING in all affected places to adjust for
this.
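
The arithmetic can be sketched as below. The stack size is an illustrative assumption and `initial_sp()` is a hypothetical helper; only the 16 byte padding comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values; the real constants live in the kernel headers. */
#define THREAD_SIZE_SKETCH		(16u * 1024u)	/* assumed stack size */
#define TOP_OF_KERNEL_STACK_PADDING	16u		/* FRED enabled kernels */

/*
 * Initial stack pointer for a task whose stack area starts at 'base'.
 * Omitting the subtraction would place the pointer inside the padding.
 */
static inline uintptr_t initial_sp(uintptr_t base)
{
	return base + THREAD_SIZE_SKETCH - TOP_OF_KERNEL_STACK_PADDING;
}
```

When the padding is 0 the subtraction changes nothing, which matches the explanation of why the missing adjustment went unnoticed.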

Intel-SIG: commit c416b5b x86/fred: Fix init_task thread stack
pointer initialization.

Fixes: 65c9cc9 ("x86/fred: Reserve space for the FRED stack frame")
Fixes: 3adee77 ("x86/smpboot: Remove initial_stack on 64-bit")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/kernel/head_64.S
    arch/x86/xen/xen-head.S
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 989b5cf upstream.

Depending on whether FRED is enabled, sysvec_install() installs a system
interrupt handler either into FRED's system vector dispatch table or
into the IDT.

However FRED can be disabled later in trap_init(), after sysvec_install()
has been invoked already; e.g., the HYPERVISOR_CALLBACK_VECTOR handler is
registered with sysvec_install() in kvm_guest_init(), which is called in
setup_arch() but way before trap_init().

IOW, there is a window between the point where FRED is reported as
available and the point where it is marked as available but disabled.
As a result, when FRED is available but disabled, early sysvec_install()
invocations fail to install the IDT handler resulting in spurious
interrupts.

Fix it by parsing cmdline param "fred=" in cpu_parse_early_param() to
ensure that FRED is disabled before the first sysvec_install() invocation.
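
The ordering problem can be modeled with a toy sketch. All names here (`fred_enabled`, `parse_fred_param()`, `install_vector()`) are hypothetical stand-ins and do not mirror the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool fred_enabled = true;	/* the CPU advertises FRED */
static bool in_fred_table;		/* handler landed in the FRED table */
static bool in_idt;			/* handler landed in the IDT */

/* Hypothetical stand-in for early "fred=" command line parsing. */
static void parse_fred_param(const char *arg)
{
	if (strcmp(arg, "off") == 0)
		fred_enabled = false;
}

/* Hypothetical stand-in for sysvec_install(): picks a table once. */
static void install_vector(void)
{
	if (fred_enabled)
		in_fred_table = true;
	else
		in_idt = true;
}
```

If `install_vector()` ran before `parse_fred_param("off")`, the handler would sit only in the FRED table while the CPU delivers events through the IDT. Parsing first avoids that.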

Intel-SIG: commit 989b5cf x86/fred: Parse cmdline param "fred=" in
cpu_parse_early_param().

Fixes: 3810da1 ("x86/fred: Add a fred= cmdline param")
Reported-by: Hou Wenlong <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

Conflicts:
    arch/x86/kernel/cpu/common.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 73270c1 upstream.

To enable FRED earlier, move the RSP initialization out of
cpu_init_fred_exceptions() into cpu_init_fred_rsps().

This is required as the FRED RSP initialization depends on the availability
of the CPU entry areas which are set up late in trap_init().

No functional change intended. Marked with Fixes as it's a dependency for
the real fix.

Intel-SIG: commit 73270c1 x86/fred: Move FRED RSP initialization
into a separate function.

Fixes: 14619d9 ("x86/fred: FRED entry/exit and dispatch code")
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

Conflicts:
    arch/x86/kernel/cpu/common.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit a97756c upstream.

On 64-bit init_mem_mapping() relies on the minimal page fault handler
provided by the early IDT mechanism. The real page fault handler is
installed right afterwards into the IDT.

This is problematic on CPUs which have X86_FEATURE_FRED set because the
real page fault handler retrieves the faulting address from the FRED
exception stack frame and not from CR2, but that does obviously not work
when FRED is not yet enabled in the CPU.

To prevent this enable FRED right after init_mem_mapping() without
interrupt stacks. Those are enabled later in trap_init() after the CPU
entry area is set up.

[ tglx: Encapsulate the FRED details ]

Intel-SIG: commit a97756c x86/fred: Enable FRED right after
init_mem_mapping().

Fixes: 14619d9 ("x86/fred: FRED entry/exit and dispatch code")
Reported-by: Hou Wenlong <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

Conflicts:
    arch/x86/include/asm/processor.h
    arch/x86/kernel/cpu/common.c
    arch/x86/kernel/smpboot.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 723edbd upstream.

SS is initialized to NULL during boot time and not explicitly set to
__KERNEL_DS.

With FRED enabled, if a kernel event is delivered before a CPU goes to
user level for the first time, its SS is NULL thus NULL is pushed into
the SS field of the FRED stack frame.  But before ERETS is executed,
the CPU may context switch to another task and go to user level.  Then
when the CPU comes back to kernel mode, SS is changed to __KERNEL_DS.
Later when ERETS is executed to return from the kernel event handler,
a #GP fault is generated because SS doesn't match the SS saved in the
FRED stack frame.

Initialize SS to __KERNEL_DS when enabling FRED to prevent that.

Note, IRET doesn't check if SS matches the SS saved in its stack frame,
thus IDT doesn't have this problem.  For IDT it doesn't matter whether
SS is set to __KERNEL_DS or not, because it's set to NULL upon interrupt
or exception delivery and __KERNEL_DS upon SYSCALL.  Thus it's pointless
to initialize SS for IDT.

Intel-SIG: commit 723edbd x86/fred: Set SS to __KERNEL_DS when
enabling FRED.

Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 0dfac6f upstream.

In most cases, ti_work values passed to arch_exit_to_user_mode_prepare()
are zeros, e.g., 99% in kernel build tests.  So an obvious optimization is
to test ti_work for zero before processing individual bits in it.

Omit the optimization when FPU debugging is enabled, otherwise the
FPU consistency check is never executed.
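
A minimal sketch of the check, with hypothetical work bits and a hypothetical `exit_prepare()` function. The `fpu_debug` parameter stands in for the FPU debugging configuration, and the slow path only counts how often it runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical work bits; the real flag values are defined by the kernel. */
#define WORK_A	(1ul << 0)
#define WORK_B	(1ul << 1)

static int slow_path_calls;	/* how often individual bits were examined */

static void handle_a(void) { }
static void handle_b(void) { }

static void exit_prepare(unsigned long ti_work, bool fpu_debug)
{
	/* Fast path: nothing pending and no debug check that must always run. */
	if (!fpu_debug && ti_work == 0)
		return;

	slow_path_calls++;
	if (ti_work & WORK_A)
		handle_a();
	if (ti_work & WORK_B)
		handle_b();
}
```

With `fpu_debug` set, the slow path runs even for a zero `ti_work`, so a consistency check placed there would still run.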

Intel 0day tests did not find a performance regression with this change.

Intel-SIG: commit 0dfac6f x86/entry: Test ti_work for zero before
processing individual bits.

Suggested-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>
…nism

commit efe5088 upstream.

Per the discussion about FRED MSR writes with WRMSRNS instruction [1],
use the alternatives mechanism to choose WRMSRNS when it's available,
otherwise fall back to WRMSR.

Remove the dependency on X86_FEATURE_WRMSRNS as WRMSRNS is no longer
dependent on FRED.

[1] https://lore.kernel.org/lkml/[email protected]/

Use DS prefix to pad WRMSR instead of a NOP. The prefix is ignored. At
least that's the current information from the hardware folks.

Intel-SIG: commit efe5088 x86/msr: Switch between WRMSRNS and WRMSR
with the alternatives mechanism.

Signed-off-by: Andrew Cooper <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

Conflicts:
    arch/x86/include/asm/switch_to.h
    arch/x86/kernel/cpu/cpuid-deps.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
…itch

commit fe85ee3 upstream.

The FRED RSP0 MSR points to the top of the kernel stack for user level
event delivery. As this is the task stack it needs to be updated when a
task is scheduled in.

The update is done at context switch. That means it's also done when
switching to kernel threads, which is pointless as those never go out to
user space. For KVM threads this means there are two writes to FRED_RSP0 as
KVM has to switch to the guest value before VMENTER.

Defer the update to the exit to user space path and cache the per CPU
FRED_RSP0 value, so redundant writes can be avoided.

Provide fred_sync_rsp0() for KVM to keep the cache in sync with the actual
MSR value after returning from guest to host mode.
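
The caching scheme can be sketched in userspace C. A plain variable stands in for the MSR and a counter records simulated writes. `update_rsp0()` and `sync_rsp0()` are hypothetical names; the latter is only loosely modeled on the fred_sync_rsp0() mentioned above.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t msr_fred_rsp0;	/* stands in for the hardware MSR */
static uint64_t cached_rsp0;	/* per-CPU in the real kernel */
static int msr_writes;		/* counts simulated MSR writes */

static void write_rsp0_msr(uint64_t value)
{
	msr_fred_rsp0 = value;
	msr_writes++;
}

/* Exit-to-user path: write the MSR only if the value actually changed. */
static void update_rsp0(uint64_t task_stack_top)
{
	if (cached_rsp0 != task_stack_top) {
		write_rsp0_msr(task_stack_top);
		cached_rsp0 = task_stack_top;
	}
}

/* Adopt a value the MSR already holds, e.g. after returning from a guest. */
static void sync_rsp0(uint64_t current_msr_value)
{
	cached_rsp0 = current_msr_value;
}
```

Switching between kernel threads never reaches `update_rsp0()` in this model, so those switches cost no MSR write at all.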

[ tglx: Massage change log ]

Intel-SIG: commit fe85ee3 x86/entry: Set FRED RSP0 on return to
userspace instead of context switch.

Suggested-by: Sean Christopherson <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

Conflicts:
    arch/x86/include/asm/switch_to.h
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 991625f upstream.

The bits required to make the hardware go.. Of note is that, provided
the syscall entry points are covered with ENDBR, #CP doesn't need to
be an IST because we'll never hit the syscall gap.

Intel-SIG: commit 991625f x86/ibt: Add IBT feature, MSR and #CP
handling.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/include/asm/cpufeatures.h
    arch/x86/include/uapi/asm/processor-flags.h
    arch/x86/kernel/cpu/common.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 18e66b6 upstream.

Shadow stack provides protection for applications against function return
address corruption. It is active when the processor supports it, the
kernel has CONFIG_X86_SHADOW_STACK enabled, and the application is built
for the feature. This is only implemented for the 64-bit kernel. When it
is enabled, legacy non-shadow stack applications continue to work, but
without protection.

Since there is another feature that utilizes CET (Kernel IBT) that will
share implementation with shadow stacks, create CONFIG_CET to signify
that at least one CET feature is configured.

Intel-SIG: commit 18e66b6 x86/shstk: Add Kconfig option for shadow
stack.

Co-developed-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Tested-by: John Allen <[email protected]>
Tested-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/all/20230613001108.3040476-7-rick.p.edgecombe%40intel.com

Conflicts:
    arch/x86/Kconfig.assembler
    arch/x86/Kconfig
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 2da5b91 upstream.

Today the control protection handler is defined in traps.c and used only
for the kernel IBT feature. To reduce ifdeffery, move it to its own file.
In future patches, functionality will be added to make this handler also
handle user shadow stack faults. So name the file cet.c.

No functional change.

Intel-SIG: commit 2da5b91 x86/traps: Move control protection
handler to separate file.

Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Tested-by: John Allen <[email protected]>
Tested-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/all/20230613001108.3040476-8-rick.p.edgecombe%40intel.com

Conflicts:
    arch/x86/kernel/Makefile
    arch/x86/kernel/cet.c
    arch/x86/kernel/traps.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 5b2fc51 upstream.

Even though Xen currently doesn't advertise IBT, prepare for when it
will eventually do so and sprinkle the ENDBR dust accordingly.

Even though most of the entry points are IRET like, the CPL0
Hypervisor can set WAIT-FOR-ENDBR and demand ENDBR at these sites.

Intel-SIG: commit 5b2fc51 x86/ibt,xen: Sprinkle the ENDBR.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Conflicts:
    arch/x86/entry/entry_64.S
    arch/x86/kernel/head_64.S
    arch/x86/xen/xen-asm.S
    arch/x86/xen/xen-head.S
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit 701fb66 upstream.

The Control-Flow Enforcement Technology contains two related features,
one of which is Shadow Stacks. Future patches will utilize this feature
for shadow stack support in KVM, so add a CPU feature flag for Shadow
Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]).

To protect shadow stack state from malicious modification, the registers
are only accessible in supervisor mode. This implementation
context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend
on XSAVES.

The shadow stack feature, enumerated by the CPUID bit described above,
encompasses both supervisor and userspace support for shadow stack. In
near future patches, only userspace shadow stack will be enabled. In
expectation of future supervisor shadow stack support, create a software
CPU capability to enumerate kernel utilization of userspace shadow stack
support. This user shadow stack bit should depend on the HW "shstk"
capability and that logic will be implemented in future patches.

Intel-SIG: commit 701fb66 x86/cpufeatures: Add CPU feature flags
for shadow stacks.

Co-developed-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Tested-by: John Allen <[email protected]>
Tested-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/all/20230613001108.3040476-9-rick.p.edgecombe%40intel.com

Conflicts:
    arch/x86/include/asm/cpufeatures.h
    arch/x86/include/asm/disabled-features.h
    arch/x86/kernel/cpu/cpuid-deps.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit a5f6c2a upstream.

A control-protection fault is triggered when a control-flow transfer
attempt violates Shadow Stack or Indirect Branch Tracking constraints.
For example, the return address for a RET instruction differs from the copy
on the shadow stack.

There already exists a control-protection fault handler for handling kernel
IBT faults. Refactor this fault handler into separate user and kernel
handlers, like the page fault handler. Add a control-protection handler
for usermode. To avoid ifdeffery, put them both in a new file cet.c, which
is compiled in the case of either of the two CET features supported in the
kernel: kernel IBT or user mode shadow stack. Move some static inline
functions from traps.c into a header so they can be used in cet.c.

Opportunistically fix a comment in the kernel IBT part of the fault
handler that is on the end of the line instead of preceding it.

Keep the same behavior for the kernel side of the fault handler, except for
converting a BUG to a WARN in the case of a #CP happening when the feature
is missing. This unifies the behavior with the new shadow stack code, and
also prevents the kernel from crashing under this situation which is
potentially recoverable.

The control-protection fault handler works in a similar way as the general
protection fault handler. It provides the si_code SEGV_CPERR to the signal
handler.

Intel-SIG: commit a5f6c2a x86/shstk: Add user control-protection
fault handler.

Co-developed-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Tested-by: John Allen <[email protected]>
Tested-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/all/20230613001108.3040476-28-rick.p.edgecombe%40intel.com

Conflicts:
    arch/x86/include/asm/disabled-features.h
    arch/x86/kernel/signal_compat.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit c6cfcbd upstream.

The following warning is reported when frame pointers and kernel IBT are
enabled:

  vmlinux.o: warning: objtool: ibt_selftest+0x11: sibling call from callable instruction with modified stack frame

The problem is that objtool interprets the indirect branch in
ibt_selftest() as a sibling call, and GCC inserts a (partial) frame
pointer prologue before it:

  0000 000000000003f550 <ibt_selftest>:
  0000    3f550:	f3 0f 1e fa          	endbr64
  0004    3f554:	e8 00 00 00 00       	call   3f559 <ibt_selftest+0x9>	3f555: R_X86_64_PLT32	__fentry__-0x4
  0009    3f559:	55                   	push   %rbp
  000a    3f55a:	48 8d 05 02 00 00 00 	lea    0x2(%rip),%rax        # 3f563 <ibt_selftest_ip>
  0011    3f561:	ff e0                	jmp    *%rax

Note the inline asm is missing ASM_CALL_CONSTRAINT, so the 'push %rbp'
happens before the indirect branch and the 'mov %rsp, %rbp' happens
afterwards.

Simplify the generated code and make it easier to understand for both
tools and humans by moving the selftest to proper asm.

Intel-SIG: commit c6cfcbd x86/ibt: Convert IBT selftest to asm.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/99a7e16b97bda97bf0a04aa141d6241cd8a839a2.1680912949.git.jpoimboe@kernel.org

Conflicts:
    arch/x86/kernel/Makefile
    arch/x86/kernel/cpu/common.c
    arch/x86/kernel/ibt_selftest.S
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>
commit dc81e556f2a017d681251ace21bf06c126d5a192 upstream.

An indirect branch instruction sets the CPU indirect branch tracker
(IBT) into WAIT_FOR_ENDBRANCH (WFE) state and WFE stays asserted
across the instruction boundary.  When the decoder finds an instruction
other than ENDBR while WFE is set, the CPU raises a #CP fault.

For the "kernel IBT no ENDBR" selftest where #CPs are deliberately
triggered, the WFE state of the interrupted context needs to be
cleared to let execution continue.  Otherwise when the CPU resumes
from the instruction that just caused the previous #CP, another
missing-ENDBRANCH #CP is raised and the CPU enters a dead loop.

This is not a problem with IDT because it doesn't preserve WFE and
IRET doesn't set WFE.  But FRED provides space on the entry stack
(in an expanded CS area) to save and restore the WFE state, thus the
WFE state is no longer clobbered, so software must clear it.

To avoid dead looping, clear WFE via ibt_clear_fred_wfe() in the
selftest path and in the !ibt_fatal code path, where execution is
allowed to continue.

Clobbering WFE in any other circumstance is a security-relevant bug.
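
The dead loop and its fix can be illustrated with a small user-space
model.  This is a sketch, not kernel code: struct frame, resume_faults(),
handle_cp() and count_cp_faults() are hypothetical names, and the WFE
position (bit 18 of the saved CS slot) is an assumption that should be
checked against the FRED specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: WFE is saved in bit 18 of the expanded CS slot. */
#define FRED_CS_WFE (UINT64_C(1) << 18)

/* Hypothetical stand-in for the saved context of the interrupted code. */
struct frame {
    uint64_t cs_slot;  /* expanded CS area, including the saved WFE bit */
    bool at_endbr;     /* does the resume address hold an ENDBR? */
};

/* Model of the return path: restoring WFE=1 in front of a non-ENDBR
 * instruction raises #CP again.  Returns true if another #CP fires. */
static bool resume_faults(const struct frame *f)
{
    return (f->cs_slot & FRED_CS_WFE) && !f->at_endbr;
}

/* Model of a #CP handler that lets execution continue. */
static void handle_cp(struct frame *f, bool clear_wfe)
{
    if (clear_wfe)
        f->cs_slot &= ~FRED_CS_WFE;
}

/* Count #CP faults until execution continues, giving up after 'limit'. */
static int count_cp_faults(bool clear_wfe, int limit)
{
    /* 0x10 is an arbitrary selector value used only for illustration. */
    struct frame f = { .cs_slot = 0x10 | FRED_CS_WFE, .at_endbr = false };
    int faults = 0;

    assert(limit > 0);
    while (resume_faults(&f) && faults < limit) {
        faults++;
        handle_cp(&f, clear_wfe);
    }
    return faults;
}
```

In this model, a handler that clears WFE sees exactly one #CP, while a
handler that leaves WFE set keeps faulting until the iteration limit is
reached.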

[ dhansen: changelog rewording ]

Intel-SIG: commit dc81e556f2a0 x86/fred: Clear WFE in missing-ENDBRANCH

Fixes: a5f6c2a ("x86/shstk: Add user control-protection fault handler")
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/all/20241113175934.3897541-1-xin%40zytor.com
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>

commit de31b3cd706347044e1a57d68c3a683d58e8cca4 upstream.

The FRED RSP0 MSR is only used for delivering events when running
userspace.  Linux leverages this property to reduce expensive MSR
writes and optimize context switches.  The kernel only writes the
MSR when about to run userspace *and* when the MSR has actually
changed since the last time userspace ran.

This optimization is implemented by maintaining a per-CPU cache of
FRED RSP0 and then checking that against the value for the top of
current task stack before running userspace.

However cpu_init_fred_exceptions() writes the MSR without updating
the per-CPU cache.  This means that the kernel might return to
userspace with MSR_IA32_FRED_RSP0==0 when it needed to point to the
top of current task stack.  This would induce a double fault (#DF),
which is bad.

A context switch after cpu_init_fred_exceptions() can paper over
the issue since it updates the cached value.  That evidently
happens most of the time, which explains how this bug got through.

Fix the bug through resynchronizing the FRED RSP0 MSR with its
per-CPU cache in cpu_init_fred_exceptions().
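
The stale-cache scenario can be reproduced in a small user-space model.
This is only a sketch under stated assumptions: msr_rsp0, cached_rsp0,
write_msr_rsp0(), prepare_return_to_user(), cpu_init() and
rsp0_after_init() are hypothetical stand-ins, not kernel symbols, and
writing the cached value back into the MSR is just one way to model the
resynchronization.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel symbols. */
static uint64_t msr_rsp0;     /* models MSR_IA32_FRED_RSP0 */
static uint64_t cached_rsp0;  /* models the per-CPU cache of that MSR */
static unsigned msr_writes;   /* counts the expensive MSR writes */

static void write_msr_rsp0(uint64_t val)
{
    msr_rsp0 = val;
    msr_writes++;
}

/* Before running userspace: write the MSR only if the cache differs. */
static void prepare_return_to_user(uint64_t task_stack_top)
{
    assert(task_stack_top != 0);
    if (cached_rsp0 != task_stack_top) {
        write_msr_rsp0(task_stack_top);
        cached_rsp0 = task_stack_top;
    }
}

/* Model of the CPU init path.  With resync == 0 it reproduces the bug:
 * the MSR is reset to 0 while the cache keeps its old value. */
static void cpu_init(int resync)
{
    write_msr_rsp0(resync ? cached_rsp0 : 0);
}

/* Returns the MSR value userspace would run with after init. */
static uint64_t rsp0_after_init(int resync)
{
    const uint64_t stack_top = 0x1000;  /* arbitrary illustrative address */

    msr_rsp0 = cached_rsp0 = 0;
    msr_writes = 0;
    prepare_return_to_user(stack_top);  /* cache and MSR both hold stack_top */
    cpu_init(resync);                   /* no context switch in between */
    prepare_return_to_user(stack_top);  /* cache hit, so no MSR write */
    return msr_rsp0;
}
```

Without the resync the model returns to userspace with the MSR at 0,
because the cache check wrongly reports that nothing changed.  With the
resync the MSR matches the top of the task stack again.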

Intel-SIG: commit de31b3cd7063 x86/fred: Fix the FRED RSP0 MSR out of
sync with its per-CPU cache.

Fixes: fe85ee3 ("x86/entry: Set FRED RSP0 on return to userspace instead of context switch")
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/all/20250110174639.1250829-1-xin%40zytor.com
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>

commit e5f1e8af9c9e151ecd665f6d2e36fb25fec3b110 upstream.

Upon a wakeup from S4, the restore kernel starts and initializes the
FRED MSRs as needed from its perspective.  It then loads a hibernation
image, including the image kernel, and attempts to load image pages
directly into their original page frames used before hibernation unless
those frames are currently in use.  Once all pages are moved to their
original locations, it jumps to a "trampoline" page in the image kernel.

At this point, the image kernel takes control, but the FRED MSRs still
contain values set by the restore kernel, which may differ from those
set by the image kernel before hibernation.  Therefore, the image kernel
must ensure the FRED MSRs have the same values as before hibernation.
Since these values depend only on the location of the kernel text and
data, they can be recomputed from scratch.

Intel-SIG: commit e5f1e8af9c9e x86/fred: Fix system hang during S4
resume with FRED enabled.

Reported-by: Xi Pardee <[email protected]>
Reported-by: Todd Brandt <[email protected]>
Tested-by: Todd Brandt <[email protected]>
Suggested-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: H. Peter Anvin (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Xingjiang Lu: amend commit log ]
Signed-off-by: Lu, Xingjiang <[email protected]>

…rn from SIGTRAP handler

commit e34dbbc85d64af59176fe59fad7b4122f4330fe2 upstream.

Clear the software event flag in the augmented SS to prevent immediate
repeat of single step trap on return from SIGTRAP handler if the trap
flag (TF) is set without an external debugger attached.

Following is a typical single-stepping flow for a user process:

1) The user process is prepared for single-stepping by setting
   RFLAGS.TF = 1.
2) When any instruction in user space completes, a #DB is triggered.
3) The kernel handles the #DB and returns to user space, invoking the
   SIGTRAP handler with RFLAGS.TF = 0.
4) After the SIGTRAP handler finishes, the user process performs a
   sigreturn syscall, restoring the original state, including
   RFLAGS.TF = 1.
5) Goto step 2.

According to the FRED specification:

A) Bit 17 in the augmented SS is designated as the software event
   flag, which is set to 1 for FRED event delivery of SYSCALL,
   SYSENTER, or INT n.
B) If bit 17 of the augmented SS is 1 and ERETU would result in
   RFLAGS.TF = 1, a single-step trap will be pending upon completion
   of ERETU.

In step 4) above, the software event flag is set upon the sigreturn
syscall, and its corresponding ERETU would restore RFLAGS.TF = 1.
This combination causes a pending single-step trap upon completion of
ERETU.  Therefore, another #DB is triggered before any user space
instruction is executed, which leads to an infinite loop in which the
SIGTRAP handler keeps being invoked on the same user space IP.
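
Rules A) and B) and the effect of clearing the flag can be sketched as
a small user-space model.  The names struct eretu_frame,
trap_pending_after_eretu() and sigreturn_frame() are hypothetical, and
the selector and RFLAGS base values are arbitrary; only bit 17 of the
augmented SS (from the text above) and RFLAGS.TF being bit 8 are taken
as given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRED_SS_SWEVENT (UINT64_C(1) << 17)  /* software event flag, rule A */
#define X86_EFLAGS_TF   (UINT64_C(1) << 8)   /* RFLAGS trap flag */

/* Hypothetical model of the state that ERETU consumes. */
struct eretu_frame {
    uint64_t ss_slot;  /* augmented SS */
    uint64_t rflags;   /* RFLAGS value that ERETU will restore */
};

/* Rule B: a single-step trap is pending after ERETU if the software
 * event flag is set and the restored RFLAGS has TF = 1. */
static bool trap_pending_after_eretu(const struct eretu_frame *f)
{
    return (f->ss_slot & FRED_SS_SWEVENT) && (f->rflags & X86_EFLAGS_TF);
}

/* Model of step 4: sigreturn is entered through SYSCALL, so the software
 * event flag is set, and the signal frame restores TF = 1. */
static struct eretu_frame sigreturn_frame(bool clear_swevent)
{
    struct eretu_frame f = {
        .ss_slot = 0x2b | FRED_SS_SWEVENT,  /* arbitrary selector value */
        .rflags  = 0x202 | X86_EFLAGS_TF,   /* arbitrary base flags + TF */
    };

    assert(f.ss_slot & FRED_SS_SWEVENT);    /* set by SYSCALL delivery */
    if (clear_swevent)
        f.ss_slot &= ~FRED_SS_SWEVENT;
    return f;
}
```

With the flag left set, the model reports a pending trap before any
user space instruction runs.  With the flag cleared, no trap is pending
at ERETU completion, so the next #DB is raised only after a user space
instruction completes, as in step 2.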

Intel-SIG: commit e34dbbc85d64 x86/fred/signal: Prevent immediate repeat
of single step trap on return from SIGTRAP handler.

Fixes: 14619d9 ("x86/fred: FRED entry/exit and dispatch code")
Suggested-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/all/20250609084054.2083189-2-xin%40zytor.com

Conflicts:
    arch/x86/include/asm/sighandling.h
    arch/x86/kernel/signal.c
[ Xingjiang Lu: resolve context conflicts ]
Signed-off-by: Lu, Xingjiang <[email protected]>