Skip to content

bpf: Mitigate Spectre v1 using barriers #8817

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: bpf-next_base
Choose a base branch
from

Conversation

kernel-patches-daemon-bpf[bot]
Copy link

Pull request for series with
subject: bpf: Mitigate Spectre v1 using barriers
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 8582d9a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 5709be4
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: be2fea9
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 5cffad0
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 53ebef5
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 6aca583
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: b9c09fb
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 60400cd
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 4cc2048
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 46eb012
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 91dbac4
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

Currently, __xlated_unpriv and __jited_unpriv do not work because the
BPF syscall will overwrite info.jited_prog_len and info.xlated_prog_len
with 0 if the process is not bpf_capable(). This bug was not noticed
before, because there is no test that actually uses
__xlated_unpriv/__jited_unpriv.

To resolve this, simply restore the capabilities earlier (but still
after loading the program). Adding this here unconditionally is fine
because the function first checks that the capabilities were initialized
before attempting to restore them.

This will be important later when we add tests that check whether a
speculation barrier was inserted in the correct location.

Signed-off-by: Luis Gerhorst <[email protected]>
Fixes: 9c9f733 ("selftests/bpf: allow checking xlated programs in verifier_* tests")
Fixes: 7d743e4 ("selftests/bpf: __jited test tag to check disassembly after jit")
This is required to catch the errors later and fall back to a nospec if
on a speculative path.

Eliminate the regs variable as it is only used once and insn_idx is not
modified in-between the definition and usage.

Still pass insn simply to match the other check_*() functions. As Eduard
points out [1], insn is assumed to correspond to env->insn_idx in many
places (e.g, __check_reg_arg()).

Move code into do_check_insn(), replace
* "continue" with "return 0" after modifying insn_idx
* "goto process_bpf_exit" with "return PROCESS_BPF_EXIT"
* "do_print_state = " with "*do_print_state = "

[1] https://lore.kernel.org/all/[email protected]/

Signed-off-by: Luis Gerhorst <[email protected]>
Acked-by: Henriette Herzog <[email protected]>
Cc: Maximilian Ott <[email protected]>
Cc: Milan Stephan <[email protected]>
Mark these cases as non-recoverable to later prevent them from being
caught when they occur during speculative path verification.

Eduard writes [1]:

  The only pace I'm aware of that might act upon specific error code
  from verifier syscall is libbpf. Looking through libbpf code, it seems
  that this change does not interfere with libbpf.

[1] https://lore.kernel.org/all/[email protected]/

Signed-off-by: Luis Gerhorst <[email protected]>
Reviewed-by: Eduard Zingerman <[email protected]>
Acked-by: Henriette Herzog <[email protected]>
Cc: Maximilian Ott <[email protected]>
Cc: Milan Stephan <[email protected]>
This prevents us from trying to recover from these on speculative paths
in the future.

Signed-off-by: Luis Gerhorst <[email protected]>
Reviewed-by: Eduard Zingerman <[email protected]>
Acked-by: Henriette Herzog <[email protected]>
Cc: Maximilian Ott <[email protected]>
Cc: Milan Stephan <[email protected]>
JITs can set bpf_jit_bypass_spec_v1/v4() if they want the verifier to
skip analysis/patching for the respective vulnerability. For v4, this
will reduce the number of barriers the verifier inserts. For v1, it
allows more programs to be accepted.

The primary motivation for this is to not regress unpriv BPF's
performance on ARM64 in a future commit where BPF_NOSPEC is also used
against Spectre v1.

This has the user-visible change that v1-induced rejections on
non-vulnerable PowerPC CPUs are avoided.

For now, this does not change the semantics of BPF_NOSPEC. It is still a
v4-only barrier and must not be implemented if bypass_spec_v4 is always
true for the arch. Changing it to a v1 AND v4-barrier is done in a
future commit.

As an alternative to bypass_spec_v1/v4, one could introduce NOSPEC_V1
AND NOSPEC_V4 instructions and allow backends to skip their lowering as
suggested by commit f5e81d1 ("bpf: Introduce BPF nospec instruction
for mitigating Spectre v4"). Adding bpf_jit_bypass_spec_v1/v4() was
found to be preferable for the following reason:

* bypass_spec_v1/v4 benefits non-vulnerable CPUs: Always performing the
  same analysis (not taking into account whether the current CPU is
  vulnerable), needlessly restricts users of CPUs that are not
  vulnerable. The only use case for this would be portability-testing,
  but this can later be added easily when needed by allowing users to
  force bypass_spec_v1/v4 to false.

* Portability is still acceptable: Directly disabling the analysis
  instead of skipping the lowering of BPF_NOSPEC(_V1/V4) might allow
  programs on non-vulnerable CPUs to be accepted while the program will
  be rejected on vulnerable CPUs. With the fallback to speculation
  barriers for Spectre v1 implemented in a future commit, this will only
  affect programs that do variable stack-accesses or are very complex.

For PowerPC, the SEC_FTR checking in bpf_jit_bypass_spec_v4() is based
on the check that was previously located in the BPF_NOSPEC case.

For LoongArch, it would likely be safe to set both
bpf_jit_bypass_spec_v1() and _v4() according to
commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation
barrier opcode"). This is omitted here as I am unable to do any testing
for LoongArch.

Signed-off-by: Luis Gerhorst <[email protected]>
Cc: Henriette Herzog <[email protected]>
Cc: Maximilian Ott <[email protected]>
Cc: Milan Stephan <[email protected]>
This changes the semantics of BPF_NOSPEC (previously a v4-only barrier)
to always emit a speculation barrier that works against both Spectre v1
AND v4. If mitigation is not needed on an architecture, the backend
should set bpf_jit_bypass_spec_v4/v1().

As of now, this commit only has the user-visible implication that unpriv
BPF's performance on PowerPC is reduced. This is the case because we
have to emit additional v1 barrier instructions for BPF_NOSPEC now.

This commit is required for a future commit to allow us to rely on
BPF_NOSPEC for Spectre v1 mitigation. As of this commit, the feature
that nospec acts as a v1 barrier is unused.

Commit f5e81d1 ("bpf: Introduce BPF nospec instruction for
mitigating Spectre v4") noted that mitigation instructions for v1 and v4
might be different on some archs. While this would potentially offer
improved performance on PowerPC, it was dismissed after the following
considerations:

* Only having one barrier simplifies the verifier and allows us to
  easily rely on v4-induced barriers for reducing the complexity of
  v1-induced speculative path verification.

* For the architectures that implemented BPF_NOSPEC, only PowerPC has
  distinct instructions for v1 and v4. Even there, some insns may be
  shared between the barriers for v1 and v4 (e.g., 'ori 31,31,0' and
  'sync'). If this is still found to impact performance in an
  unacceptable way, BPF_NOSPEC can be split into BPF_NOSPEC_V1 and
  BPF_NOSPEC_V4 later. As an optimization, we can already skip v1/v4
  insns from being emitted for PowerPC with this setup if
  bypass_spec_v1/v4 is set.

Vulnerability-status for BPF_NOSPEC-based Spectre mitigations (v4 as of
this commit, v1 in the future) is therefore:

* x86 (32-bit and 64-bit), ARM64, and PowerPC (64-bit): Mitigated - This
  patch implements BPF_NOSPEC for these architectures. The previous
  v4-only version was supported since commit f5e81d1 ("bpf:
  Introduce BPF nospec instruction for mitigating Spectre v4") and
  commit b7540d6 ("powerpc/bpf: Emit stf barrier instruction
  sequences for BPF_NOSPEC").

* LoongArch: Not Vulnerable - Commit a6f6a95 ("LoongArch, bpf: Fix
  jit to skip speculation barrier opcode") is the only other past commit
  related to BPF_NOSPEC and indicates that the insn is not required
  there.

* MIPS: Vulnerable (if unprivileged BPF is enabled) -
  Commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation
  barrier opcode") indicates that it is not vulnerable but this
  contradicts the kernel and Debian documentation. Therefore I assume
  that there exist vulnerable MIPS CPUs (but maybe not from Loongson?).
  In the future, BPF_NOSPEC could be implemented for MIPS based on the
  GCC speculation_barrier [1]. For now, we rely on unprivileged BPF
  being disabled by default.

* Other: Unknown - To the best of my knowledge there is no definitive
  information available that indicates that any other arch is
  vulnerable. They are therefore left untouched (BPF_NOSPEC is not
  implemented, but bypass_spec_v1/v4 is also not set).

I did the following testing to ensure the insn encoding is correct:

* ARM64:
  * 'dsb nsh; isb' was successfully tested with the BPF CI in [2]
  * 'sb' locally using QEMU v7.2.15 -cpu max (emitted sb insn is
    executed for example with './test_progs -t verifier_array_access')

* PowerPC: The following configs were tested locally with ppc64le QEMU
  v8.2 '-machine pseries -cpu POWER9':
  * STF_BARRIER_EIEIO + CONFIG_PPC_BOOK32_64
  * STF_BARRIER_SYNC_ORI (forced on) + CONFIG_PPC_BOOK32_64
  * STF_BARRIER_FALLBACK (forced on) + CONFIG_PPC_BOOK32_64
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_EIEIO
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_SYNC_ORI (forced on)
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_FALLBACK (forced on)
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_NONE (forced on)
  Most of those cobinations should not occur in practice, but I was not
  able to get an PPC e6500 rootfs (for testing PPC_E500 without forcing
  it on). In any case, this should ensure that there are no unexpected
  conflicts between the insns when combined like this. Individual v1/v4
  barriers were already emitted elsewhere.

[1] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=29b74545531f6afbee9fc38c267524326dbfbedf
    ("MIPS: Add speculation_barrier support")
[2] #8576

Signed-off-by: Luis Gerhorst <[email protected]>
Cc: Henriette Herzog <[email protected]>
Cc: Maximilian Ott <[email protected]>
Cc: Milan Stephan <[email protected]>
This is made to clarify that this flag will cause a nospec to be added
after this insn and can therefore be relied upon to reduce speculative
path analysis.

Signed-off-by: Luis Gerhorst <[email protected]>
Cc: Henriette Herzog <[email protected]>
Cc: Maximilian Ott <[email protected]>
Cc: Milan Stephan <[email protected]>
This implements the core of the series and causes the verifier to fall
back to mitigating Spectre v1 using speculation barriers. The approach
was presented at LPC'24 [1] and RAID'24 [2].

If we find any forbidden behavior on a speculative path, we insert a
nospec (e.g., lfence speculation barrier on x86) before the instruction
and stop verifying the path. While verifying a speculative path, we can
furthermore stop verification of that path whenever we encounter a
nospec instruction.

A minimal example program would look as follows:

	A = true
	B = true
	if A goto e
	f()
	if B goto e
	unsafe()
e:	exit

There are the following speculative and non-speculative paths
(`cur->speculative` and `speculative` referring to the value of the
push_stack() parameters):

- A = true
- B = true
- if A goto e
  - A && !cur->speculative && !speculative
    - exit
  - !A && !cur->speculative && speculative
    - f()
    - if B goto e
      - B && cur->speculative && !speculative
        - exit
      - !B && cur->speculative && speculative
        - unsafe()

If f() contains any unsafe behavior under Spectre v1 and the unsafe
behavior matches `state->speculative &&
error_recoverable_with_nospec(err)`, do_check() will now add a nospec
before f() instead of rejecting the program:

	A = true
	B = true
	if A goto e
	nospec
	f()
	if B goto e
	unsafe()
e:	exit

Alternatively, the algorithm also takes advantage of nospec instructions
inserted for other reasons (e.g., Spectre v4). Taking the program above
as an example, speculative path exploration can stop before f() if a
nospec was inserted there because of Spectre v4 sanitization.

In this example, all instructions after the nospec are dead code (and
with the nospec they are also dead code speculatively).

On x86_64, this depends on the following property of lfence [3]:

	An LFENCE instruction or a serializing instruction will ensure that no
	later instructions execute, even speculatively, until all prior
	instructions complete locally. [...] Inserting an LFENCE instruction
	after a bounds check prevents later operations from executing before
	the bound check completes.

Regarding the example, this implies that `if B goto e` will not execute
before `if A goto e` completes. Once `if A goto e` completes, the CPU
should find that the speculation was wrong and continue with `exit`.

If there is any other path that leads to `if B goto e` (and therefore
`unsafe()`) without going through `if A goto e`, then a nospec will
still be needed there. However, this patch assumes this other path will
be explored separately and therefore be discovered by the verifier even
if the exploration discussed here stops at the nospec.

This patch furthermore has the unfortunate consequence that Spectre v1
mitigations now only support architectures which implement BPF_NOSPEC.
Before this commit, Spectre v1 mitigations prevented exploits by
rejecting the programs on all architectures. Because some JITs do not
implement BPF_NOSPEC, this patch therefore may regress unpriv BPF's
security to a limited extent:

* The regression is limited to systems vulnerable to Spectre v1, have
  unprivileged BPF enabled, and do NOT emit insns for BPF_NOSPEC. The
  latter is not the case for x86 64- and 32-bit, arm64, and powerpc
  64-bit and they are therefore not affected by the regression.
  According to commit a6f6a95 ("LoongArch, bpf: Fix jit to skip
  speculation barrier opcode"), LoongArch is not vulnerable to Spectre
  v1 and therefore also not affected by the regression.

* To the best of my knowledge this regression may therefore only affect
  MIPS. This is deemed acceptable because unpriv BPF is still disabled
  there by default. As stated in a previous commit, BPF_NOSPEC could be
  implemented for MIPS based on GCC's speculation_barrier
  implementation.

* It is unclear which other architectures (besides x86 64- and 32-bit,
  ARM64, PowerPC 64-bit, LoongArch, and MIPS) supported by the kernel
  are vulnerable to Spectre v1. Also, it is not clear if barriers are
  available on these architectures. Implementing BPF_NOSPEC on these
  architectures therefore is non-trivial. Searching GCC and the kernel
  for speculation barrier implementations for these architectures
  yielded no result.

* If any of those regressed systems is also vulnerable to Spectre v4,
  the system was already vulnerable to Spectre v4 attacks based on
  unpriv BPF before this patch and the impact is therefore further
  limited.

As an alternative to regressing security, one could still reject
programs if the architecture does not emit BPF_NOSPEC (e.g., by removing
the empty BPF_NOSPEC-case from all JITs except for LoongArch where it
appears justified). However, this will cause rejections on these archs
that are likely unfounded in the vast majority of cases.

In the tests, some are now successful where we previously had a
false-positive (i.e., rejection). Change them to reflect where the
nospec should be inserted (using __xlated_unpriv) and modify the error
message if the nospec is able to mitigate a problem that previously
shadowed another problem (in that case __xlated_unpriv does not work,
therefore just add a comment).

Define SPEC_V1 to avoid duplicating this ifdef whenever we check for
nospec insns using __xlated_unpriv, define it here once. This also
improves readability. PowerPC can probably also be added here. However,
omit it for now because the BPF CI currently does not include a test.

Briefly went through all the occurrences of EPERM, EINVAL, and EACCESS
in the verifier in order to validate that catching them like this makes
sense.

[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating
    Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and
    Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
[3] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/runtime-speculative-side-channel-mitigations.html
    ("Managed Runtime Speculative Execution Side Channel Mitigations")

Signed-off-by: Luis Gerhorst <[email protected]>
Acked-by: Henriette Herzog <[email protected]>
Cc: Maximilian Ott <[email protected]>
Cc: Milan Stephan <[email protected]>
This is based on the gadget from the description of commit 9183671af6db
("bpf: Fix leakage under speculation on mispredicted branches").

Signed-off-by: Luis Gerhorst <[email protected]>
Insert a nospec before the access to prevent it from ever using an index
that is subject to speculative scalar-confusion.

The access itself can either happen directly in the BPF program (reads
only, check_stack_read_var_off()) or in a helper (read/write,
check_helper_mem_access()).

This relies on the fact that the speculative scalar confusion that leads
to the variable-stack access going OOBs must stem from a prior
speculative store or branch bypass. Adding a nospec before the
variable-stack access will force all previously bypassed stores/branches
to complete and cause the stack access to only ever go to the stack slot
that is accessed architecturally.

Alternatively, the variable-offset stack access might be a write that
can itself be subject to speculative store bypass (this can happen in
theory even if this code adds a nospec /before/ the variable-offset
write). Only indirect writes by helpers might be affected here (e.g.,
those taking ARG_PTR_TO_MAP_VALUE). (Because check_stack_write_var_off()
does not use check_stack_range_initialized(), in-program variable-offset
writes are not affected.) If the in-helper write can be subject to
Spectre v4 and the helper writes/overwrites pointers on the BPF stack,
they are already a problem for fixed-offset stack accesses and should be
subject to Spectre v4 sanitization.

Signed-off-by: Luis Gerhorst <[email protected]>
Acked-by: Henriette Herzog <[email protected]>
Cc: Maximilian Ott <[email protected]>
Cc: Milan Stephan <[email protected]>
ALU sanitization was introduced to ensure that a subsequent ptr access
can never go OOB, even under speculation. This is required because we
currently allow speculative scalar confusion. Spec. scalar confusion is
possible because Spectre v4 sanitization only adds a nospec after
critical stores (e.g., scalar overwritten with a pointer).

If we add a nospec before the ALU op, none of the operands can be
subject to scalar confusion. As an ADD/SUB can not introduce scalar
confusion itself, the result will also not be subject to scalar
confusion. Therefore, the subsequent ptr access is always safe.

We directly fall back to nospec for the sanitization errors
REASON_BOUNDS, _TYPE, _PATHS, and _LIMIT, even if we are not on a
speculative path.

For REASON_STACK, we return the error -ENOMEM directly now. Previously,
sanitize_err() returned -EACCES for this case but we change it to
-ENOMEM because doing so prevents do_check() from falling back to a
nospec if we are on a speculative path. This would not be a serious
issue (the verifier would probably run into the -ENOMEM again shortly on
the next non-speculative path and still abort verification), but -ENOMEM
is more fitting here anyway. An alternative would be -EFAULT, which is
also returned for some of the other cases where push_stack() fails, but
this is more frequently used for verifier-internal bugs.

Signed-off-by: Luis Gerhorst <[email protected]>
Acked-by: Henriette Herzog <[email protected]>
Cc: Maximilian Ott <[email protected]>
Cc: Milan Stephan <[email protected]>
@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 224ee86
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=955213
version: 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant