Commit c81f67f
bpf: Fix overrunning reservations in ringbuf
jira LE-3201
cve CVE-2024-41009
Rebuild_History Non-Buildable kernel-rt-4.18.0-553.27.1.rt7.368.el8_10
commit-author Daniel Borkmann <[email protected]>
commit cfa1a23
Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included
in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-rt-4.18.0-553.27.1.rt7.368.el8_10/cfa1a232.failed

The BPF ring buffer internally is implemented as a power-of-2 sized circular
buffer, with two logical and ever-increasing counters: consumer_pos is the
consumer counter showing up to which logical position the consumer has
consumed the data, and producer_pos is the producer counter denoting the
amount of data reserved by all producers.

Each time a record is reserved, the producer that "owns" the record advances
the producer counter. In user space, each time a record is read, the consumer
advances the consumer counter once it has finished processing. Both counters
are stored in separate pages so that, from user space, the producer counter
is read-only and the consumer counter is read-write.

One aspect that simplifies and thus speeds up the implementation of both
producers and consumers is that the data area is mapped twice, contiguously
back-to-back, in virtual memory. This means no special measures are needed
for samples that wrap around at the end of the circular buffer data area:
the next page after the last data page is the first data page again, so the
sample still appears completely contiguous in virtual memory.

Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for
book-keeping the length and offset; the header is inaccessible to the BPF
program. Helpers like bpf_ringbuf_reserve() return `(void *)hdr +
BPF_RINGBUF_HDR_SZ` for the BPF program to use. Bing-Jhong and Muhammad
reported that it is, however, possible to make a second allocated memory
chunk overlap with the first chunk, and as a result the BPF program is able
to edit the first chunk's header.

For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size
0x4000. Next, consumer_pos is modified to 0x3000 /before/ a call to
bpf_ringbuf_reserve() is made. This allocates a chunk A in [0x0,0x3008],
and the BPF program is able to edit [0x8,0x3008]. Now, let's allocate a
chunk B with size 0x3000. This succeeds because consumer_pos was edited
ahead of time to pass the `new_prod_pos - cons_pos > rb->mask` check.
Chunk B is in range [0x3008,0x6010], and the BPF program is able to edit
[0x3010,0x6010]. Due to the ring buffer memory layout mentioned earlier, the
ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data pages. This
means that chunk B at [0x4000,0x4008] is chunk A's header.
bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to then
locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk
B has modified chunk A's header, bpf_ringbuf_commit() refers to the wrong
page and can cause a crash.

Fix it by calculating the oldest pending_pos and checking whether the range
from the oldest outstanding record to the newest would span beyond the ring
buffer size. If that is the case, then reject the request. We've tested with
the ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh)
before/after the fix, and while it seems a bit slower on some benchmarks,
the difference is not significant enough to matter.
Fixes: 457f443 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Bing-Jhong Billy Jheng <[email protected]>
Reported-by: Muhammad Ramdhan <[email protected]>
Co-developed-by: Bing-Jhong Billy Jheng <[email protected]>
Co-developed-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Bing-Jhong Billy Jheng <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
(cherry picked from commit cfa1a23)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	kernel/bpf/ringbuf.c
1 parent a09fb34 commit c81f67f

File tree

1 file changed (+133, -0 lines)
Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
bpf: Fix overrunning reservations in ringbuf

jira LE-3201
cve CVE-2024-41009
Rebuild_History Non-Buildable kernel-rt-4.18.0-553.27.1.rt7.368.el8_10
commit-author Daniel Borkmann <[email protected]>
commit cfa1a2329a691ffd991fcf7248a57d752e712881
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-rt-4.18.0-553.27.1.rt7.368.el8_10/cfa1a232.failed

The BPF ring buffer internally is implemented as a power-of-2 sized circular
buffer, with two logical and ever-increasing counters: consumer_pos is the
consumer counter showing up to which logical position the consumer has
consumed the data, and producer_pos is the producer counter denoting the
amount of data reserved by all producers.

Each time a record is reserved, the producer that "owns" the record advances
the producer counter. In user space, each time a record is read, the consumer
advances the consumer counter once it has finished processing. Both counters
are stored in separate pages so that, from user space, the producer counter
is read-only and the consumer counter is read-write.
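
As an aside, here is a minimal sketch of how a user-space consumer might map
these pages, modeled on (but not copied from) libbpf's ring_buffer consumer;
the struct and function names are illustrative assumptions:

	#include <stdint.h>
	#include <sys/mman.h>

	struct ringbuf_view {			/* illustrative type */
		volatile uint64_t *consumer_pos; /* r/w: consumer advances it */
		const volatile uint64_t *producer_pos; /* r/o: kernel writes */
		const void *data;		/* r/o data, mapped twice */
	};

	static int ringbuf_view_map(struct ringbuf_view *v, int map_fd,
				    size_t page_size, size_t data_size)
	{
		void *tmp;

		/* Consumer counter page is mapped writable so user space
		 * can advance consumer_pos after processing each record. */
		tmp = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
			   MAP_SHARED, map_fd, 0);
		if (tmp == MAP_FAILED)
			return -1;
		v->consumer_pos = tmp;

		/* Producer counter page plus the double-mapped data area
		 * are read-only: user space must not move producer_pos. */
		tmp = mmap(NULL, page_size + 2 * data_size, PROT_READ,
			   MAP_SHARED, map_fd, page_size);
		if (tmp == MAP_FAILED)
			return -1;
		v->producer_pos = tmp;
		v->data = (const char *)tmp + page_size;
		return 0;
	}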

One aspect that simplifies and thus speeds up the implementation of both
producers and consumers is that the data area is mapped twice, contiguously
back-to-back, in virtual memory. This means no special measures are needed
for samples that wrap around at the end of the circular buffer data area:
the next page after the last data page is the first data page again, so the
sample still appears completely contiguous in virtual memory.
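
For illustration, a consumer-side sketch of why the double mapping matters;
assuming the mapping from the sketch above, a record that wraps the end of
the buffer can be copied out in one go:

	#include <stdint.h>
	#include <string.h>

	/* data points at the first of the two back-to-back mappings and
	 * mask is size - 1. Even when cons_pos + len crosses the end of
	 * the buffer, the copy simply runs on into the second, identical
	 * mapping, so no two-part copy is needed. */
	static void read_sample(const uint8_t *data, uint64_t mask,
				uint64_t cons_pos, uint32_t len, uint8_t *out)
	{
		memcpy(out, data + (cons_pos & mask), len);
	}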

Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for
book-keeping the length and offset; the header is inaccessible to the BPF
program. Helpers like bpf_ringbuf_reserve() return `(void *)hdr +
BPF_RINGBUF_HDR_SZ` for the BPF program to use. Bing-Jhong and Muhammad
reported that it is, however, possible to make a second allocated memory
chunk overlap with the first chunk, and as a result the BPF program is able
to edit the first chunk's header.
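
Spelled out, that record layout looks as follows (BPF_RINGBUF_HDR_SZ is 8
upstream; the flag bits carried in len are noted here because the fix below
relies on them):

	struct bpf_ringbuf_hdr {
		u32 len;    /* record length; the top bits double as busy
			     * and discard flags while a record is in flight */
		u32 pg_off; /* offset used by bpf_ringbuf_restore_from_rec()
			     * to find the owning struct bpf_ringbuf again */
	};

	/* bpf_ringbuf_reserve() hands the program the area past the header:
	 *	return (void *)hdr + BPF_RINGBUF_HDR_SZ;
	 */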

For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size
0x4000. Next, consumer_pos is modified to 0x3000 /before/ a call to
bpf_ringbuf_reserve() is made. This allocates a chunk A in [0x0,0x3008],
and the BPF program is able to edit [0x8,0x3008]. Now, let's allocate a
chunk B with size 0x3000. This succeeds because consumer_pos was edited
ahead of time to pass the `new_prod_pos - cons_pos > rb->mask` check.
Chunk B is in range [0x3008,0x6010], and the BPF program is able to edit
[0x3010,0x6010]. Due to the ring buffer memory layout mentioned earlier, the
ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data pages. This
means that chunk B at [0x4000,0x4008] is chunk A's header.
bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to then
locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk
B has modified chunk A's header, bpf_ringbuf_commit() refers to the wrong
page and can cause a crash.
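
The arithmetic of that walkthrough, written out as a stand-alone check
(plain user-space C, nothing kernel-specific; sizes include the 8-byte
header as in the example):

	#include <assert.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t mask = 0x4000 - 1;	/* ring buffer size 0x4000 */
		uint64_t cons_pos = 0x3000;	/* consumer_pos bumped early */
		uint64_t prod_pos = 0;

		/* Chunk A: 0x3000 bytes of data plus the 8-byte header. */
		uint64_t a_start = prod_pos;
		prod_pos += 0x3008;
		assert(prod_pos - cons_pos <= mask); /* 0x8: check passes */

		/* Chunk B: same size again. */
		uint64_t b_start = prod_pos;	     /* 0x3008 */
		prod_pos += 0x3008;		     /* new_prod_pos 0x6010 */
		assert(prod_pos - cons_pos <= mask); /* 0x3010: also passes */

		/* Taken modulo the buffer size, chunk B's bytes at logical
		 * [0x4000,0x4008] land on chunk A's header at offset 0x0. */
		assert(((b_start + 0x4000 - 0x3008) & mask) ==
		       (a_start & mask));
		return 0;
	}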

Fix it by calculating the oldest pending_pos and checking whether the range
from the oldest outstanding record to the newest would span beyond the ring
buffer size. If that is the case, then reject the request. We've tested with
the ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh)
before/after the fix, and while it seems a bit slower on some benchmarks,
the difference is not significant enough to matter.
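
A simplified sketch of the shape of that fix, per the paragraph above and
the upstream commit: track the oldest still-pending record in a pending_pos
counter and bound the span from it to the new producer position. Locking,
the existing consumer_pos check, and error paths are omitted here:

	static void *ringbuf_reserve_sketch(struct bpf_ringbuf *rb, u64 size)
	{
		u64 len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
		u64 pend_pos = rb->pending_pos;
		u64 prod_pos = rb->producer_pos;
		u64 new_prod_pos = prod_pos + len;
		struct bpf_ringbuf_hdr *hdr;
		u32 flags;

		/* Advance pending_pos past records that are already
		 * committed, so that it names the oldest record that is
		 * still being written. */
		while (pend_pos < prod_pos) {
			hdr = (void *)rb->data + (pend_pos & rb->mask);
			flags = READ_ONCE(hdr->len);
			if (flags & BPF_RINGBUF_BUSY_BIT)
				break;
			pend_pos += round_up((flags & ~BPF_RINGBUF_DISCARD_BIT) +
					     BPF_RINGBUF_HDR_SZ, 8);
		}
		rb->pending_pos = pend_pos;

		/* Reject the reservation if the span from the oldest
		 * outstanding record to the new producer position would
		 * exceed the ring buffer size. */
		if (new_prod_pos - pend_pos > rb->mask)
			return NULL;

		/* ... the actual reservation proceeds as before ... */
		return (void *)rb->data + (prod_pos & rb->mask) +
		       BPF_RINGBUF_HDR_SZ;
	}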

Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Bing-Jhong Billy Jheng <[email protected]>
Reported-by: Muhammad Ramdhan <[email protected]>
Co-developed-by: Bing-Jhong Billy Jheng <[email protected]>
Co-developed-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Bing-Jhong Billy Jheng <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
(cherry picked from commit cfa1a2329a691ffd991fcf7248a57d752e712881)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	kernel/bpf/ringbuf.c
diff --cc kernel/bpf/ringbuf.c
index 638d7fd7b375,e20b90c36131..000000000000
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@@ -37,10 -30,44 +37,51 @@@ struct bpf_ringbuf
  struct page **pages;
  int nr_pages;
  spinlock_t spinlock ____cacheline_aligned_in_smp;
++<<<<<<< HEAD
 +/* Consumer and producer counters are put into separate pages to allow
 + * mapping consumer page as r/w, but restrict producer page to r/o.
 + * This protects producer position from being modified by user-space
 + * application and ruining in-kernel position tracking.
++=======
+ /* For user-space producer ring buffers, an atomic_t busy bit is used
+  * to synchronize access to the ring buffers in the kernel, rather than
+  * the spinlock that is used for kernel-producer ring buffers. This is
+  * done because the ring buffer must hold a lock across a BPF program's
+  * callback:
+  *
+  *    __bpf_user_ringbuf_peek() // lock acquired
+  * -> program callback_fn()
+  * -> __bpf_user_ringbuf_sample_release() // lock released
+  *
+  * It is unsafe and incorrect to hold an IRQ spinlock across what could
+  * be a long execution window, so we instead simply disallow concurrent
+  * access to the ring buffer by kernel consumers, and return -EBUSY from
+  * __bpf_user_ringbuf_peek() if the busy bit is held by another task.
+  */
+ atomic_t busy ____cacheline_aligned_in_smp;
+ /* Consumer and producer counters are put into separate pages to
+  * allow each position to be mapped with different permissions.
+  * This prevents a user-space application from modifying the
+  * position and ruining in-kernel tracking. The permissions of the
+  * pages depend on who is producing samples: user-space or the
+  * kernel. Note that the pending counter is placed in the same
+  * page as the producer, so that it shares the same cache line.
+  *
+  * Kernel-producer
+  * ---------------
+  * The producer position and data pages are mapped as r/o in
+  * userspace. For this approach, bits in the header of samples are
+  * used to signal to user-space, and to other producers, whether a
+  * sample is currently being written.
+  *
+  * User-space producer
+  * -------------------
+  * Only the page containing the consumer position is mapped r/o in
+  * user-space. User-space producers also use bits of the header to
+  * communicate to the kernel, but the kernel must carefully check and
+  * validate each sample to ensure that they're correctly formatted, and
+  * fully contained within the ring buffer.
++>>>>>>> cfa1a2329a69 (bpf: Fix overrunning reservations in ringbuf)
  */
  unsigned long consumer_pos __aligned(PAGE_SIZE);
  unsigned long producer_pos __aligned(PAGE_SIZE);
* Unmerged path kernel/bpf/ringbuf.c
