Skip to content

Commit d5437c1

Browse files
committed
rxrpc: Fix locking issues in rxrpc_put_peer_locked()
jira LE-1907 Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2 commit-author David Howells <[email protected]> commit 608aecd Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-5.14.0-284.30.1.el9_2/608aecd1.failed Now that rxrpc_put_local() may call kthread_stop(), it can't be called under spinlock as it might sleep. This can cause a problem in the peer keepalive code in rxrpc as it tries to avoid dropping the peer_hash_lock from the point it needs to re-add peer->keepalive_link to going round the loop again in rxrpc_peer_keepalive_dispatch(). Fix this by just dropping the lock when we don't need it and accepting that we'll have to take it again. This code is only called about every 20s for each peer, so not very often. This allows rxrpc_put_peer_unlocked() to be removed also. If triggered, this bug produces an oops like the following, as reproduced by a syzbot reproducer for a different oops[1]: BUG: sleeping function called from invalid context at kernel/sched/completion.c:101 ... RCU nest depth: 0, expected: 0 3 locks held by kworker/u9:0/50: #0: ffff88810e74a138 ((wq_completion)krxrpcd){+.+.}-{0:0}, at: process_one_work+0x294/0x636 #1: ffff8881013a7e20 ((work_completion)(&rxnet->peer_keepalive_work)){+.+.}-{0:0}, at: process_one_work+0x294/0x636 #2: ffff88817d366390 (&rxnet->peer_hash_lock){+.+.}-{2:2}, at: rxrpc_peer_keepalive_dispatch+0x2bd/0x35f ... Call Trace: <TASK> dump_stack_lvl+0x4c/0x5f __might_resched+0x2cf/0x2f2 __wait_for_common+0x87/0x1e8 kthread_stop+0x14d/0x255 rxrpc_peer_keepalive_dispatch+0x333/0x35f rxrpc_peer_keepalive_worker+0x2e9/0x449 process_one_work+0x3c1/0x636 worker_thread+0x25f/0x359 kthread+0x1a6/0x1b5 ret_from_fork+0x1f/0x30 Fixes: a275da6 ("rxrpc: Create a per-local endpoint receive queue and I/O thread") Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] Link: https://lore.kernel.org/r/[email protected]/ [1] Signed-off-by: David S. Miller <[email protected]> (cherry picked from commit 608aecd) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # net/rxrpc/ar-internal.h # net/rxrpc/peer_event.c # net/rxrpc/peer_object.c
1 parent 62c52f8 commit d5437c1

File tree

1 file changed

+178
-0
lines changed

1 file changed

+178
-0
lines changed
Lines changed: 178 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,178 @@
1+
rxrpc: Fix locking issues in rxrpc_put_peer_locked()
2+
3+
jira LE-1907
4+
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
5+
commit-author David Howells <[email protected]>
6+
commit 608aecd16a31269485e2980898029dd01b03a73e
7+
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
8+
Will be included in final tarball splat. Ref for failed cherry-pick at:
9+
ciq/ciq_backports/kernel-5.14.0-284.30.1.el9_2/608aecd1.failed
10+
11+
Now that rxrpc_put_local() may call kthread_stop(), it can't be called
12+
under spinlock as it might sleep. This can cause a problem in the peer
13+
keepalive code in rxrpc as it tries to avoid dropping the peer_hash_lock
14+
from the point it needs to re-add peer->keepalive_link to going round the
15+
loop again in rxrpc_peer_keepalive_dispatch().
16+
17+
Fix this by just dropping the lock when we don't need it and accepting that
18+
we'll have to take it again. This code is only called about every 20s for
19+
each peer, so not very often.
20+
21+
This allows rxrpc_put_peer_unlocked() to be removed also.
22+
23+
If triggered, this bug produces an oops like the following, as reproduced
24+
by a syzbot reproducer for a different oops[1]:
25+
26+
BUG: sleeping function called from invalid context at kernel/sched/completion.c:101
27+
...
28+
RCU nest depth: 0, expected: 0
29+
3 locks held by kworker/u9:0/50:
30+
#0: ffff88810e74a138 ((wq_completion)krxrpcd){+.+.}-{0:0}, at: process_one_work+0x294/0x636
31+
#1: ffff8881013a7e20 ((work_completion)(&rxnet->peer_keepalive_work)){+.+.}-{0:0}, at: process_one_work+0x294/0x636
32+
#2: ffff88817d366390 (&rxnet->peer_hash_lock){+.+.}-{2:2}, at: rxrpc_peer_keepalive_dispatch+0x2bd/0x35f
33+
...
34+
Call Trace:
35+
<TASK>
36+
dump_stack_lvl+0x4c/0x5f
37+
__might_resched+0x2cf/0x2f2
38+
__wait_for_common+0x87/0x1e8
39+
kthread_stop+0x14d/0x255
40+
rxrpc_peer_keepalive_dispatch+0x333/0x35f
41+
rxrpc_peer_keepalive_worker+0x2e9/0x449
42+
process_one_work+0x3c1/0x636
43+
worker_thread+0x25f/0x359
44+
kthread+0x1a6/0x1b5
45+
ret_from_fork+0x1f/0x30
46+
47+
Fixes: a275da62e8c1 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
48+
Signed-off-by: David Howells <[email protected]>
49+
cc: Marc Dionne <[email protected]>
50+
51+
Link: https://lore.kernel.org/r/[email protected]/ [1]
52+
Signed-off-by: David S. Miller <[email protected]>
53+
(cherry picked from commit 608aecd16a31269485e2980898029dd01b03a73e)
54+
Signed-off-by: Jonathan Maple <[email protected]>
55+
56+
# Conflicts:
57+
# net/rxrpc/ar-internal.h
58+
# net/rxrpc/peer_event.c
59+
# net/rxrpc/peer_object.c
60+
diff --cc net/rxrpc/ar-internal.h
61+
index 46ce41afb431,5b732a4af009..000000000000
62+
--- a/net/rxrpc/ar-internal.h
63+
+++ b/net/rxrpc/ar-internal.h
64+
@@@ -1032,10 -1070,9 +1032,16 @@@ struct rxrpc_peer *rxrpc_alloc_peer(str
65+
void rxrpc_new_incoming_peer(struct rxrpc_sock *, struct rxrpc_local *,
66+
struct rxrpc_peer *);
67+
void rxrpc_destroy_all_peers(struct rxrpc_net *);
68+
++<<<<<<< HEAD
69+
+struct rxrpc_peer *rxrpc_get_peer(struct rxrpc_peer *);
70+
+struct rxrpc_peer *rxrpc_get_peer_maybe(struct rxrpc_peer *);
71+
+void rxrpc_put_peer(struct rxrpc_peer *);
72+
+void rxrpc_put_peer_locked(struct rxrpc_peer *);
73+
++=======
74+
+ struct rxrpc_peer *rxrpc_get_peer(struct rxrpc_peer *, enum rxrpc_peer_trace);
75+
+ struct rxrpc_peer *rxrpc_get_peer_maybe(struct rxrpc_peer *, enum rxrpc_peer_trace);
76+
+ void rxrpc_put_peer(struct rxrpc_peer *, enum rxrpc_peer_trace);
77+
++>>>>>>> 608aecd16a31 (rxrpc: Fix locking issues in rxrpc_put_peer_locked())
78+
79+
/*
80+
* proc.c
81+
diff --cc net/rxrpc/peer_event.c
82+
index b5160a878fe1,552ba84a255c..000000000000
83+
--- a/net/rxrpc/peer_event.c
84+
+++ b/net/rxrpc/peer_event.c
85+
@@@ -303,21 -235,23 +303,28 @@@ static void rxrpc_peer_keepalive_dispat
86+
struct rxrpc_peer *peer;
87+
const u8 mask = ARRAY_SIZE(rxnet->peer_keepalive) - 1;
88+
time64_t keepalive_at;
89+
+ bool use;
90+
int slot;
91+
92+
- spin_lock(&rxnet->peer_hash_lock);
93+
+ spin_lock_bh(&rxnet->peer_hash_lock);
94+
95+
while (!list_empty(collector)) {
96+
peer = list_entry(collector->next,
97+
struct rxrpc_peer, keepalive_link);
98+
99+
list_del_init(&peer->keepalive_link);
100+
- if (!rxrpc_get_peer_maybe(peer, rxrpc_peer_get_keepalive))
101+
+ if (!rxrpc_get_peer_maybe(peer))
102+
continue;
103+
104+
++<<<<<<< HEAD
105+
+ if (__rxrpc_use_local(peer->local)) {
106+
+ spin_unlock_bh(&rxnet->peer_hash_lock);
107+
++=======
108+
+ use = __rxrpc_use_local(peer->local, rxrpc_local_use_peer_keepalive);
109+
+ spin_unlock(&rxnet->peer_hash_lock);
110+
++>>>>>>> 608aecd16a31 (rxrpc: Fix locking issues in rxrpc_put_peer_locked())
111+
112+
+ if (use) {
113+
keepalive_at = peer->last_tx_at + RXRPC_KEEPALIVE_TIME;
114+
slot = keepalive_at - base;
115+
_debug("%02x peer %u t=%d {%pISp}",
116+
@@@ -335,15 -269,17 +342,23 @@@
117+
*/
118+
slot += cursor;
119+
slot &= mask;
120+
- spin_lock(&rxnet->peer_hash_lock);
121+
+ spin_lock_bh(&rxnet->peer_hash_lock);
122+
list_add_tail(&peer->keepalive_link,
123+
&rxnet->peer_keepalive[slot & mask]);
124+
++<<<<<<< HEAD
125+
+ rxrpc_unuse_local(peer->local);
126+
+ }
127+
+ rxrpc_put_peer_locked(peer);
128+
++=======
129+
+ spin_unlock(&rxnet->peer_hash_lock);
130+
+ rxrpc_unuse_local(peer->local, rxrpc_local_unuse_peer_keepalive);
131+
+ }
132+
+ rxrpc_put_peer(peer, rxrpc_peer_put_keepalive);
133+
+ spin_lock(&rxnet->peer_hash_lock);
134+
++>>>>>>> 608aecd16a31 (rxrpc: Fix locking issues in rxrpc_put_peer_locked())
135+
}
136+
137+
- spin_unlock(&rxnet->peer_hash_lock);
138+
+ spin_unlock_bh(&rxnet->peer_hash_lock);
139+
}
140+
141+
/*
142+
diff --cc net/rxrpc/peer_object.c
143+
index 041a51225c5f,82de295393a0..000000000000
144+
--- a/net/rxrpc/peer_object.c
145+
+++ b/net/rxrpc/peer_object.c
146+
@@@ -444,26 -439,6 +444,29 @@@ void rxrpc_put_peer(struct rxrpc_peer *
147+
}
148+
149+
/*
150+
++<<<<<<< HEAD
151+
+ * Drop a ref on a peer record where the caller already holds the
152+
+ * peer_hash_lock.
153+
+ */
154+
+void rxrpc_put_peer_locked(struct rxrpc_peer *peer)
155+
+{
156+
+ const void *here = __builtin_return_address(0);
157+
+ unsigned int debug_id = peer->debug_id;
158+
+ bool dead;
159+
+ int r;
160+
+
161+
+ dead = __refcount_dec_and_test(&peer->ref, &r);
162+
+ trace_rxrpc_peer(debug_id, rxrpc_peer_put, r - 1, here);
163+
+ if (dead) {
164+
+ hash_del_rcu(&peer->hash_link);
165+
+ list_del_init(&peer->keepalive_link);
166+
+ rxrpc_free_peer(peer);
167+
+ }
168+
+}
169+
+
170+
+/*
171+
++=======
172+
++>>>>>>> 608aecd16a31 (rxrpc: Fix locking issues in rxrpc_put_peer_locked())
173+
* Make sure all peer records have been discarded.
174+
*/
175+
void rxrpc_destroy_all_peers(struct rxrpc_net *rxnet)
176+
* Unmerged path net/rxrpc/ar-internal.h
177+
* Unmerged path net/rxrpc/peer_event.c
178+
* Unmerged path net/rxrpc/peer_object.c

0 commit comments

Comments
 (0)