Skip to content

Commit 7d63c6f

Browse files
htejungregkh
authored andcommitted
blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init
[ Upstream commit ec14a87 ] blk-iocost sometimes causes the following crash: BUG: kernel NULL pointer dereference, address: 00000000000000e0 ... RIP: 0010:_raw_spin_lock+0x17/0x30 Code: be 01 02 00 00 e8 79 38 39 ff 31 d2 89 d0 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 65 ff 05 48 d0 34 7e b9 01 00 00 00 31 c0 <f0> 0f b1 0f 75 02 5d c3 89 c6 e8 ea 04 00 00 5d c3 0f 1f 84 00 00 RSP: 0018:ffffc900023b3d40 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 00000000000000e0 RCX: 0000000000000001 RDX: ffffc900023b3d20 RSI: ffffc900023b3cf0 RDI: 00000000000000e0 RBP: ffffc900023b3d40 R08: ffffc900023b3c10 R09: 0000000000000003 R10: 0000000000000064 R11: 000000000000000a R12: ffff888102337000 R13: fffffffffffffff2 R14: ffff88810af408c8 R15: ffff8881070c3600 FS: 00007faaaf364fc0(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000e0 CR3: 00000001097b1000 CR4: 0000000000350ea0 Call Trace: <TASK> ioc_weight_write+0x13d/0x410 cgroup_file_write+0x7a/0x130 kernfs_fop_write_iter+0xf5/0x170 vfs_write+0x298/0x370 ksys_write+0x5f/0xb0 __x64_sys_write+0x1b/0x20 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This happens because iocg->ioc is NULL. The field is initialized by ioc_pd_init() and never cleared. The NULL deref is caused by blkcg_activate_policy() installing blkg_policy_data before initializing it. blkcg_activate_policy() was doing the following: 1. Allocate pd's for all existing blkg's and install them in blkg->pd[]. 2. Initialize all pd's. 3. Online all pd's. blkcg_activate_policy() only grabs the queue_lock and may release and re-acquire the lock as allocation may need to sleep. ioc_weight_write() grabs blkcg->lock and iterates all its blkg's. The two can race and if ioc_weight_write() runs during #1 or between #1 and #2, it can encounter a pd which is not initialized yet, leading to crash. The crash can be reproduced with the following script: #!/bin/bash echo +io > /sys/fs/cgroup/cgroup.subtree_control systemd-run --unit touch-sda --scope dd if=/dev/sda of=/dev/null bs=1M count=1 iflag=direct echo 100 > /sys/fs/cgroup/system.slice/io.weight bash -c "echo '8:0 enable=1' > /sys/fs/cgroup/io.cost.qos" & sleep .2 echo 100 > /sys/fs/cgroup/system.slice/io.weight with the following patch applied: > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > index fc49be6..38d671d5e10c 100644 > --- a/block/blk-cgroup.c > +++ b/block/blk-cgroup.c > @@ -1553,6 +1553,12 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol) > pd->online = false; > } > > + if (system_state == SYSTEM_RUNNING) { > + spin_unlock_irq(&q->queue_lock); > + ssleep(1); > + spin_lock_irq(&q->queue_lock); > + } > + > /* all allocated, init in the same order */ > if (pol->pd_init_fn) > list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) I don't see a reason why all pd's should be allocated, initialized and onlined together. The only ordering requirement is that parent blkgs to be initialized and onlined before children, which is guaranteed from the walking order. Let's fix the bug by allocating, initializing and onlining pd for each blkg and holding blkcg->lock over initialization and onlining. This ensures that an installed blkg is always fully initialized and onlined removing the the race window. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Breno Leitao <[email protected]> Fixes: 9d179b8 ("blkcg: Fix multiple bugs in blkcg_activate_policy()") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
1 parent 02c126a commit 7d63c6f

File tree

1 file changed

+18
-14
lines changed

1 file changed

+18
-14
lines changed

block/blk-cgroup.c

Lines changed: 18 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1511,7 +1511,7 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
15111511
retry:
15121512
spin_lock_irq(&q->queue_lock);
15131513

1514-
/* blkg_list is pushed at the head, reverse walk to allocate parents first */
1514+
/* blkg_list is pushed at the head, reverse walk to initialize parents first */
15151515
list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
15161516
struct blkg_policy_data *pd;
15171517

@@ -1549,21 +1549,20 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
15491549
goto enomem;
15501550
}
15511551

1552-
blkg->pd[pol->plid] = pd;
1552+
spin_lock(&blkg->blkcg->lock);
1553+
15531554
pd->blkg = blkg;
15541555
pd->plid = pol->plid;
1555-
pd->online = false;
1556-
}
1556+
blkg->pd[pol->plid] = pd;
15571557

1558-
/* all allocated, init in the same order */
1559-
if (pol->pd_init_fn)
1560-
list_for_each_entry_reverse(blkg, &q->blkg_list, q_node)
1561-
pol->pd_init_fn(blkg->pd[pol->plid]);
1558+
if (pol->pd_init_fn)
1559+
pol->pd_init_fn(pd);
15621560

1563-
list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
15641561
if (pol->pd_online_fn)
1565-
pol->pd_online_fn(blkg->pd[pol->plid]);
1566-
blkg->pd[pol->plid]->online = true;
1562+
pol->pd_online_fn(pd);
1563+
pd->online = true;
1564+
1565+
spin_unlock(&blkg->blkcg->lock);
15671566
}
15681567

15691568
__set_bit(pol->plid, q->blkcg_pols);
@@ -1580,14 +1579,19 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
15801579
return ret;
15811580

15821581
enomem:
1583-
/* alloc failed, nothing's initialized yet, free everything */
1582+
/* alloc failed, take down everything */
15841583
spin_lock_irq(&q->queue_lock);
15851584
list_for_each_entry(blkg, &q->blkg_list, q_node) {
15861585
struct blkcg *blkcg = blkg->blkcg;
1586+
struct blkg_policy_data *pd;
15871587

15881588
spin_lock(&blkcg->lock);
1589-
if (blkg->pd[pol->plid]) {
1590-
pol->pd_free_fn(blkg->pd[pol->plid]);
1589+
pd = blkg->pd[pol->plid];
1590+
if (pd) {
1591+
if (pd->online && pol->pd_offline_fn)
1592+
pol->pd_offline_fn(pd);
1593+
pd->online = false;
1594+
pol->pd_free_fn(pd);
15911595
blkg->pd[pol->plid] = NULL;
15921596
}
15931597
spin_unlock(&blkcg->lock);

0 commit comments

Comments
 (0)