Commit 1250dc6
sched: disable preemption around blk_flush_plug in sched_submit_work
On preemptible kernels, a three-way deadlock can occur involving
blk_mq_freeze_queue and blk_mq_dispatch_list:
- Task A holds a filesystem lock (e.g., f2fs io_rwsem) and enters
  __bio_queue_enter(), waiting for mq_freeze_depth == 0
- Task B holds mq_freeze_depth=1 (elevator_change) and waits for
  q_usage_counter to reach zero in blk_mq_freeze_queue_wait()
- Task C is going to sleep waiting for the filesystem lock. Before
  sleeping, schedule() calls sched_submit_work() -> blk_flush_plug()
  -> blk_mq_dispatch_list(), which acquires q_usage_counter via
  percpu_ref_get(). If Task C gets preempted before percpu_ref_put(),
  it will not be scheduled back because the task is already in
  uninterruptible sleep state (TASK_UNINTERRUPTIBLE). This means it
  holds the percpu_ref indefinitely, preventing freeze from completing.
This is fundamentally an ABBA deadlock between queue freeze and the
filesystem lock, exposed by preemption creating an artificial hold
on q_usage_counter during the plug flush.
Fix by disabling preemption around blk_flush_plug() in
sched_submit_work(). The _notrace variants are used since this runs
in scheduler context. preempt_enable_no_resched_notrace() is correct
because schedule() calls __schedule() right after sched_submit_work()
returns and picks the next task there, so a reschedule check on
enable would be redundant.
Fixes: 73c1010 ("block: initial patch for on-stack per-task plugging")
Reported-by: Michael Wu <[email protected]>
Tested-by: Michael Wu <[email protected]>
Link: https://lore.kernel.org/linux-block/[email protected]/
Signed-off-by: Ming Lei <[email protected]>
1 parent 59ca59b
1 file changed
Lines changed: 2 additions & 0 deletions
[Diff body not preserved: two lines added, at new-file lines 7246 and 7248, with unchanged context lines on either side.]
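The diff text itself is not available here, so the following is only a sketch of what the commit message describes: blk_flush_plug() bracketed by preempt_disable_notrace() and preempt_enable_no_resched_notrace() inside sched_submit_work(). It is a user-space model, not the kernel source. The preempt helpers are stubbed with a plain counter, the struct layouts and the `pending`, `flushes_done`, and `flushed_with_preempt_off` names are hypothetical, and the rest of the real sched_submit_work() body is omitted.

```c
/*
 * User-space model of the pattern described in the commit message.
 * NOT kernel code: every helper below is a stand-in so the example
 * compiles on its own.
 */
#include <assert.h>
#include <stdbool.h>

/* Models the preempt counter; > 0 means preemption is disabled. */
static int preempt_count;

/* Hypothetical bookkeeping used only to observe the model. */
static int flushes_done;
static bool flushed_with_preempt_off;

/* Stubs named after the kernel helpers the commit message mentions. */
static void preempt_disable_notrace(void)           { preempt_count++; }
static void preempt_enable_no_resched_notrace(void) { preempt_count--; }

/* Simplified stand-ins for the kernel structures (assumed layout). */
struct blk_plug    { int pending; };
struct task_struct { struct blk_plug *plug; };

/*
 * Stand-in for the plug flush. In the kernel this is where
 * blk_mq_dispatch_list() takes and drops q_usage_counter; here we
 * only record whether preemption was off while the flush ran.
 */
static void blk_flush_plug(struct blk_plug *plug, bool async)
{
    (void)async;
    if (!plug)
        return;
    flushed_with_preempt_off = preempt_count > 0;
    plug->pending = 0;
    flushes_done++;
}

/*
 * The shape of the fix: the task cannot be preempted between taking
 * and releasing the queue reference, because the whole flush runs
 * with preemption disabled.
 */
static void sched_submit_work(struct task_struct *tsk)
{
    preempt_disable_notrace();           /* added line (sketch) */
    blk_flush_plug(tsk->plug, true);
    preempt_enable_no_resched_notrace(); /* added line (sketch) */
}
```

Calling sched_submit_work() on a task with a non-empty plug in this model flushes the plug while the counter is raised and leaves the counter balanced afterwards, which is the invariant the two added lines are meant to provide.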