
Commit 1250dc6

ming1kawasaki authored and committed
sched: disable preemption around blk_flush_plug in sched_submit_work
On preemptible kernels, a three-way deadlock can occur involving
blk_mq_freeze_queue and blk_mq_dispatch_list:

- Task A holds a filesystem lock (e.g., f2fs io_rwsem) and enters
  __bio_queue_enter(), waiting for mq_freeze_depth == 0
- Task B holds mq_freeze_depth=1 (elevator_change) and waits for
  q_usage_counter to reach zero in blk_mq_freeze_queue_wait()
- Task C is going to sleep waiting for the filesystem lock. Before
  sleeping, schedule() calls sched_submit_work() -> blk_flush_plug()
  -> blk_mq_dispatch_list(), which acquires q_usage_counter via
  percpu_ref_get(). If Task C gets preempted before percpu_ref_put(),
  it will not be scheduled back because the task is already in
  uninterruptible sleep state (TASK_UNINTERRUPTIBLE). This means it
  holds the percpu_ref indefinitely, preventing freeze from completing.

This is fundamentally an ABBA deadlock between queue freeze and the
filesystem lock, exposed by preemption creating an artificial hold on
q_usage_counter during the plug flush.

Fix by disabling preemption around blk_flush_plug() in
sched_submit_work(). The _notrace variants are used since this runs
in scheduler context. preempt_enable_no_resched_notrace() is correct
because we are already inside __schedule() and about to pick the
next task.

Fixes: 73c1010 ("block: initial patch for on-stack per-task plugging")
Reported-by: Michael Wu <[email protected]>
Tested-by: Michael Wu <[email protected]>
Link: https://lore.kernel.org/linux-block/[email protected]/
Signed-off-by: Ming Lei <[email protected]>
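The ABBA cycle described above can be made concrete as a wait-for graph. The sketch below is a hypothetical userspace C model, not kernel code: `waits_for`, `has_deadlock()`, `build_unfixed()` and `build_fixed()` are illustrative names only. Each task waits on at most one other task, so a deadlock shows up as a cycle when following the edges. The effect the fix aims for (Task C no longer holding a q_usage_counter reference while asleep) removes the B -> C edge and breaks the cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three tasks from the commit message. */
enum { TASK_A, TASK_B, TASK_C, NTASKS };

/* waits_for[i] == j means task i is blocked on something held by task j;
 * -1 means task i is not blocked on any task in this model. */
static bool has_deadlock(const int waits_for[NTASKS])
{
	for (int start = 0; start < NTASKS; start++) {
		int cur = start;
		/* Follow at most NTASKS edges; getting back to start is a cycle. */
		for (int step = 0; step < NTASKS; step++) {
			cur = waits_for[cur];
			assert(cur >= -1 && cur < NTASKS);
			if (cur < 0)
				break;
			if (cur == start)
				return true;
		}
	}
	return false;
}

/* Before the fix: C was preempted inside blk_mq_dispatch_list() while
 * holding a q_usage_counter reference, so B's freeze waits on C. */
static void build_unfixed(int w[NTASKS])
{
	w[TASK_A] = TASK_B; /* __bio_queue_enter(): waits for B's freeze to end */
	w[TASK_B] = TASK_C; /* blk_mq_freeze_queue_wait(): waits for C's ref */
	w[TASK_C] = TASK_A; /* sleeps on the filesystem lock held by A */
}

/* After the fix: the plug flush cannot be preempted, so C drops its
 * reference before sleeping and B no longer waits on any task here. */
static void build_fixed(int w[NTASKS])
{
	w[TASK_A] = TASK_B;
	w[TASK_B] = -1;
	w[TASK_C] = TASK_A;
}
```

For example, filling an `int w[NTASKS]` with `build_unfixed(w)` makes `has_deadlock(w)` return true, while `build_fixed(w)` makes it return false. The model only captures the dependency structure; it says nothing about timing or how likely the preemption window is.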
1 parent 59ca59b commit 1250dc6

1 file changed

Lines changed: 2 additions & 0 deletions


kernel/sched/core.c

@@ -7243,7 +7243,9 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	 * If we are going to sleep and we have plugged IO queued,
 	 * make sure to submit it to avoid deadlocks.
 	 */
+	preempt_disable_notrace();
 	blk_flush_plug(tsk->plug, true);
+	preempt_enable_no_resched_notrace();
 
 	lock_map_release(&sched_map);
 }
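To see why the two added calls matter, here is a minimal userspace C sketch that models preempt_count as a plain counter. Every identifier in it is hypothetical (`struct model_task`, `model_preempt_point()`, `model_dispatch_list()` and the rest are not kernel APIs), and it simply encodes the commit message's premise that a task preempted after entering TASK_UNINTERRUPTIBLE is not scheduled back. With the counter raised, the modeled preemption point inside the dispatch becomes a no-op, so the reference is always dropped before the task sleeps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these are kernel APIs. */
struct model_task {
	int preempt_count; /* > 0 means preemption is disabled */
	bool sleeping;     /* models TASK_UNINTERRUPTIBLE already being set */
	bool preempted;    /* set when a preemption point actually fires */
};

static int q_usage_counter; /* models the queue's percpu_ref as a plain int */

static void model_preempt_disable(struct model_task *t)
{
	t->preempt_count++;
}

/* Drops the count without checking for a pending reschedule, mirroring
 * the _no_resched variant: the caller is about to schedule anyway. */
static void model_preempt_enable_no_resched(struct model_task *t)
{
	assert(t->preempt_count > 0);
	t->preempt_count--;
}

/* A preemption point: fires only if preemption is enabled and another
 * task wants the CPU (need_resched). */
static void model_preempt_point(struct model_task *t, bool need_resched)
{
	if (need_resched && t->preempt_count == 0)
		t->preempted = true;
}

/* Models blk_mq_dispatch_list(): take a reference, hit a preemption
 * point, drop the reference. Per the commit message's premise, a task
 * preempted here while already sleeping never runs the put. */
static void model_dispatch_list(struct model_task *t, bool need_resched)
{
	q_usage_counter++;                 /* percpu_ref_get() */
	model_preempt_point(t, need_resched);
	if (t->preempted && t->sleeping)
		return;                    /* reference is never dropped */
	q_usage_counter--;                 /* percpu_ref_put() */
}

/* Models sched_submit_work() with and without the patch applied. */
static void model_sched_submit_work(struct model_task *t, bool fixed,
				    bool need_resched)
{
	if (fixed)
		model_preempt_disable(t);
	model_dispatch_list(t, need_resched);
	if (fixed)
		model_preempt_enable_no_resched(t);
}
```

Starting from `q_usage_counter == 0`, calling `model_sched_submit_work(&t, false, true)` on a task with `sleeping` set leaves the counter at 1 (the leaked reference that stalls the freeze), while passing `fixed = true` leaves it at 0. The model ignores the `_notrace` aspect, which concerns tracing rather than correctness.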
