From 621292f81f063dff04c3d72c241e73edab8c479c Mon Sep 17 00:00:00 2001
From: Ming Lei
Date: Thu, 23 Apr 2026 20:55:28 +0800
Subject: [PATCH] sched: disable preemption around blk_flush_plug in
 sched_submit_work

On preemptible kernels, a three-way deadlock can occur involving
blk_mq_freeze_queue and blk_mq_dispatch_list:

- Task A holds a filesystem lock (e.g., f2fs io_rwsem) and enters
  __bio_queue_enter(), waiting for mq_freeze_depth == 0

- Task B holds mq_freeze_depth=1 (elevator_change) and waits for
  q_usage_counter to reach zero in blk_mq_freeze_queue_wait()

- Task C is going to sleep waiting for the filesystem lock. Before
  sleeping, schedule() calls sched_submit_work() -> blk_flush_plug() ->
  blk_mq_dispatch_list(), which acquires q_usage_counter via
  percpu_ref_get(). If Task C is preempted before percpu_ref_put(), it
  will not be scheduled back because the task is already in
  uninterruptible sleep state (TASK_UNINTERRUPTIBLE). It therefore
  holds the percpu_ref indefinitely, preventing the freeze from
  completing.

This is fundamentally an ABBA deadlock between queue freeze and the
filesystem lock, exposed by preemption creating an artificial hold on
q_usage_counter during the plug flush.

Fix it by disabling preemption around blk_flush_plug() in
sched_submit_work(). The _notrace variants are used since this runs in
scheduler context. preempt_enable_no_resched_notrace() is correct
because we are already inside __schedule() and about to pick the next
task.
Fixes: 73c101011926 ("block: initial patch for on-stack per-task plugging")
Reported-by: Michael Wu
Tested-by: Michael Wu
Link: https://lore.kernel.org/linux-block/20260417082744.30124-1-michael@allwinnertech.com/
Signed-off-by: Ming Lei
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da20fb6ea25a..1d7ae6ae85f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7243,7 +7243,9 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	 * If we are going to sleep and we have plugged IO queued,
 	 * make sure to submit it to avoid deadlocks.
 	 */
+	preempt_disable_notrace();
 	blk_flush_plug(tsk->plug, true);
+	preempt_enable_no_resched_notrace();
 	lock_map_release(&sched_map);
 }
 