Commit c7f27a8

liu-song-6 authored and htejun committed
workqueue: Fix false positive stall reports

On weakly ordered architectures (e.g., arm64), the lockless check in
wq_watchdog_timer_fn() can observe a reordering between the worklist
insertion and the last_progress_ts update. Specifically, the watchdog
can see a non-empty worklist (from a list_add) while reading a stale
last_progress_ts value, causing a false positive stall report.

This was confirmed by reading pool->last_progress_ts again after
holding pool->lock in wq_watchdog_timer_fn():

    workqueue watchdog: pool 7 false positive detected!
    lockless_ts=4784580465 locked_ts=4785033728
    diff=453263ms worklist_empty=0

To avoid slowing down the hot path (queue_work, etc.), recheck
last_progress_ts with pool->lock held. This eliminates the false
positive with minimal overhead.

Remove two extra empty lines in wq_watchdog_timer_fn() while we are at
it.

Fixes: 82607ad ("workqueue: implement lockup detector")
Cc: [email protected] # v4.5+
Assisted-by: claude-code:claude-opus-4-6
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
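The recheck described in the commit message can be modeled in user space. The following is a minimal sketch under stated assumptions, not the kernel code: `struct pool_model`, `pool_lock()`, `pool_unlock()`, `later_of()`, `ts_after()` and `stall_confirmed()` are hypothetical names, a C11 `atomic_flag` spinlock stands in for `pool->lock`, and the possibly stale lockless read is passed in as a parameter so the race can be reproduced deterministically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is later than b", in the spirit of time_after(). */
static bool ts_after(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0;
}

/* Hypothetical stand-in for the two worker_pool fields used here. */
struct pool_model {
	atomic_flag lock;                  /* models pool->lock */
	_Atomic uint64_t last_progress_ts; /* models pool->last_progress_ts */
};

static void pool_lock(struct pool_model *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		; /* spin until the flag is released */
}

static void pool_unlock(struct pool_model *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

static uint64_t later_of(uint64_t a, uint64_t b)
{
	return ts_after(a, b) ? a : b;
}

/*
 * lockless_ts is what the unlocked read returned (possibly stale).
 * A stall is reported only if it is still visible after re-reading
 * last_progress_ts with the lock held.
 */
static bool stall_confirmed(struct pool_model *p, uint64_t lockless_ts,
			    uint64_t touched, uint64_t now, uint64_t thresh)
{
	assert(p);

	/* Cheap lockless check: the common case returns here, lock untouched. */
	if (!ts_after(now, later_of(lockless_ts, touched) + thresh))
		return false;

	/* Suspected stall: take the lock and read the timestamp again. */
	pool_lock(p);
	uint64_t pool_ts = atomic_load_explicit(&p->last_progress_ts,
						memory_order_relaxed);
	pool_unlock(p);

	return ts_after(now, later_of(pool_ts, touched) + thresh);
}
```

Feeding this model the two timestamps from the log line (stale `4784580465`, locked `4785033728`, which differ by 453263) with an assumed threshold of 30000 ticks makes the lockless check fire while the locked recheck rejects the report. The lock is taken only on the rare suspected-stall path, which is the design choice the commit message argues for.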
1 parent 98c790b commit c7f27a8

1 file changed

Lines changed: 21 additions & 3 deletions

kernel/workqueue.c

@@ -7699,8 +7699,28 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 		else
 			ts = touched;
 
-		/* did we stall? */
+		/*
+		 * Did we stall?
+		 *
+		 * Do a lockless check first. On weakly ordered
+		 * architectures, the lockless check can observe a
+		 * reordering between worklist insert_work() and
+		 * last_progress_ts update from __queue_work(). Since
+		 * __queue_work() is a much hotter path than the timer
+		 * function, we handle false positive here by reading
+		 * last_progress_ts again with pool->lock held.
+		 */
 		if (time_after(now, ts + thresh)) {
+			scoped_guard(raw_spinlock_irqsave, &pool->lock) {
+				pool_ts = pool->last_progress_ts;
+				if (time_after(pool_ts, touched))
+					ts = pool_ts;
+				else
+					ts = touched;
+			}
+			if (!time_after(now, ts + thresh))
+				continue;
+
 			lockup_detected = true;
 			stall_time = jiffies_to_msecs(now - pool_ts) / 1000;
 			max_stall_time = max(max_stall_time, stall_time);
@@ -7712,8 +7732,6 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n", stall_time);
 		}
-
-
 	}
 
 	if (lockup_detected)
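The first hunk relies on scoped_guard() releasing pool->lock at the end of its block, so the second time_after() check and the `continue` run unlocked. The kernel's guard helpers are built on the compiler's cleanup attribute. The following is a rough user-space sketch of that idea only, assuming GCC or Clang; `SCOPED_LOCK`, `struct spin_model`, `guard_release()` and the `unlock_count` field are hypothetical and are not the kernel's macro or types.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spinlock model; unlock_count exists only for observation. */
struct spin_model {
	atomic_flag held;
	int unlock_count;
};

static void spin_model_lock(struct spin_model *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the flag is released */
}

static void spin_model_unlock(struct spin_model *l)
{
	l->unlock_count++;
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Cleanup handler: invoked when the guard variable goes out of scope. */
static void guard_release(struct spin_model **lp)
{
	assert(*lp);
	spin_model_unlock(*lp);
}

/*
 * Run the following block exactly once with the lock held. The guard
 * variable lives in the for-statement scope, so the unlock happens when
 * the loop is left, whether normally or via break or return.
 */
#define SCOPED_LOCK(l)							\
	for (struct spin_model *guard_					\
		__attribute__((cleanup(guard_release))) =		\
		(spin_model_lock(l), (l)), *once_ = guard_;		\
	     once_; once_ = 0)

/* Read a value under the lock; the lock is free again on return. */
static unsigned long read_locked(struct spin_model *l, const unsigned long *ts)
{
	unsigned long v = 0;

	SCOPED_LOCK(l) {
		v = *ts;
	}
	return v;
}
```

A caller such as `read_locked(&lock, &timestamp)` never writes an explicit unlock, which mirrors how the patched watchdog reads `pool->last_progress_ts` inside the guarded block and makes its report decision after the block has ended.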