block: move sched_tags allocation/de-allocation outside of locking context by blktests-ci[bot] · Pull Request #68 · linux-blktests/linux-block

blktests-ci · 2025-07-31T04:28:22Z

Pull request for series with
subject: block: move sched_tags allocation/de-allocation outside of locking context
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=986951

In preparation for allocating sched_tags before freezing the request queue and acquiring ->elevator_lock, move the elevator queue allocation logic from the elevator ops ->init_sched callback into blk_mq_init_sched. As elevator_alloc is now only invoked from block layer core, we don't need to export it, so unexport elevator_alloc function.

This refactoring provides a centralized location for elevator queue initialization, which makes it easier to store pre-allocated sched_tags in the struct elevator_queue during later changes.

Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Nilay Shroff <[email protected]>

…tore

Recent lockdep reports [1] have revealed a potential deadlock caused by a lock dependency between the percpu allocator lock and the elevator lock. This issue can be avoided by ensuring that the allocation and release of scheduler tags (sched_tags) are performed outside the elevator lock. Furthermore, the queue does not need to remain frozen during these operations.

To address this, move all sched_tags allocations and deallocations outside of both the ->elevator_lock and the ->freeze_lock. Since the lifetime of the elevator queue and its associated sched_tags is closely tied, the allocated sched_tags are now stored in the elevator queue structure. Then, during the actual elevator switch (which runs under ->freeze_lock and ->elevator_lock), the pre-allocated sched_tags are assigned to the appropriate q->hctx. Once the elevator switch is complete and the locks are released, the old elevator queue and its associated sched_tags are freed.

This commit specifically addresses the allocation/deallocation of sched_tags during elevator switching. Note that sched_tags may also be allocated in other contexts, such as during nr_hw_queues updates. Supporting that use case will require batch allocation/deallocation, which will be handled in a follow-up patch.

This restructuring ensures that sched_tags memory management occurs entirely outside of the ->elevator_lock and ->freeze_lock context, eliminating the lock dependency problem seen during scheduler updates.

[1] https://lore.kernel.org/all/[email protected]/

Reported-by: Stefan Haberland <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Nilay Shroff <[email protected]>
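The ordering described in the commit message above (allocate first, swap under the lock, free after unlocking) can be illustrated with a small userspace sketch. This is not the kernel code: `struct tags`, `struct queue`, and `switch_tags()` are hypothetical stand-ins, and a pthread mutex plays the role of ->elevator_lock. It only shows why no allocator call ends up nested inside the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct tags {
	unsigned int depth;
};

struct queue {
	pthread_mutex_t elevator_lock;	/* plays the role of ->elevator_lock */
	struct tags *sched_tags;	/* currently installed tags, may be NULL */
};

/* Returns 0 on success, -1 if allocation fails (queue left unchanged). */
int switch_tags(struct queue *q, unsigned int depth)
{
	struct tags *new_tags, *old_tags;

	assert(q != NULL);

	/* 1. Allocate while no lock is held. */
	new_tags = malloc(sizeof(*new_tags));
	if (!new_tags)
		return -1;
	new_tags->depth = depth;

	/* 2. Only the pointer swap happens under the lock. */
	pthread_mutex_lock(&q->elevator_lock);
	old_tags = q->sched_tags;
	q->sched_tags = new_tags;
	pthread_mutex_unlock(&q->elevator_lock);

	/* 3. Free the old tags after the lock is released. */
	free(old_tags);
	return 0;
}
```

Because the allocator is never called with the mutex held, the allocator's internal locks and the mutex cannot form a dependency chain in this sketch, which is the property the commit is after.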

Move scheduler tags (sched_tags) allocation and deallocation outside both the ->elevator_lock and ->freeze_lock when updating nr_hw_queues. This change breaks the dependency chain from the percpu allocator lock to the elevator lock, helping to prevent potential deadlocks, as observed in the reported lockdep splat[1].

This commit introduces batch allocation and deallocation helpers for sched_tags, which are now used from within __blk_mq_update_nr_hw_queues routine while iterating through the tagset.

With this change, all sched_tags memory management is handled entirely outside the ->elevator_lock and the ->freeze_lock context, thereby eliminating the lock dependency that could otherwise manifest during nr_hw_queues updates.

[1] https://lore.kernel.org/all/[email protected]/

Reported-by: Stefan Haberland <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Nilay Shroff <[email protected]>
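As a rough illustration of the batch helpers mentioned above, the following userspace sketch allocates one tags object per queue up front and rolls back on failure, so that nothing needs to be allocated once the locks are taken. The names `alloc_tags_batch()` and `free_tags_batch()` are assumptions made for this example and are not the helpers added by the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for illustration; not the kernel's structure. */
struct tags {
	unsigned int depth;
};

/* Free the first nr entries of a batch, then the batch array itself. */
void free_tags_batch(struct tags **batch, size_t nr)
{
	if (!batch)
		return;
	for (size_t i = 0; i < nr; i++)
		free(batch[i]);
	free(batch);
}

/*
 * Allocate one tags object per queue before any lock is taken.
 * All-or-nothing: on failure, everything allocated so far is released
 * and NULL is returned.
 */
struct tags **alloc_tags_batch(size_t nr_queues, unsigned int depth)
{
	struct tags **batch;

	assert(nr_queues > 0);

	batch = calloc(nr_queues, sizeof(*batch));
	if (!batch)
		return NULL;

	for (size_t i = 0; i < nr_queues; i++) {
		batch[i] = malloc(sizeof(*batch[i]));
		if (!batch[i]) {
			free_tags_batch(batch, i);
			return NULL;
		}
		batch[i]->depth = depth;
	}
	return batch;
}
```

A caller would obtain the batch first, take its locks, hand each entry to the matching queue, drop the locks, and only then free whatever was replaced.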

blktests-ci · 2025-07-31T04:28:23Z

Upstream branch: 260f6f4
series: https://patchwork.kernel.org/project/linux-block/list/?series=986951
version: 1

blktests-ci · 2025-07-31T09:04:08Z

Upstream branch: 260f6f4
series: https://patchwork.kernel.org/project/linux-block/list/?series=986951
version: 1

blktests-ci · 2025-07-31T09:12:03Z

Upstream branch: 260f6f4
series: https://patchwork.kernel.org/project/linux-block/list/?series=986951
version: 1

blktests-ci · 2025-07-31T09:14:51Z

Github failed to update this PR after force push. Close it.

netlink_attachskb() checks for the socket's read memory allocation constraints. Firstly, it has:

    rmem < READ_ONCE(sk->sk_rcvbuf)

to check if the just increased rmem value fits into the socket's receive buffer. If not, it proceeds and tries to wait for the memory under:

    rmem + skb->truesize > READ_ONCE(sk->sk_rcvbuf)

The checks don't cover the case when skb->truesize + sk->sk_rmem_alloc is equal to sk->sk_rcvbuf. Thus the function neither accepts the skb under these conditions nor manages to reschedule the task, and it is called in a retry loop indefinitely, which is caught as:

    rcu: INFO: rcu_sched self-detected stall on CPU
    rcu: 0-....: (25999 ticks this GP) idle=ef2/1/0x4000000000000000 softirq=262269/262269 fqs=6212
    (t=26000 jiffies g=230833 q=259957)
    NMI backtrace for cpu 0
    CPU: 0 PID: 22 Comm: kauditd Not tainted 5.10.240 #68
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc42 04/01/2014
    Call Trace:
    <IRQ>
    dump_stack lib/dump_stack.c:120
    nmi_cpu_backtrace.cold lib/nmi_backtrace.c:105
    nmi_trigger_cpumask_backtrace lib/nmi_backtrace.c:62
    rcu_dump_cpu_stacks kernel/rcu/tree_stall.h:335
    rcu_sched_clock_irq.cold kernel/rcu/tree.c:2590
    update_process_times kernel/time/timer.c:1953
    tick_sched_handle kernel/time/tick-sched.c:227
    tick_sched_timer kernel/time/tick-sched.c:1399
    __hrtimer_run_queues kernel/time/hrtimer.c:1652
    hrtimer_interrupt kernel/time/hrtimer.c:1717
    __sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1113
    asm_call_irq_on_stack arch/x86/entry/entry_64.S:808
    </IRQ>
    netlink_attachskb net/netlink/af_netlink.c:1234
    netlink_unicast net/netlink/af_netlink.c:1349
    kauditd_send_queue kernel/audit.c:776
    kauditd_thread kernel/audit.c:897
    kthread kernel/kthread.c:328
    ret_from_fork arch/x86/entry/entry_64.S:304

Restore the original behavior of the check which commit in Fixes accidentally missed when restructuring the code.

Found by Linux Verification Center (linuxtesting.org).

Fixes: ae8f160 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.")
Cc: [email protected]
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
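The gap between the two comparisons quoted in the commit message above can be reproduced with plain integers. The sketch below is a simplified model, not the kernel function: `classify()` and `enum outcome` are hypothetical names, the congestion and socket-state conditions are left out, and the `inclusive` flag models one assumed way of closing the gap (accepting when the new total exactly equals the buffer size).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical outcome labels for this model. */
enum outcome {
	ACCEPT,	/* skb fits: first check passes */
	WAIT,	/* over the limit: second check passes, task sleeps */
	SPIN	/* neither check passes: caller retries without sleeping */
};

/*
 * Model of the two checks with plain integers.
 * rmem_before: receive memory already charged to the socket
 * truesize:    size of the skb being attached
 * rcvbuf:      receive buffer limit
 * inclusive:   0 models the strict "<" accept test from the report,
 *              1 models an assumed "<=" accept test
 */
enum outcome classify(unsigned int rmem_before, unsigned int truesize,
		      unsigned int rcvbuf, int inclusive)
{
	unsigned int rmem_after;

	assert(rmem_before <= UINT_MAX - truesize);	/* no wraparound here */
	rmem_after = rmem_before + truesize;

	if (inclusive ? rmem_after <= rcvbuf : rmem_after < rcvbuf)
		return ACCEPT;
	if (rmem_before + truesize > rcvbuf)
		return WAIT;
	return SPIN;
}
```

With a limit of 1024, 768 bytes already charged, and a 256-byte skb, the strict variant reaches neither branch, which corresponds to the retry loop seen in the stall report; the inclusive variant accepts the same input.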

shroffni added 3 commits July 31, 2025 13:28

blktests-ci Bot added new V1 linus-master labels Jul 31, 2025

blktests-ci Bot closed this Jul 31, 2025

blktests-ci Bot deleted the series/977615=>linus-master branch August 2, 2025 00:21

blktests-ci[bot] wants to merge 3 commits into linus-master_base from series/977615=>linus-master
