Commit 3d3667f

tools/sched_ext: Kick home CPU for stranded tasks in scx_qmap
scx_qmap uses global BPF queue maps (BPF_MAP_TYPE_QUEUE) that any CPU's
ops.dispatch() can pop from. When a CPU pops a task that can't run on it
(e.g. a pinned per-CPU kthread), it inserts the task into SHARED_DSQ.
consume_dispatch_q() then skips the task due to affinity mismatch, leaving
it stranded until some CPU in its allowed mask calls ops.dispatch(). This
doesn't cause indefinite stalls -- the periodic tick keeps firing
(can_stop_idle_tick() returns false when softirq is pending) -- but can
cause noticeable scheduling delays.

After inserting to SHARED_DSQ, kick the task's home CPU if this CPU can't
run it. There's a small race window where the home CPU can enter idle
before the kick lands -- if a per-CPU kthread like ksoftirqd is the
stranded task, this can trigger a "NOHZ tick-stop error" warning. The kick
arrives shortly after and the home CPU drains the task.

Rather than fully eliminating the warning by routing pinned tasks to local
or global DSQs, the current code keeps them going through the normal BPF
queue path and documents the race and the resulting warning in detail.
scx_qmap is an example scheduler and having tasks go through the usual
dispatch path is useful for testing. The detailed comment also serves as a
reference for other schedulers that may encounter similar warnings.

Reviewed-by: Andrea Righi <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
1 parent 49d78ad commit 3d3667f

1 file changed

Lines changed: 40 additions & 0 deletions

tools/sched_ext/scx_qmap.bpf.c

@@ -471,6 +471,46 @@ void BPF_STRUCT_OPS(qmap_dispatch, s32 cpu, struct task_struct *prev)
 			__sync_fetch_and_add(&nr_dispatched, 1);
 
 			scx_bpf_dsq_insert(p, SHARED_DSQ, slice_ns, 0);
+
+			/*
+			 * scx_qmap uses a global BPF queue that any CPU's
+			 * dispatch can pop from. If this CPU popped a task that
+			 * can't run here, it gets stranded on SHARED_DSQ after
+			 * consume_dispatch_q() skips it. Kick the task's home
+			 * CPU so it drains SHARED_DSQ.
+			 *
+			 * There's a race between the pop and the flush of the
+			 * buffered dsq_insert:
+			 *
+			 *  CPU 0 (dispatching)        CPU 1 (home, idle)
+			 *  ~~~~~~~~~~~~~~~~~~~        ~~~~~~~~~~~~~~~~~~~
+			 *  pop from BPF queue
+			 *  dsq_insert(buffered)
+			 *                             balance:
+			 *                               SHARED_DSQ empty
+			 *                               BPF queue empty
+			 *                               -> goes idle
+			 *  flush -> on SHARED
+			 *  kick CPU 1
+			 *                             wakes, drains task
+			 *
+			 * The kick prevents indefinite stalls but a per-CPU
+			 * kthread like ksoftirqd can be briefly stranded when
+			 * its home CPU enters idle with softirq pending,
+			 * triggering:
+			 *
+			 *  "NOHZ tick-stop error: local softirq work is pending, handler #N!!!"
+			 *
+			 * from report_idle_softirq(). The kick lands shortly
+			 * after and the home CPU drains the task. This could be
+			 * avoided by e.g. dispatching pinned tasks to local or
+			 * global DSQs, but the current code is left as-is to
+			 * document this class of issue -- other schedulers
+			 * seeing similar warnings can use this as a reference.
+			 */
+			if (!bpf_cpumask_test_cpu(cpu, p->cpus_ptr))
+				scx_bpf_kick_cpu(scx_bpf_task_cpu(p), 0);
+
 			bpf_task_release(p);
 
 			batch--;
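
The kick decision added by this hunk can be illustrated outside the kernel with a small userspace model. This is only a sketch under stated assumptions, not the scheduler's code: `toy_task`, `toy_can_run_on()` and `toy_cpu_to_kick()` are hypothetical names that do not exist in scx_qmap or the sched_ext API, the affinity mask is modelled as a plain 64-bit bitmap instead of `p->cpus_ptr`, and `home_cpu` stands in for the value `scx_bpf_task_cpu(p)` would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a task; not a kernel structure. */
struct toy_task {
	uint64_t allowed_cpus;	/* bit N set: task may run on CPU N */
	int home_cpu;		/* models the result of scx_bpf_task_cpu(p) */
};

/* Models bpf_cpumask_test_cpu(cpu, p->cpus_ptr) for at most 64 CPUs. */
static bool toy_can_run_on(const struct toy_task *t, int cpu)
{
	if (cpu < 0 || cpu >= 64)
		return false;
	return (t->allowed_cpus >> cpu) & 1ULL;
}

/*
 * Called after the dispatching CPU has queued the task on the shared
 * queue. Returns the CPU that should be kicked, or -1 when the
 * dispatching CPU can run the task itself and no kick is needed.
 */
static int toy_cpu_to_kick(const struct toy_task *t, int dispatching_cpu)
{
	assert(t != NULL);

	if (toy_can_run_on(t, dispatching_cpu))
		return -1;
	return t->home_cpu;
}
```

In this model, a task pinned to CPU 1 that is popped by CPU 0 yields a kick for CPU 1, while the same task popped by CPU 1, or an unpinned task popped by any CPU, yields no kick. The race described in the comment is not modelled here; the sketch covers only the affinity test that guards the kick.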
