Conversation
|
Upstream branch: 8c2e52e |
|
Upstream branch: 8c2e52e |
151a4af to
2a5ce91
Compare
ac8b12e to
9a69d4f
Compare
|
Upstream branch: bc9ff19 |
2a5ce91 to
0d14bf9
Compare
9a69d4f to
e311dd9
Compare
|
Upstream branch: bc9ff19 |
0d14bf9 to
e7ecf2e
Compare
e311dd9 to
b6b569e
Compare
|
Upstream branch: bc9ff19 |
e7ecf2e to
5cf5173
Compare
b6b569e to
ef2c9cd
Compare
|
Upstream branch: 40f92e7 |
5cf5173 to
cd135b9
Compare
ef2c9cd to
198825c
Compare
|
Upstream branch: 40f92e7 |
cd135b9 to
925c604
Compare
198825c to
341e7ed
Compare
341e7ed to
81f31a4
Compare
|
Upstream branch: 89be9a8 |
925c604 to
b62de45
Compare
81f31a4 to
87bbbbc
Compare
Drop various unused #include statements. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Joanne Koong <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]>
Add inode and wpc fields to pass the inode and writeback context that are needed in the entire writeback call chain, and let the callers initialize all fields in the writeback context before calling iomap_writepages to simplify the argument passing. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Joanne Koong <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]>
…blocks We don't care about the count of outstanding ioends, just if there is one. Replace the count variable passed to iomap_writepage_map_blocks with a boolean to make that more clear. Signed-off-by: Joanne Koong <[email protected]> [hch: rename the variable, update the commit message] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]>
06c64ac to
ed286d0
Compare
Without the change `perf `hangs up on charaster devices. On my system
it's enough to run system-wide sampler for a few seconds to get the
hangup:
$ perf record -a -g --call-graph=dwarf
$ perf report
# hung
`strace` shows that hangup happens on reading on a character device
`/dev/dri/renderD128`
$ strace -y -f -p 2780484
strace: Process 2780484 attached
pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached
It's call trace descends into `elfutils`:
$ gdb -p 2780484
(gdb) bt
#0 0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
at ../sysdeps/unix/sysv/linux/pread64.c:25
#1 0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
#2 0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#3 0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#4 0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#5 0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
at util/dso.h:537
#6 0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
#7 frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
#8 0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#9 0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
best_effort=best_effort@entry=false) at util/thread.h:152
#13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
#14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
max_stack=127, symbols=true) at util/machine.c:2920
#15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
at util/machine.c:2970
#16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
#17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
#18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
#19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
#20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
#21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
#22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
#23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
file_path=0x105ffbf0 "perf.data") at util/session.c:1419
#24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
--Type <RET> for more, q to quit, c to continue without paging--
quit
prog=prog@entry=0x7fff9df81220) at util/session.c:2132
#25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
at util/session.c:2181
#26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
#27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
#28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
#29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
#30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
at perf.c:351
#31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
#32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
#33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556
The hangup happens because nothing in` perf` or `elfutils` checks if a
mapped file is easily readable.
The change conservatively skips all non-regular files.
Signed-off-by: Sergei Trofimovich <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
bb733b3 to
40fbfea
Compare
1356209 to
ae9bce3
Compare
|
At least one diff in series https://patchwork.kernel.org/project/linux-block/list/?series=980069 irrelevant now for [{'archived': False, 'project': 241}] search patterns |
…l_access() Function copy_from_user() and copy_to_user() may sleep because of page fault, and they cannot be called in spin_lock hold context. Here move function calling of copy_from_user() and copy_to_user() before spinlock context in function kvm_eiointc_ctrl_access(). Otherwise there will be possible warning such as: BUG: sleeping function called from invalid context at include/linux/uaccess.h:192 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full) Tainted: [W]=WARN Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000 9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8 9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001 0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880 00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe 000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0 0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000 0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0 0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40 00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d Call Trace: [<9000000004c2827c>] show_stack+0x5c/0x180 [<9000000004c20fac>] dump_stack_lvl+0x94/0xe4 [<9000000004c99c7c>] __might_resched+0x26c/0x290 [<9000000004f68968>] __might_fault+0x20/0x88 [<ffff800002311de0>] kvm_eiointc_ctrl_access.isra.0+0x88/0x380 [kvm] [<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm] [<900000000506b0d8>] sys_ioctl+0x388/0x1010 [<90000000063ed210>] do_syscall+0xb0/0x2d8 [<9000000004c25ef8>] handle_syscall+0xb8/0x158 Cc: [email protected] Fixes: 1ad7efa ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
…s_access() Function copy_from_user() and copy_to_user() may sleep because of page fault, and they cannot be called in spin_lock hold context. Here move function calling of copy_from_user() and copy_to_user() before spinlock context in function kvm_eiointc_ctrl_access(). Otherwise there will be possible warning such as: BUG: sleeping function called from invalid context at include/linux/uaccess.h:192 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full) Tainted: [W]=WARN Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000 9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8 9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001 0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880 00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe 000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0 0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000 0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0 0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40 00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d Call Trace: [<9000000004c2827c>] show_stack+0x5c/0x180 [<9000000004c20fac>] dump_stack_lvl+0x94/0xe4 [<9000000004c99c7c>] __might_resched+0x26c/0x290 [<9000000004f68968>] __might_fault+0x20/0x88 [<ffff800002311de0>] kvm_eiointc_regs_access.isra.0+0x88/0x380 [kvm] [<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm] [<900000000506b0d8>] sys_ioctl+0x388/0x1010 [<90000000063ed210>] do_syscall+0xb0/0x2d8 [<9000000004c25ef8>] handle_syscall+0xb8/0x158 Cc: [email protected] Fixes: 1ad7efa ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
…status_access() Function copy_from_user() and copy_to_user() may sleep because of page fault, and they cannot be called in spin_lock hold context. Here move funtcion calling of copy_from_user() and copy_to_user() out of function kvm_eiointc_sw_status_access(). Otherwise there will be possible warning such as: BUG: sleeping function called from invalid context at include/linux/uaccess.h:192 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full) Tainted: [W]=WARN Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000 9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8 9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001 0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880 00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe 000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0 0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000 0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0 0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40 00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d Call Trace: [<9000000004c2827c>] show_stack+0x5c/0x180 [<9000000004c20fac>] dump_stack_lvl+0x94/0xe4 [<9000000004c99c7c>] __might_resched+0x26c/0x290 [<9000000004f68968>] __might_fault+0x20/0x88 [<ffff800002311de0>] kvm_eiointc_sw_status_access.isra.0+0x88/0x380 [kvm] [<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm] [<900000000506b0d8>] sys_ioctl+0x388/0x1010 [<90000000063ed210>] do_syscall+0xb0/0x2d8 [<9000000004c25ef8>] handle_syscall+0xb8/0x158 Cc: [email protected] Fixes: 1ad7efa ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
…s_access() Function copy_from_user() and copy_to_user() may sleep because of page fault, and they cannot be called in spin_lock hold context. Here move function calling of copy_from_user() and copy_to_user() out of spinlock context in function kvm_pch_pic_regs_access(). Otherwise there will be possible warning such as: BUG: sleeping function called from invalid context at include/linux/uaccess.h:192 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6292, name: qemu-system-loo preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last enabled at (0): [<9000000004c4a554>] copy_process+0x90c/0x1d40 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 41 UID: 0 PID: 6292 Comm: qemu-system-loo Tainted: G W 6.17.0-rc3+ #31 PREEMPT(full) Tainted: [W]=WARN Stack : 0000000000000076 0000000000000000 9000000004c28264 9000100092ff4000 9000100092ff7b80 9000100092ff7b88 0000000000000000 9000100092ff7cc8 9000100092ff7cc0 9000100092ff7cc0 9000100092ff7a00 0000000000000001 0000000000000001 9000100092ff7b88 947d2f9216a5e8b9 900010008773d880 00000000ffff8b9f fffffffffffffffe 0000000000000ba1 fffffffffffffffe 000000000000003e 900000000825a15b 000010007ad38000 9000100092ff7ec0 0000000000000000 0000000000000000 9000000006f3ac60 9000000007252000 0000000000000000 00007ff746ff2230 0000000000000053 9000200088a021b0 0000555556c9d190 0000000000000000 9000000004c2827c 000055556cfb5f40 00000000000000b0 0000000000000007 0000000000000007 0000000000071c1d Call Trace: [<9000000004c2827c>] show_stack+0x5c/0x180 [<9000000004c20fac>] dump_stack_lvl+0x94/0xe4 [<9000000004c99c7c>] __might_resched+0x26c/0x290 [<9000000004f68968>] __might_fault+0x20/0x88 [<ffff800002311de0>] kvm_pch_pic_regs_access.isra.0+0x88/0x380 [kvm] [<ffff8000022f8514>] kvm_device_ioctl+0x194/0x290 [kvm] [<900000000506b0d8>] sys_ioctl+0x388/0x1010 [<90000000063ed210>] do_syscall+0xb0/0x2d8 [<9000000004c25ef8>] handle_syscall+0xb8/0x158 Cc: [email protected] Fixes: d206d95 ("LoongArch: KVM: Add PCHPIC user mode read and write functions") Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
commit 2a6c727 ("cpufreq: Initialize cpufreq-based frequency-invariance later") postponed the frequency invariance initialization to avoid disabling it in the error case. This isn't locking safe, instead move the initialization up before the subsys interface is registered (which will rebuild the sched_domains) and add the corresponding disable on the error path. Observed lockdep without this patch: [ 0.989686] ====================================================== [ 0.989688] WARNING: possible circular locking dependency detected [ 0.989690] 6.17.0-rc4-cix-build+ #31 Tainted: G S [ 0.989691] ------------------------------------------------------ [ 0.989692] swapper/0/1 is trying to acquire lock: [ 0.989693] ffff800082ada7f8 (sched_energy_mutex){+.+.}-{4:4}, at: rebuild_sched_domains_energy+0x30/0x58 [ 0.989705] but task is already holding lock: [ 0.989706] ffff000088c89bc8 (&policy->rwsem){+.+.}-{4:4}, at: cpufreq_online+0x7f8/0xbe0 [ 0.989713] which lock already depends on the new lock. Fixes: 2a6c727 ("cpufreq: Initialize cpufreq-based frequency-invariance later") Signed-off-by: Christian Loehle <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
When switching IO schedulers on a block device, blkcg_activate_policy() can race with concurrent blkcg deletion, leading to a use-after-free of the blkg. T1: T2: elv_iosched_store blkg_destroy elevator_switch kill(&blkg->refcnt) // blkg->refcnt=0 ... blkg_release // call_rcu blkcg_activate_policy __blkg_release list for blkg blkg_free blkg_free_workfn ->pd_free_fn(pd) blkg_get(blkg) // blkg->refcnt=0->1 list_del_init(&blkg->q_node) kfree(blkg) blkg_put(pinned_blkg) // blkg->refcnt=1->0 blkg_release // call_rcu again call_rcu(..., __blkg_release) Fix this by replacing blkg_get() with blkg_tryget(), which fails if the blkg's refcount has already reached zero. If blkg_tryget() fails, skip processing this blkg since it's already being destroyed. The uaf call trace is as follows: ================================================================== BUG: KASAN: slab-use-after-free in rcu_accelerate_cbs+0x114/0x120 Read of size 8 at addr ffff88815a20b5d8 by task bash/1068 CPU: 0 PID: 1068 Comm: bash Not tainted 6.6.0-g6918ead378dc-dirty #31 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <IRQ> rcu_accelerate_cbs+0x114/0x120 rcu_report_qs_rdp+0x1fb/0x3e0 rcu_core+0x4d7/0x6f0 handle_softirqs+0x198/0x550 irq_exit_rcu+0x130/0x190 sysvec_apic_timer_interrupt+0x6e/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 Allocated by task 1031: kasan_save_stack+0x1c/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x8b/0x90 blkg_alloc+0xb6/0x9c0 blkg_create+0x8c6/0x1010 blkg_lookup_create+0x2ca/0x660 bio_associate_blkg_from_css+0xfb/0x4e0 bio_associate_blkg+0x62/0xf0 bio_init+0x272/0x8d0 bio_alloc_bioset+0x45a/0x760 ext4_bio_write_folio+0x68e/0x10d0 mpage_submit_folio+0x14a/0x2b0 mpage_process_page_bufs+0x1b1/0x390 mpage_prepare_extent_to_map+0xa91/0x1060 ext4_do_writepages+0x948/0x1c50 ext4_writepages+0x23f/0x4a0 do_writepages+0x162/0x5e0 filemap_fdatawrite_wbc+0x11a/0x180 __filemap_fdatawrite_range+0x9d/0xd0 file_write_and_wait_range+0x91/0x110 ext4_sync_file+0x1c1/0xaa0 __x64_sys_fsync+0x55/0x90 do_syscall_64+0x55/0x100 entry_SYSCALL_64_after_hwframe+0x78/0xe2 Freed by task 24: kasan_save_stack+0x1c/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x27/0x40 __kasan_slab_free+0x106/0x180 __kmem_cache_free+0x162/0x350 process_one_work+0x573/0xd30 worker_thread+0x67f/0xc30 kthread+0x28b/0x350 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1b/0x30 Fixes: f1c006f ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()") Signed-off-by: Zheng Qixing <[email protected]>
When switching IO schedulers on a block device, blkcg_activate_policy() can race with concurrent blkcg deletion, leading to a use-after-free of the blkg. T1: T2: elv_iosched_store blkg_destroy elevator_switch kill(&blkg->refcnt) // blkg->refcnt=0 ... blkg_release // call_rcu blkcg_activate_policy __blkg_release list for blkg blkg_free blkg_free_workfn ->pd_free_fn(pd) blkg_get(blkg) // blkg->refcnt=0->1 list_del_init(&blkg->q_node) kfree(blkg) blkg_put(pinned_blkg) // blkg->refcnt=1->0 blkg_release // call_rcu again call_rcu(..., __blkg_release) Fix this by replacing blkg_get() with blkg_tryget(), which fails if the blkg's refcount has already reached zero. If blkg_tryget() fails, skip processing this blkg since it's already being destroyed. The uaf call trace is as follows: ================================================================== BUG: KASAN: slab-use-after-free in rcu_accelerate_cbs+0x114/0x120 Read of size 8 at addr ffff88815a20b5d8 by task bash/1068 CPU: 0 PID: 1068 Comm: bash Not tainted 6.6.0-g6918ead378dc-dirty #31 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <IRQ> rcu_accelerate_cbs+0x114/0x120 rcu_report_qs_rdp+0x1fb/0x3e0 rcu_core+0x4d7/0x6f0 handle_softirqs+0x198/0x550 irq_exit_rcu+0x130/0x190 sysvec_apic_timer_interrupt+0x6e/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 Allocated by task 1031: kasan_save_stack+0x1c/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x8b/0x90 blkg_alloc+0xb6/0x9c0 blkg_create+0x8c6/0x1010 blkg_lookup_create+0x2ca/0x660 bio_associate_blkg_from_css+0xfb/0x4e0 bio_associate_blkg+0x62/0xf0 bio_init+0x272/0x8d0 bio_alloc_bioset+0x45a/0x760 ext4_bio_write_folio+0x68e/0x10d0 mpage_submit_folio+0x14a/0x2b0 mpage_process_page_bufs+0x1b1/0x390 mpage_prepare_extent_to_map+0xa91/0x1060 ext4_do_writepages+0x948/0x1c50 ext4_writepages+0x23f/0x4a0 do_writepages+0x162/0x5e0 filemap_fdatawrite_wbc+0x11a/0x180 __filemap_fdatawrite_range+0x9d/0xd0 file_write_and_wait_range+0x91/0x110 ext4_sync_file+0x1c1/0xaa0 __x64_sys_fsync+0x55/0x90 do_syscall_64+0x55/0x100 entry_SYSCALL_64_after_hwframe+0x78/0xe2 Freed by task 24: kasan_save_stack+0x1c/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x27/0x40 __kasan_slab_free+0x106/0x180 __kmem_cache_free+0x162/0x350 process_one_work+0x573/0xd30 worker_thread+0x67f/0xc30 kthread+0x28b/0x350 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1b/0x30 Fixes: f1c006f ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()") Signed-off-by: Zheng Qixing <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
When deassigning a KVM_IRQFD, don't clobber the irqfd's copy of the IRQ's routing entry as doing so breaks kvm_arch_irq_bypass_del_producer() on x86 and arm64, which explicitly look for KVM_IRQ_ROUTING_MSI. Instead, to handle a concurrent routing update, verify that the irqfd is still active before consuming the routing information. As evidenced by the x86 and arm64 bugs, and another bug in kvm_arch_update_irqfd_routing() (see below), clobbering the entry type without notifying arch code is surprising and error prone. As a bonus, checking that the irqfd is active provides a convenient location for documenting _why_ KVM must not consume the routing entry for an irqfd that is in the process of being deassigned: once the irqfd is deleted from the list (which happens *before* the eventfd is detached), it will no longer receive updates via kvm_irq_routing_update(), and so KVM could deliver an event using stale routing information (relative to KVM_SET_GSI_ROUTING returning to userspace). As an even better bonus, explicitly checking for the irqfd being active fixes a similar bug to the one the clobbering is trying to prevent: if an irqfd is deactivated, and then its routing is changed, kvm_irq_routing_update() won't invoke kvm_arch_update_irqfd_routing() (because the irqfd isn't in the list). And so if the irqfd is in bypass mode, IRQs will continue to be posted using the old routing information. As for kvm_arch_irq_bypass_del_producer(), clobbering the routing type results in KVM incorrectly keeping the IRQ in bypass mode, which is especially problematic on AMD as KVM tracks IRQs that are being posted to a vCPU in a list whose lifetime is tied to the irqfd. Without the help of KASAN to detect use-after-free, the most common sympton on AMD is a NULL pointer deref in amd_iommu_update_ga() due to the memory for irqfd structure being re-allocated and zeroed, resulting in irqfd->irq_bypass_data being NULL when read by avic_update_iommu_vcpu_affinity(): BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 40cf2b9067 P4D 40cf2b9067 PUD 408362a067 PMD 0 Oops: Oops: 0000 [#1] SMP CPU: 6 UID: 0 PID: 40383 Comm: vfio_irq_test Tainted: G U W O 6.19.0-smp--5dddc257e6b2-irqfd #31 NONE Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.78.2-0 09/05/2025 RIP: 0010:amd_iommu_update_ga+0x19/0xe0 Call Trace: <TASK> avic_update_iommu_vcpu_affinity+0x3d/0x90 [kvm_amd] __avic_vcpu_load+0xf4/0x130 [kvm_amd] kvm_arch_vcpu_load+0x89/0x210 [kvm] vcpu_load+0x30/0x40 [kvm] kvm_arch_vcpu_ioctl_run+0x45/0x620 [kvm] kvm_vcpu_ioctl+0x571/0x6a0 [kvm] __se_sys_ioctl+0x6d/0xb0 do_syscall_64+0x6f/0x9d0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x46893b </TASK> ---[ end trace 0000000000000000 ]--- If AVIC is inhibited when the irfd is deassigned, the bug will manifest as list corruption, e.g. on the next irqfd assignment. list_add corruption. next->prev should be prev (ffff8d474d5cd588), but was 0000000000000000. (next=ffff8d8658f86530). ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:31! Oops: invalid opcode: 0000 [#1] SMP CPU: 128 UID: 0 PID: 80818 Comm: vfio_irq_test Tainted: G U W O 6.19.0-smp--f19dc4d680ba-irqfd #28 NONE Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.78.2-0 09/05/2025 RIP: 0010:__list_add_valid_or_report+0x97/0xc0 Call Trace: <TASK> avic_pi_update_irte+0x28e/0x2b0 [kvm_amd] kvm_pi_update_irte+0xbf/0x190 [kvm] kvm_arch_irq_bypass_add_producer+0x72/0x90 [kvm] irq_bypass_register_consumer+0xcd/0x170 [irqbypass] kvm_irqfd+0x4c6/0x540 [kvm] kvm_vm_ioctl+0x118/0x5d0 [kvm] __se_sys_ioctl+0x6d/0xb0 do_syscall_64+0x6f/0x9d0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> ---[ end trace 0000000000000000 ]--- On Intel and arm64, the bug is less noisy, as the end result is that the device keeps posting IRQs to the vCPU even after it's been deassigned. Note, the worst of the breakage can be traced back to commit cb21073 ("KVM: Pass new routing entries and irqfd when updating IRTEs"), as before that commit KVM would pull the routing information from the per-VM routing table. But as above, similar bugs have existed since support for IRQ bypass was added. E.g. if a routing change finished before irq_shutdown() invoked kvm_arch_irq_bypass_del_producer(), VMX and SVM would see stale routing information and potentially leave the irqfd in bypass mode. Alternatively, x86 could be fixed by explicitly checking irq_bypass_vcpu instead of irq_entry.type in kvm_arch_irq_bypass_del_producer(), and arm64 could be modified to utilize irq_bypass_vcpu in a similar manner. But (a) that wouldn't fix the routing updates bug, and (b) fixing core code doesn't preclude x86 (or arm64) from adding such code as a sanity check (spoiler alert). Fixes: f70c20a ("KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'") Fixes: cb21073 ("KVM: Pass new routing entries and irqfd when updating IRTEs") Fixes: a0d7e2f ("KVM: arm64: vgic-v4: Only attempt vLPI mapping for actual MSIs") Cc: [email protected] Cc: Marc Zyngier <[email protected]> Cc: Oliver Upton <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
…pt_rt' Jiayuan Chen says: ==================== bpf: Fix per-CPU bulk queue races on PREEMPT_RT On PREEMPT_RT kernels, local_bh_disable() only calls migrate_disable() (when PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable preemption. This means CFS scheduling can preempt a task inside the per-CPU bulk queue (bq) operations in cpumap and devmap, allowing another task on the same CPU to concurrently access the same bq, leading to use-after-free, list corruption, and kernel panics. Patch 1 fixes the cpumap race in bq_flush_to_queue(), originally reported by syzbot [1]. Patch 2 fixes the same class of race in devmap's bq_xmit_all(), identified by code inspection after Sebastian Andrzej Siewior pointed out that devmap has the same per-CPU bulk queue pattern [2]. Both patches use local_lock_nested_bh() to serialize access to the per-CPU bq. On non-RT this is a pure lockdep annotation with no overhead; on PREEMPT_RT it provides a per-CPU sleeping lock. To reproduce the devmap race, insert an mdelay(100) in bq_xmit_all() after "cnt = bq->count" and before the actual transmit loop. Then pin two threads to the same CPU, each running BPF_PROG_TEST_RUN with an XDP program that redirects to a DEVMAP entry (e.g. a veth pair). CFS timeslicing during the mdelay window causes interleaving. Without the fix, KASAN reports null-ptr-deref due to operating on freed frames: BUG: KASAN: null-ptr-deref in __build_skb_around+0x22d/0x340 Write of size 32 at addr 0000000000000d50 by task devmap_race_rep/449 CPU: 0 UID: 0 PID: 449 Comm: devmap_race_rep Not tainted 6.19.0+ #31 PREEMPT_RT Call Trace: <TASK> __build_skb_around+0x22d/0x340 build_skb_around+0x25/0x260 __xdp_build_skb_from_frame+0x103/0x860 veth_xdp_rcv_bulk_skb.isra.0+0x162/0x320 veth_xdp_rcv.constprop.0+0x61e/0xbb0 veth_poll+0x280/0xb50 __napi_poll.constprop.0+0xa5/0x590 net_rx_action+0x4b0/0xea0 handle_softirqs.isra.0+0x1b3/0x780 __local_bh_enable_ip+0x12a/0x240 xdp_test_run_batch.constprop.0+0xedd/0x1f60 bpf_test_run_xdp_live+0x304/0x640 bpf_prog_test_run_xdp+0xd24/0x1b70 __sys_bpf+0x61c/0x3e00 </TASK> Kernel panic - not syncing: Fatal exception in interrupt [1] https://lore.kernel.org/all/[email protected]/T/ [2] https://lore.kernel.org/bpf/[email protected]/ v3 -> v4: https://lore.kernel.org/all/[email protected]/ - Move panic trace to cover letter. (Sebastian Andrzej Siewior) - Add Reviewed-by: Sebastian Andrzej Siewior <[email protected]> to both patches from cover letter. v2 -> v3: https://lore.kernel.org/bpf/[email protected]/ - Fix commit message: remove incorrect "spin_lock() becomes rt_mutex" claim, the per-CPU bq has no spin_lock at all. (Sebastian Andrzej Siewior) - Fix commit message: accurately describe local_lock_nested_bh() behavior instead of referencing local_lock(). (Sebastian Andrzej Siewior) - Remove incomplete discussion of snapshot alternative. (Sebastian Andrzej Siewior) - Remove panic trace from commit message. (Sebastian Andrzej Siewior) - Add patch 2/2 for devmap, same race pattern. (Sebastian Andrzej Siewior) v1 -> v2: https://lore.kernel.org/bpf/[email protected]/ - Use local_lock_nested_bh()/local_unlock_nested_bh() instead of local_lock()/local_unlock(), since these paths already run under local_bh_disable(). (Sebastian Andrzej Siewior) - Replace "Caller must hold bq->bq_lock" comment with lockdep_assert_held() in bq_flush_to_queue(). (Sebastian Andrzej Siewior) - Fix Fixes tag to 3253cb4 ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT") which is the actual commit that makes the race possible. (Sebastian Andrzej Siewior) ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
Pull request for series with
subject: iomap: header diet
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=980069