
Block fixes and updates for 6.16-rc1 #10

Closed
blktests-ci[bot] wants to merge 8 commits into for-next_base from series/969282=>for-next

Conversation


@blktests-ci blktests-ci Bot commented Jun 10, 2025

Pull request for series with
subject: Block fixes and updates for 6.16-rc1
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=969282

axboe and others added 8 commits June 2, 2025 12:00
* io_uring-6.16:
  MAINTAINERS: remove myself from io_uring
  io_uring/net: only consider msg_inq if larger than 1
  io_uring/zcrx: fix area release on registration failure
  io_uring/zcrx: init id for xa_find
* block-6.16:
  selftests: ublk: cover PER_IO_DAEMON in more stress tests
  Documentation: ublk: document UBLK_F_PER_IO_DAEMON
  selftests: ublk: add stress test for per io daemons
  selftests: ublk: add functional test for per io daemons
  selftests: ublk: kublk: decouple ublk_queues from ublk server threads
  selftests: ublk: kublk: move per-thread data out of ublk_queue
  selftests: ublk: kublk: lift queue initialization out of thread
  selftests: ublk: kublk: tie sqe allocation to io instead of queue
  selftests: ublk: kublk: plumb q_id in io_uring user_data
  ublk: have a per-io daemon instead of a per-queue daemon
  md/md-bitmap: remove parameter slot from bitmap_create()
  md/md-bitmap: cleanup bitmap_ops->startwrite()
  md/dm-raid: remove max_write_behind setting limit
  md/md-bitmap: fix dm-raid max_write_behind setting
  md/raid1,raid10: don't handle IO error for REQ_RAHEAD and REQ_NOWAIT
  loop: add file_start_write() and file_end_write()
  bcache: reserve more RESERVE_BTREE buckets to prevent allocator hang
  bcache: remove unused constants
  bcache: fix NULL pointer in cache_set_flush()
* io_uring-6.16:
  io_uring/kbuf: limit legacy provided buffer lists to USHRT_MAX
* block-6.16:
  block: drop direction param from bio_integrity_copy_user()
* block-6.16:
  selftests: ublk: kublk: improve behavior on init failure
  block: flip iter directions in blk_rq_integrity_map_user()
* io_uring-6.16:
  io_uring/futex: mark wait requests as inflight
  io_uring/futex: get rid of struct io_futex addr union
* block-6.16:
  nvme: spelling fixes
  nvme-tcp: fix I/O stalls on congested sockets
  nvme-tcp: sanitize request list handling
  nvme-tcp: remove tag set when second admin queue config fails
  nvme: enable vectored registered bufs for passthrough cmds
  nvme: fix implicit bool to flags conversion
  nvme: fix command limits status code

blktests-ci Bot commented Jun 10, 2025

Upstream branch: 38f4878
series: https://patchwork.kernel.org/project/linux-block/list/?series=969282
version: 1

Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/linux-block/list/?series=969282
error message:

Cmd('git') failed due to: exit code(128)
  cmdline: git am --3way
  stdout: 'Patch is empty.'
  stderr: 'hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To record the empty patch as an empty commit, run "git am --allow-empty".
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"'

conflict:



blktests-ci Bot commented Jul 10, 2025

At least one diff in series https://patchwork.kernel.org/project/linux-block/list/?series=969282 irrelevant now for [{'archived': False, 'project': 241}] search patterns

@blktests-ci blktests-ci Bot closed this Jul 10, 2025
@blktests-ci blktests-ci Bot deleted the series/969282=>for-next branch July 23, 2025 02:12
blktests-ci Bot pushed a commit that referenced this pull request Jul 30, 2025
The hfs_find_init() method can trigger a crash
if the tree pointer is NULL:

[   45.746290][ T9787] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000008: 0000 [#1] SMP KAI
[   45.747287][ T9787] KASAN: null-ptr-deref in range [0x0000000000000040-0x0000000000000047]
[   45.748716][ T9787] CPU: 2 UID: 0 PID: 9787 Comm: repro Not tainted 6.16.0-rc3 #10 PREEMPT(full)
[   45.750250][ T9787] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   45.751983][ T9787] RIP: 0010:hfs_find_init+0x86/0x230
[   45.752834][ T9787] Code: c1 ea 03 80 3c 02 00 0f 85 9a 01 00 00 4c 8d 6b 40 48 c7 45 18 00 00 00 00 48 b8 00 00 00 00 00 fc
[   45.755574][ T9787] RSP: 0018:ffffc90015157668 EFLAGS: 00010202
[   45.756432][ T9787] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff819a4d09
[   45.757457][ T9787] RDX: 0000000000000008 RSI: ffffffff819acd3a RDI: ffffc900151576e8
[   45.758282][ T9787] RBP: ffffc900151576d0 R08: 0000000000000005 R09: 0000000000000000
[   45.758943][ T9787] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000004
[   45.759619][ T9787] R13: 0000000000000040 R14: ffff88802c50814a R15: 0000000000000000
[   45.760293][ T9787] FS:  00007ffb72734540(0000) GS:ffff8880cec64000(0000) knlGS:0000000000000000
[   45.761050][ T9787] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.761606][ T9787] CR2: 00007f9bd8225000 CR3: 000000010979a000 CR4: 00000000000006f0
[   45.762286][ T9787] Call Trace:
[   45.762570][ T9787]  <TASK>
[   45.762824][ T9787]  hfs_ext_read_extent+0x190/0x9d0
[   45.763269][ T9787]  ? submit_bio_noacct_nocheck+0x2dd/0xce0
[   45.763766][ T9787]  ? __pfx_hfs_ext_read_extent+0x10/0x10
[   45.764250][ T9787]  hfs_get_block+0x55f/0x830
[   45.764646][ T9787]  block_read_full_folio+0x36d/0x850
[   45.765105][ T9787]  ? __pfx_hfs_get_block+0x10/0x10
[   45.765541][ T9787]  ? const_folio_flags+0x5b/0x100
[   45.765972][ T9787]  ? __pfx_hfs_read_folio+0x10/0x10
[   45.766415][ T9787]  filemap_read_folio+0xbe/0x290
[   45.766840][ T9787]  ? __pfx_filemap_read_folio+0x10/0x10
[   45.767325][ T9787]  ? __filemap_get_folio+0x32b/0xbf0
[   45.767780][ T9787]  do_read_cache_folio+0x263/0x5c0
[   45.768223][ T9787]  ? __pfx_hfs_read_folio+0x10/0x10
[   45.768666][ T9787]  read_cache_page+0x5b/0x160
[   45.769070][ T9787]  hfs_btree_open+0x491/0x1740
[   45.769481][ T9787]  hfs_mdb_get+0x15e2/0x1fb0
[   45.769877][ T9787]  ? __pfx_hfs_mdb_get+0x10/0x10
[   45.770316][ T9787]  ? find_held_lock+0x2b/0x80
[   45.770731][ T9787]  ? lockdep_init_map_type+0x5c/0x280
[   45.771200][ T9787]  ? lockdep_init_map_type+0x5c/0x280
[   45.771674][ T9787]  hfs_fill_super+0x38e/0x720
[   45.772092][ T9787]  ? __pfx_hfs_fill_super+0x10/0x10
[   45.772549][ T9787]  ? snprintf+0xbe/0x100
[   45.772931][ T9787]  ? __pfx_snprintf+0x10/0x10
[   45.773350][ T9787]  ? do_raw_spin_lock+0x129/0x2b0
[   45.773796][ T9787]  ? find_held_lock+0x2b/0x80
[   45.774215][ T9787]  ? set_blocksize+0x40a/0x510
[   45.774636][ T9787]  ? sb_set_blocksize+0x176/0x1d0
[   45.775087][ T9787]  ? setup_bdev_super+0x369/0x730
[   45.775533][ T9787]  get_tree_bdev_flags+0x384/0x620
[   45.775985][ T9787]  ? __pfx_hfs_fill_super+0x10/0x10
[   45.776453][ T9787]  ? __pfx_get_tree_bdev_flags+0x10/0x10
[   45.776950][ T9787]  ? bpf_lsm_capable+0x9/0x10
[   45.777365][ T9787]  ? security_capable+0x80/0x260
[   45.777803][ T9787]  vfs_get_tree+0x8e/0x340
[   45.778203][ T9787]  path_mount+0x13de/0x2010
[   45.778604][ T9787]  ? kmem_cache_free+0x2b0/0x4c0
[   45.779052][ T9787]  ? __pfx_path_mount+0x10/0x10
[   45.779480][ T9787]  ? getname_flags.part.0+0x1c5/0x550
[   45.779954][ T9787]  ? putname+0x154/0x1a0
[   45.780335][ T9787]  __x64_sys_mount+0x27b/0x300
[   45.780758][ T9787]  ? __pfx___x64_sys_mount+0x10/0x10
[   45.781232][ T9787]  do_syscall_64+0xc9/0x480
[   45.781631][ T9787]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   45.782149][ T9787] RIP: 0033:0x7ffb7265b6ca
[   45.782539][ T9787] Code: 48 8b 0d c9 17 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48
[   45.784212][ T9787] RSP: 002b:00007ffc0c10cfb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[   45.784935][ T9787] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffb7265b6ca
[   45.785626][ T9787] RDX: 0000200000000240 RSI: 0000200000000280 RDI: 00007ffc0c10d100
[   45.786316][ T9787] RBP: 00007ffc0c10d190 R08: 00007ffc0c10d000 R09: 0000000000000000
[   45.787011][ T9787] R10: 0000000000000048 R11: 0000000000000206 R12: 0000560246733250
[   45.787697][ T9787] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   45.788393][ T9787]  </TASK>
[   45.788665][ T9787] Modules linked in:
[   45.789058][ T9787] ---[ end trace 0000000000000000 ]---
[   45.789554][ T9787] RIP: 0010:hfs_find_init+0x86/0x230
[   45.790028][ T9787] Code: c1 ea 03 80 3c 02 00 0f 85 9a 01 00 00 4c 8d 6b 40 48 c7 45 18 00 00 00 00 48 b8 00 00 00 00 00 fc
[   45.792364][ T9787] RSP: 0018:ffffc90015157668 EFLAGS: 00010202
[   45.793155][ T9787] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff819a4d09
[   45.794123][ T9787] RDX: 0000000000000008 RSI: ffffffff819acd3a RDI: ffffc900151576e8
[   45.795105][ T9787] RBP: ffffc900151576d0 R08: 0000000000000005 R09: 0000000000000000
[   45.796135][ T9787] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000004
[   45.797114][ T9787] R13: 0000000000000040 R14: ffff88802c50814a R15: 0000000000000000
[   45.798024][ T9787] FS:  00007ffb72734540(0000) GS:ffff8880cec64000(0000) knlGS:0000000000000000
[   45.799019][ T9787] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.799822][ T9787] CR2: 00007f9bd8225000 CR3: 000000010979a000 CR4: 00000000000006f0
[   45.800747][ T9787] Kernel panic - not syncing: Fatal exception

The hfs_fill_super() calls hfs_mdb_get() method that tries
to construct Extents Tree and Catalog Tree:

HFS_SB(sb)->ext_tree = hfs_btree_open(sb, HFS_EXT_CNID, hfs_ext_keycmp);
if (!HFS_SB(sb)->ext_tree) {
	pr_err("unable to open extent tree\n");
	goto out;
}
HFS_SB(sb)->cat_tree = hfs_btree_open(sb, HFS_CAT_CNID, hfs_cat_keycmp);
if (!HFS_SB(sb)->cat_tree) {
	pr_err("unable to open catalog tree\n");
	goto out;
}

However, hfs_btree_open() calls read_mapping_page() that
calls hfs_get_block(). And this method calls hfs_ext_read_extent():

static int hfs_ext_read_extent(struct inode *inode, u16 block)
{
	struct hfs_find_data fd;
	int res;

	if (block >= HFS_I(inode)->cached_start &&
	    block < HFS_I(inode)->cached_start + HFS_I(inode)->cached_blocks)
		return 0;

	res = hfs_find_init(HFS_SB(inode->i_sb)->ext_tree, &fd);
	if (!res) {
		res = __hfs_ext_cache_extent(&fd, inode, block);
		hfs_find_exit(&fd);
	}
	return res;
}

The problem here is that hfs_find_init() is trying to use
HFS_SB(inode->i_sb)->ext_tree, which is not initialized yet.
It will be initialized when hfs_btree_open() finishes
execution.

The patch adds a check of the tree pointer in hfs_find_init()
and reworks the logic of hfs_btree_open() by reading
the b-tree's header directly from the volume. read_mapping_page()
is replaced with filemap_grab_folio(), which grabs the folio from
the mapping. Then sb_bread() extracts the b-tree's header
content and copies it into the folio.

Reported-by: Wenzhi Wang <[email protected]>
Signed-off-by: Viacheslav Dubeyko <[email protected]>
cc: John Paul Adrian Glaubitz <[email protected]>
cc: Yangtao Li <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Viacheslav Dubeyko <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Aug 2, 2025
perf script tests fail with a segmentation fault as below:

  92: perf script tests:
  --- start ---
  test child forked, pid 103769
  DB test
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB /tmp/perf-test-script.7rbftEpOzX/perf.data (9 samples) ]
  /usr/libexec/perf-core/tests/shell/script.sh: line 35:
  103780 Segmentation fault      (core dumped)
  perf script -i "${perfdatafile}" -s "${db_test}"
  --- Cleaning up ---
  ---- end(-1) ----
  92: perf script tests                                               : FAILED!

Backtrace pointed to :
	#0  0x0000000010247dd0 in maps.machine ()
	#1  0x00000000101d178c in db_export.sample ()
	#2  0x00000000103412c8 in python_process_event ()
	#3  0x000000001004eb28 in process_sample_event ()
	#4  0x000000001024fcd0 in machines.deliver_event ()
	#5  0x000000001025005c in perf_session.deliver_event ()
	#6  0x00000000102568b0 in __ordered_events__flush.part.0 ()
	#7  0x0000000010251618 in perf_session.process_events ()
	#8  0x0000000010053620 in cmd_script ()
	#9  0x00000000100b5a28 in run_builtin ()
	#10 0x00000000100b5f94 in handle_internal_command ()
	#11 0x0000000010011114 in main ()

Further investigation reveals that this occurs in the `perf script tests`,
because it uses `db_test.py` script. This script sets `perf_db_export_mode = True`.

With `perf_db_export_mode` enabled, if a sample originates from a hypervisor,
perf doesn't set maps for "[H]" sample in the code. Consequently, `al->maps` remains NULL
when `maps__machine(al->maps)` is called from `db_export__sample`.

As al->maps can be NULL in the case of hypervisor samples, use thread->maps,
because even for a hypervisor sample the machine should exist.
If we don't have a machine for some reason, return -1 to avoid the segmentation fault.

Reported-by: Disha Goel <[email protected]>
Signed-off-by: Aditya Bodkhe <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Tested-by: Disha Goel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Adrian Hunter <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Aug 2, 2025
Without the change `perf` hangs up on character devices. On my system
it's enough to run a system-wide sampler for a few seconds to get the
hangup:

    $ perf record -a -g --call-graph=dwarf
    $ perf report
    # hung

`strace` shows that the hangup happens on a read of the character device
`/dev/dri/renderD128`:

    $ strace -y -f -p 2780484
    strace: Process 2780484 attached
    pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached

Its call trace descends into `elfutils`:

    $ gdb -p 2780484
    (gdb) bt
    #0  0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
        at ../sysdeps/unix/sysv/linux/pread64.c:25
    #1  0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
    #2  0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #3  0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #4  0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
       from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #5  0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
        at util/dso.h:537
    #6  0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
    #7  frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
    #8  0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #9  0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
        thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
        best_effort=best_effort@entry=false) at util/thread.h:152
    #13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
        sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
    #14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
        max_stack=127, symbols=true) at util/machine.c:2920
    #15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
        sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
        at util/machine.c:2970
    #16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
        sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
    #17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
        evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
    #18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
        arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
    #19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
        evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
    #20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
        file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
    #21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
    #22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
    #23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
        file_path=0x105ffbf0 "perf.data") at util/session.c:1419
    #24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
        prog=prog@entry=0x7fff9df81220) at util/session.c:2132
    #25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
        at util/session.c:2181
    #26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
    #27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
    #28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
    #29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
    #30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
        at perf.c:351
    #31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
    #32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
    #33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556

The hangup happens because nothing in `perf` or `elfutils` checks if a
mapped file is easily readable.

The change conservatively skips all non-regular files.

Signed-off-by: Sergei Trofimovich <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
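The commit above fixes the hang by refusing to hand non-regular files to elfutils. The following is an added example of that check in userspace; it is not code from perf, elfutils or the commit. It parses the pathname column of a `/proc/<pid>/maps` line (the format from which a profiler learns what file backs a mapping), then `stat()`s the path and allows it through only if it is a regular file, so a mapping backed by a character device such as `/dev/dri/renderD128` is skipped before any read can block. The function names (`maps_line_path`, `is_regular_file`, `should_open_mapping`, `make_temp_regular_file`) and the sample maps lines are this example's own constructions.

```c
/*
 * Added example (not perf or elfutils code): gate a mapping's backing
 * file on S_ISREG before a symbolizer is allowed to read it. Reading a
 * character device can block forever; skipping non-regular files is the
 * conservative choice the commit above makes.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

/* Copy the pathname column (6th field) of one /proc/<pid>/maps line into
 * out. Returns 1 if an absolute path was found, 0 for anonymous mappings
 * and pseudo-paths such as [stack] or [vdso]. */
int maps_line_path(const char *line, char *out, size_t outlen)
{
    unsigned long start, end, off, ino;
    char perms[8], dev[16];
    int consumed = 0;

    /* address range, perms, offset, dev major:minor, inode, then path */
    if (sscanf(line, "%lx-%lx %7s %lx %15s %lu%n",
               &start, &end, perms, &off, dev, &ino, &consumed) != 6)
        return 0;
    const char *p = line + consumed;
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p != '/')
        return 0;
    size_t n = strcspn(p, "\n");
    if (n >= outlen)
        return 0;
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}

/* 1 only if path names a regular file; 0 for character/block devices,
 * FIFOs, sockets, directories, or anything stat() cannot see. */
int is_regular_file(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    return S_ISREG(st.st_mode) ? 1 : 0;
}

/* Decide whether the file backing this maps line may be opened for
 * symbolization: 1 = open it, 0 = skip it. */
int should_open_mapping(const char *maps_line)
{
    char path[4096];
    if (!maps_line_path(maps_line, path, sizeof(path)))
        return 0;
    return is_regular_file(path);
}

/* Demo helper: create a temporary regular file, store its path in buf
 * (caller unlinks it). Returns 1 on success, 0 on failure. */
int make_temp_regular_file(char *buf, size_t buflen)
{
    const char *tmpl = "/tmp/maps-demo-XXXXXX";
    if (buflen <= strlen(tmpl))
        return 0;
    strcpy(buf, tmpl);
    int fd = mkstemp(buf);
    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}

int main(void)
{
    /* Constructed sample lines modelled on /proc/<pid>/maps output. */
    const char *dev_line =
        "7f5e50000000-7f5e50100000 rw-s 00000000 00:0e 1234 /dev/null\n";
    const char *anon_line =
        "7fff9df60000-7fff9df81000 rw-p 00000000 00:00 0 [stack]\n";
    char tmp[64], reg_line[256];

    printf("char device mapping  -> %s\n",
           should_open_mapping(dev_line) ? "open" : "skip");
    printf("anonymous mapping    -> %s\n",
           should_open_mapping(anon_line) ? "open" : "skip");
    if (make_temp_regular_file(tmp, sizeof(tmp))) {
        snprintf(reg_line, sizeof(reg_line),
                 "5d4000-5d5000 r-xp 00000000 08:01 99 %s\n", tmp);
        printf("regular file mapping -> %s\n",
               should_open_mapping(reg_line) ? "open" : "skip");
        unlink(tmp);
    }
    return 0;
}
```

Checking with `stat()` before `open()` leaves a small window in which the path could be swapped for a device node; re-checking with `fstat()` on the opened descriptor closes that window, but opening a device node at all can have driver-side effects, which is why the stat-first form matches the commit's "conservatively skip" approach.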
blktests-ci Bot pushed a commit that referenced this pull request Aug 2, 2025
Symbolize stack traces by creating a live machine. Add this
functionality to dump_stack and switch dump_stack users to use
it. Switch TUI to use it. Add stack traces to the child test function
which can be useful to diagnose blocked code.

Example output:
```
$ perf test -vv PERF_RECORD_
...
  7: PERF_RECORD_* events & perf_sample fields:
  7: PERF_RECORD_* events & perf_sample fields                       : Running (1 active)
^C
Signal (2) while running tests.
Terminating tests with the same signal
Internal test harness failure. Completing any started tests:
:  7: PERF_RECORD_* events & perf_sample fields:

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    #3 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    #4 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    #5 0x7fc12ff02d68 in __sleep sleep.c:55
    #6 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    #7 0x55788c620fb0 in run_test_child builtin-test.c:0
    #8 0x55788c5bd18d in start_command run-command.c:127
    #9 0x55788c621ef3 in __cmd_test builtin-test.c:0
    #10 0x55788c6225bf in cmd_test ??:0
    #11 0x55788c5afbd0 in run_builtin perf.c:0
    #12 0x55788c5afeeb in handle_internal_command perf.c:0
    #13 0x55788c52b383 in main ??:0
    #14 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    #15 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #16 0x55788c52b9d1 in _start ??:0

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fea3a14 in pthread_sigmask@GLIBC_2.2.5 pthread_sigmask.c:45
    #3 0x7fc12fe49fd9 in __GI___sigprocmask sigprocmask.c:26
    #4 0x7fc12ff2601b in __longjmp_chk longjmp.c:36
    #5 0x55788c6210c0 in print_test_result.isra.0 builtin-test.c:0
    #6 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #7 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    #8 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    #9 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    #10 0x7fc12ff02d68 in __sleep sleep.c:55
    #11 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    #12 0x55788c620fb0 in run_test_child builtin-test.c:0
    #13 0x55788c5bd18d in start_command run-command.c:127
    #14 0x55788c621ef3 in __cmd_test builtin-test.c:0
    #15 0x55788c6225bf in cmd_test ??:0
    #16 0x55788c5afbd0 in run_builtin perf.c:0
    #17 0x55788c5afeeb in handle_internal_command perf.c:0
    #18 0x55788c52b383 in main ??:0
    #19 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    #20 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #21 0x55788c52b9d1 in _start ??:0
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Signed-off-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Aug 2, 2025
Calling perf top with branch filters enabled on Intel CPUs
with branch counters logging (a.k.a. LBR event logging [1]) support
results in a segfault.

$ perf top  -e '{cpu_core/cpu-cycles/,cpu_core/event=0xc6,umask=0x3,frontend=0x11,name=frontend_retired_dsb_miss/}' -j any,counter
...
Thread 27 "perf" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffafff76c0 (LWP 949003)]
perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
653			*width = env->cpu_pmu_caps ? env->br_cntr_width :
(gdb) bt
 #0  perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
 #1  0x00000000005b1599 in symbol__account_br_cntr (branch=0x7fffcc3db580, evsel=0xfea2d0, offset=12, br_cntr=8) at util/annotate.c:345
 #2  0x00000000005b17fb in symbol__account_cycles (addr=5658172, start=5658160, sym=0x7fffcc0ee420, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:389
 #3  0x00000000005b1976 in addr_map_symbol__account_cycles (ams=0x7fffcd7b01d0, start=0x7fffcd7b02b0, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:422
 #4  0x000000000068d57f in hist__account_cycles (bs=0x110d288, al=0x7fffafff6540, sample=0x7fffafff6760, nonany_branch_mode=false, total_cycles=0x0, evsel=0xfea2d0) at util/hist.c:2850
 #5  0x0000000000446216 in hist_iter__top_callback (iter=0x7fffafff6590, al=0x7fffafff6540, single=true, arg=0x7fffffff9e00) at builtin-top.c:737
 #6  0x0000000000689787 in hist_entry_iter__add (iter=0x7fffafff6590, al=0x7fffafff6540, max_stack_depth=127, arg=0x7fffffff9e00) at util/hist.c:1359
 #7  0x0000000000446710 in perf_event__process_sample (tool=0x7fffffff9e00, event=0x110d250, evsel=0xfea2d0, sample=0x7fffafff6760, machine=0x108c968) at builtin-top.c:845
 #8  0x0000000000447735 in deliver_event (qe=0x7fffffffa120, qevent=0x10fc200) at builtin-top.c:1211
 #9  0x000000000064ccae in do_flush (oe=0x7fffffffa120, show_progress=false) at util/ordered-events.c:245
 #10 0x000000000064d005 in __ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP, timestamp=0) at util/ordered-events.c:324
 #11 0x000000000064d0ef in ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP) at util/ordered-events.c:342
 #12 0x00000000004472a9 in process_thread (arg=0x7fffffff9e00) at builtin-top.c:1120
 #13 0x00007ffff6e7dba8 in start_thread (arg=<optimized out>) at pthread_create.c:448
 #14 0x00007ffff6f01b8c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The cause is that perf_env__find_br_cntr_info tries to access a
null pointer pmu_caps in the perf_env struct. A similar issue exists
for homogeneous core systems which use the cpu_pmu_caps structure.

Fix this by populating cpu_pmu_caps and pmu_caps structures with
values from sysfs when calling perf top with branch stack sampling
enabled.

[1], LBR event logging introduced here:
https://lore.kernel.org/all/[email protected]/

Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Thomas Falcon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Aug 20, 2025
Both jbd2_log_do_checkpoint() and jbd2_journal_shrink_checkpoint_list()
periodically release j_list_lock after processing a batch of buffers to
avoid long hold times on the j_list_lock. However, since both functions
contend for j_list_lock, the combined time spent waiting and processing
can be significant.

jbd2_journal_shrink_checkpoint_list() explicitly calls cond_resched() when
need_resched() is true to avoid softlockups during prolonged operations.
But jbd2_log_do_checkpoint() only exits its loop when need_resched() is
true, relying on potentially sleeping functions like __flush_batch() or
wait_on_buffer() to trigger rescheduling. If those functions do not sleep,
the kernel may hit a softlockup.

watchdog: BUG: soft lockup - CPU#3 stuck for 156s! [kworker/u129:2:373]
CPU: 3 PID: 373 Comm: kworker/u129:2 Kdump: loaded Not tainted 6.6.0+ #10
Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.27 06/13/2017
Workqueue: writeback wb_workfn (flush-7:2)
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : native_queued_spin_lock_slowpath+0x358/0x418
lr : jbd2_log_do_checkpoint+0x31c/0x438 [jbd2]
Call trace:
 native_queued_spin_lock_slowpath+0x358/0x418
 jbd2_log_do_checkpoint+0x31c/0x438 [jbd2]
 __jbd2_log_wait_for_space+0xfc/0x2f8 [jbd2]
 add_transaction_credits+0x3bc/0x418 [jbd2]
 start_this_handle+0xf8/0x560 [jbd2]
 jbd2__journal_start+0x118/0x228 [jbd2]
 __ext4_journal_start_sb+0x110/0x188 [ext4]
 ext4_do_writepages+0x3dc/0x740 [ext4]
 ext4_writepages+0xa4/0x190 [ext4]
 do_writepages+0x94/0x228
 __writeback_single_inode+0x48/0x318
 writeback_sb_inodes+0x204/0x590
 __writeback_inodes_wb+0x54/0xf8
 wb_writeback+0x2cc/0x3d8
 wb_do_writeback+0x2e0/0x2f8
 wb_workfn+0x80/0x2a8
 process_one_work+0x178/0x3e8
 worker_thread+0x234/0x3b8
 kthread+0xf0/0x108
 ret_from_fork+0x10/0x20

So explicitly call cond_resched() in jbd2_log_do_checkpoint() to avoid
softlockup.

Cc: [email protected]
Signed-off-by: Baokun Li <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Aug 23, 2025
…dlock

When a user creates a dualpi2 qdisc it automatically sets a timer. This
timer will run constantly and update the qdisc's probability field.
The issue is that the timer acquires the qdisc root lock and runs in
hardirq. The qdisc root lock is also acquired in dev.c whenever a packet
arrives for this qdisc. Since the dualpi2 timer callback runs in hardirq,
it may interrupt the packet processing running in softirq. If that happens
and it runs on the same CPU, it will acquire the same lock and cause a
deadlock. The following splat shows up when running a kernel compiled with
lock debugging:

[  +0.000224] WARNING: inconsistent lock state
[  +0.000224] 6.16.0+ #10 Not tainted
[  +0.000169] --------------------------------
[  +0.000029] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[  +0.000000] ping/156 [HC0[0]:SC0[2]:HE1:SE0] takes:
[  +0.000000] ffff897841242110 (&sch->root_lock_key){?.-.}-{3:3}, at: __dev_queue_xmit+0x86d/0x1140
[  +0.000000] {IN-HARDIRQ-W} state was registered at:
[  +0.000000]   lock_acquire.part.0+0xb6/0x220
[  +0.000000]   _raw_spin_lock+0x31/0x80
[  +0.000000]   dualpi2_timer+0x6f/0x270
[  +0.000000]   __hrtimer_run_queues+0x1c5/0x360
[  +0.000000]   hrtimer_interrupt+0x115/0x260
[  +0.000000]   __sysvec_apic_timer_interrupt+0x6d/0x1a0
[  +0.000000]   sysvec_apic_timer_interrupt+0x6e/0x80
[  +0.000000]   asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  +0.000000]   pv_native_safe_halt+0xf/0x20
[  +0.000000]   default_idle+0x9/0x10
[  +0.000000]   default_idle_call+0x7e/0x1e0
[  +0.000000]   do_idle+0x1e8/0x250
[  +0.000000]   cpu_startup_entry+0x29/0x30
[  +0.000000]   rest_init+0x151/0x160
[  +0.000000]   start_kernel+0x6f3/0x700
[  +0.000000]   x86_64_start_reservations+0x24/0x30
[  +0.000000]   x86_64_start_kernel+0xc8/0xd0
[  +0.000000]   common_startup_64+0x13e/0x148
[  +0.000000] irq event stamp: 6884
[  +0.000000] hardirqs last  enabled at (6883): [<ffffffffa75700b3>] neigh_resolve_output+0x223/0x270
[  +0.000000] hardirqs last disabled at (6882): [<ffffffffa7570078>] neigh_resolve_output+0x1e8/0x270
[  +0.000000] softirqs last  enabled at (6880): [<ffffffffa757006b>] neigh_resolve_output+0x1db/0x270
[  +0.000000] softirqs last disabled at (6884): [<ffffffffa755b533>] __dev_queue_xmit+0x73/0x1140
[  +0.000000]
              other info that might help us debug this:
[  +0.000000]  Possible unsafe locking scenario:

[  +0.000000]        CPU0
[  +0.000000]        ----
[  +0.000000]   lock(&sch->root_lock_key);
[  +0.000000]   <Interrupt>
[  +0.000000]     lock(&sch->root_lock_key);
[  +0.000000]
               *** DEADLOCK ***

[  +0.000000] 4 locks held by ping/156:
[  +0.000000]  #0: ffff897842332e08 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x41e/0xf40
[  +0.000000]  #1: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_output+0x2c/0x190
[  +0.000000]  #2: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0xad/0x950
[  +0.000000]  #3: ffffffffa816f840 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x73/0x1140

I am able to reproduce it consistently when running the following:

tc qdisc add dev lo handle 1: root dualpi2
ping -f 127.0.0.1

To fix it, make the timer run in softirq.

Fixes: 320d031 ("sched: Struct definition and parsing of dualpi2 qdisc")
Reviewed-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Victor Nogueira <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Sep 26, 2025
smc_lo_register_dmb() allocates DMB buffers with kzalloc(), which are
later passed to get_page() in smc_rx_splice(). Since kmalloc memory is
not page-backed, this triggers WARN_ON_ONCE() in get_page() and prevents
holding a refcount on the buffer. This can lead to use-after-free if
the memory is released before splice_to_pipe() completes.

Use folio_alloc() instead, ensuring DMBs are page-backed and safe for
get_page().

WARNING: CPU: 18 PID: 12152 at ./include/linux/mm.h:1330 smc_rx_splice+0xaf8/0xe20 [smc]
CPU: 18 UID: 0 PID: 12152 Comm: smcapp Kdump: loaded Not tainted 6.17.0-rc3-11705-g9cf4672ecfee #10 NONE
Hardware name: IBM 3931 A01 704 (z/VM 7.4.0)
Krnl PSW : 0704e00180000000 000793161032696c (smc_rx_splice+0xafc/0xe20 [smc])
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 001cee80007d3001 00077400000000f8 0000000000000005
           0000000000000001 001cee80007d3006 0007740000001000 001c000000000000
           000000009b0c99e0 0000000000001000 001c0000000000f8 001c000000000000
           000003ffcc6f7c88 0007740003e98000 0007931600000005 000792969b2ff7b8
Krnl Code: 0007931610326960: af000000		mc	0,0
           0007931610326964: a7f4ff43		brc	15,00079316103267ea
          #0007931610326968: af000000		mc	0,0
          >000793161032696c: a7f4ff3f		brc	15,00079316103267ea
           0007931610326970: e320f1000004	lg	%r2,256(%r15)
           0007931610326976: c0e53fd1b5f5	brasl	%r14,000793168fd5d560
           000793161032697c: a7f4fbb5		brc	15,00079316103260e6
           0007931610326980: b904002b		lgr	%r2,%r11
Call Trace:
 smc_rx_splice+0xafc/0xe20 [smc]
 smc_rx_splice+0x756/0xe20 [smc])
 smc_rx_recvmsg+0xa74/0xe00 [smc]
 smc_splice_read+0x1ce/0x3b0 [smc]
 sock_splice_read+0xa2/0xf0
 do_splice_read+0x198/0x240
 splice_file_to_pipe+0x7e/0x110
 do_splice+0x59e/0xde0
 __do_splice+0x11a/0x2d0
 __s390x_sys_splice+0x140/0x1f0
 __do_syscall+0x122/0x280
 system_call+0x6e/0x90
Last Breaking-Event-Address:
smc_rx_splice+0x960/0xe20 [smc]
---[ end trace 0000000000000000 ]---

Fixes: f7a2207 ("net/smc: implement DMB-related operations of loopback-ism")
Reviewed-by: Mahanta Jambigi <[email protected]>
Signed-off-by: Sidraya Jayagond <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Sep 29, 2025
Running sha224_kunit on a KMSAN-enabled kernel results in a crash in
kmsan_internal_set_shadow_origin():

    BUG: unable to handle page fault for address: ffffbc3840291000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 1810067 P4D 1810067 PUD 192d067 PMD 3c17067 PTE 0
    Oops: 0000 [#1] SMP NOPTI
    CPU: 0 UID: 0 PID: 81 Comm: kunit_try_catch Tainted: G                 N  6.17.0-rc3 #10 PREEMPT(voluntary)
    Tainted: [N]=TEST
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
    RIP: 0010:kmsan_internal_set_shadow_origin+0x91/0x100
    [...]
    Call Trace:
    <TASK>
    __msan_memset+0xee/0x1a0
    sha224_final+0x9e/0x350
    test_hash_buffer_overruns+0x46f/0x5f0
    ? kmsan_get_shadow_origin_ptr+0x46/0xa0
    ? __pfx_test_hash_buffer_overruns+0x10/0x10
    kunit_try_run_case+0x198/0xa00

This occurs when memset() is called on a buffer that is not 4-byte aligned
and extends to the end of a guard page, i.e.  the next page is unmapped.

The bug is that the loop at the end of kmsan_internal_set_shadow_origin()
accesses the wrong shadow memory bytes when the address is not 4-byte
aligned.  Since each 4 bytes are associated with an origin, it rounds the
address and size so that it can access all the origins that contain the
buffer.  However, when it checks the corresponding shadow bytes for a
particular origin, it incorrectly uses the original unrounded shadow
address.  This results in reads from shadow memory beyond the end of the
buffer's shadow memory, which crashes when that memory is not mapped.

To fix this, correctly align the shadow address before accessing the 4
shadow bytes corresponding to each origin.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 2ef3cec ("kmsan: do not wipe out origin when doing partial unpoisoning")
Signed-off-by: Eric Biggers <[email protected]>
Tested-by: Alexander Potapenko <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
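A constructed userspace model of the alignment mistake described in this commit message (added here as an example; it is not the kernel's KMSAN code). It keeps one shadow byte per data byte and one origin per 4 bytes, and treats any shadow read past the mapped region as the page fault from the report, so the buggy and fixed loop can be compared on a buffer that is unaligned and ends at the guard page. `ORIGIN_SIZE`, `MAPPED_BYTES` and all function names are assumptions of the model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Constructed toy model of KMSAN metadata (not the kernel's code): one shadow
 * byte per data byte and one 32-bit origin per ORIGIN_SIZE bytes. Shadow for
 * data offsets [0, MAPPED_BYTES) is "mapped"; anything past it stands in for
 * the unmapped guard page from the report. */
#define ORIGIN_SIZE   4
#define MAPPED_BYTES  64

static uint8_t  shadow[MAPPED_BYTES];
static uint32_t origins[MAPPED_BYTES / ORIGIN_SIZE];

static void reset_metadata(void)
{
    memset(shadow, 0, sizeof shadow);
    memset(origins, 0, sizeof origins);
}

/* Read the 4 shadow bytes at data offset `off`. A read that crosses the end of
 * the mapped region models the page fault and is reported instead of performed. */
static bool read_shadow_u32(size_t off, uint32_t *out)
{
    if (off + ORIGIN_SIZE > MAPPED_BYTES)
        return false;
    memcpy(out, &shadow[off], ORIGIN_SIZE);
    return true;
}

/* Mark [addr, addr+size) with shadow value b and assign `origin` to every
 * origin slot overlapping the buffer whose shadow bytes are still poisoned.
 * `fixed` selects the aligned (fixed) or unaligned (buggy) shadow address
 * used inside the loop. Returns how many shadow reads would have faulted. */
static int set_shadow_origin(size_t addr, size_t size, uint8_t b,
                             uint32_t origin, bool fixed)
{
    if (addr + size > MAPPED_BYTES)
        return -1;                          /* the buffer itself must be mapped */
    memset(&shadow[addr], b, size);

    /* Round the range out to whole origin slots. */
    size_t pad = addr % ORIGIN_SIZE;
    size_t aligned_addr = addr - pad;
    size_t aligned_size = size + pad;
    size_t slots = (aligned_size + ORIGIN_SIZE - 1) / ORIGIN_SIZE;
    int faults = 0;

    for (size_t i = 0; i < slots; i++) {
        /* Bug: the unrounded addr steps `pad` bytes past the buffer's shadow
         * on the last slot; the fix uses the rounded address instead. */
        size_t shadow_off = (fixed ? aligned_addr : addr) + i * ORIGIN_SIZE;
        uint32_t s;

        if (!read_shadow_u32(shadow_off, &s)) {
            faults++;
            continue;
        }
        if (s != 0)
            origins[aligned_addr / ORIGIN_SIZE + i] = origin;
    }
    return faults;
}

static uint32_t origin_at(size_t addr)
{
    return origins[addr / ORIGIN_SIZE];
}

int main(void)
{
    /* A 3-byte buffer at offset 61: unaligned and touching the end of the page. */
    reset_metadata();
    printf("buggy: %d faulting shadow read(s)\n",
           set_shadow_origin(61, 3, 0xff, 42, false));
    reset_metadata();
    printf("fixed: %d faulting shadow read(s), origin %u\n",
           set_shadow_origin(61, 3, 0xff, 42, true), origin_at(61));
    return 0;
}
```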
blktests-ci Bot pushed a commit that referenced this pull request Oct 8, 2025
The lynx_28g_pll_get function may return NULL when called with an
unsupported submode argument.

This function is only called from the lynx_28g_lane_set_{10gbaser,sgmii}
functions, and lynx_28g_set_mode checks available modes before setting a
protocol.

The NXP vendor kernel based on v6.6.52, however, is missing any checks, and
connecting a 2.5/5gbase-t ethernet phy can cause a null pointer
dereference [1].

Check the return value at every invocation and abort in the unlikely error
case. Further, print a warning message the first time lynx_28g_pll_get
returns null, to catch this case should it occur after future changes.

[1]
[  127.019924] fsl_dpaa2_eth dpni.4 eth5: dpmac_set_protocol(2500base-x) = -ENOTSUPP
[  127.027451] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000014
[  127.036245] Mem abort info:
[  127.039044]   ESR = 0x0000000096000004
[  127.042794]   EC = 0x25: DABT (current EL), IL = 32 bits
[  127.048107]   SET = 0, FnV = 0
[  127.051161]   EA = 0, S1PTW = 0
[  127.054301]   FSC = 0x04: level 0 translation fault
[  127.059179] Data abort info:
[  127.062059]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  127.067547]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  127.072596]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  127.077907] user pgtable: 4k pages, 48-bit VAs, pgdp=00000020816c9000
[  127.084344] [0000000000000014] pgd=0000000000000000, p4d=0000000000000000
[  127.091133] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  127.097390] Modules linked in: cfg80211 rfkill fsl_jr_uio caam_jr dpaa2_caam caamkeyblob_desc crypto_engine caamhash_desc onboard_usb_hub caamalg_desc crct10dif_ce libdes caam error at24 rtc_ds1307 rtc_fsl_ftm_alarm nvmem_layerscape_sfp layerscape_edac_mod dm_mod nfnetlink ip_tables
[  127.122436] CPU: 5 PID: 96 Comm: kworker/u35:0 Not tainted 6.6.52-g3578ef896722 #10
[  127.130083] Hardware name: SolidRun LX2162A Clearfog (DT)
[  127.135470] Workqueue: events_power_efficient phylink_resolve
[  127.141219] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  127.148170] pc : lynx_28g_set_lane_mode+0x300/0x818
[  127.153041] lr : lynx_28g_set_lane_mode+0x2fc/0x818
[  127.157909] sp : ffff8000806f3b80
[  127.161212] x29: ffff8000806f3b80 x28: 0000000000000000 x27: 0000000000000000
[  127.168340] x26: ffff29d6c11f3098 x25: 0000000000000000 x24: 0000000000000000
[  127.175467] x23: ffff29d6c11f31f0 x22: ffff29d6c11f3080 x21: 0000000000000001
[  127.182595] x20: ffff29d6c11f4c00 x19: 0000000000000000 x18: 0000000000000006
[  127.189722] x17: 4f4e452d203d2029 x16: 782d657361623030 x15: 3532286c6f636f74
[  127.196849] x14: 6f72705f7465735f x13: ffffd7a8ff991cc0 x12: 0000000000000acb
[  127.203976] x11: 0000000000000399 x10: ffffd7a8ff9e9cc0 x9 : 0000000000000000
[  127.211104] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff29d6c11f3080
[  127.218231] x5 : 0000000000000000 x4 : 0000000040800030 x3 : 000000000000034c
[  127.225358] x2 : ffff29d6c11f3080 x1 : 000000000000034c x0 : 0000000000000000
[  127.232486] Call trace:
[  127.234921]  lynx_28g_set_lane_mode+0x300/0x818
[  127.239443]  lynx_28g_set_mode+0x12c/0x148
[  127.243529]  phy_set_mode_ext+0x5c/0xa8
[  127.247356]  lynx_pcs_config+0x64/0x294
[  127.251184]  phylink_major_config+0x184/0x49c
[  127.255532]  phylink_resolve+0x2a0/0x5d8
[  127.259446]  process_one_work+0x138/0x248
[  127.263448]  worker_thread+0x320/0x438
[  127.267187]  kthread+0x114/0x118
[  127.270406]  ret_from_fork+0x10/0x20
[  127.273973] Code: 2a1303e1 aa0603e0 97fffd3b aa0003e5 (b9401400)
[  127.280055] ---[ end trace 0000000000000000 ]---

Signed-off-by: Josua Mayer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Oct 8, 2025
Before disabling SR-IOV via config space accesses to the parent PF,
sriov_disable() first removes the PCI devices representing the VFs.

Since commit 9d16947 ("PCI: Add global pci_lock_rescan_remove()")
such removal operations are serialized against concurrent remove and
rescan using the pci_rescan_remove_lock. No such locking was ever added
in sriov_disable() however. In particular when commit 18f9e9d
("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device
removal into sriov_del_vfs() there was still no locking around the
pci_iov_remove_virtfn() calls.

On s390 the lack of serialization in sriov_disable() may cause double
remove and list corruption with the below (amended) trace being observed:

  PSW:  0704c00180000000 0000000c914e4b38 (klist_put+56)
  GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001
	00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480
	0000000000000001 0000000000000000 0000000000000000 0000000180692828
	00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8
  #0 [3800313fb20] device_del at c9158ad5c
  #1 [3800313fb88] pci_remove_bus_device at c915105ba
  #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198
  #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0
  #4 [3800313fc60] zpci_bus_remove_device at c90fb6104
  #5 [3800313fca0] __zpci_event_availability at c90fb3dca
  #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2
  #7 [3800313fd60] crw_collect_info at c91905822
  #8 [3800313fe10] kthread at c90feb390
  #9 [3800313fe68] __ret_from_fork at c90f6aa64
  #10 [3800313fe98] ret_from_fork at c9194f3f2.

This is because in addition to sriov_disable() removing the VFs, the
platform also generates hot-unplug events for the VFs. This is the
reverse operation to the hotplug events generated by sriov_enable() and
handled via pdev->no_vf_scan. And while the event processing takes
pci_rescan_remove_lock and checks whether the struct pci_dev still exists,
the lack of synchronization makes this checking racy.

Other races may also be possible of course, though given that this lack of
locking persisted so long, observable races seem very rare. Even on s390 the
list corruption was only observed with certain devices since the platform
events are only triggered by config accesses after the removal, so as long
as the removal finished synchronously they would not race. Either way the
locking is missing, so fix this by adding it to the sriov_del_vfs() helper.

Just like PCI rescan-remove, locking is also missing in sriov_add_vfs()
including for the error case where pci_stop_and_remove_bus_device() is
called without the PCI rescan-remove lock being held. Even in the non-error
case, adding new PCI devices and buses should be serialized via the PCI
rescan-remove lock. Add the necessary locking.

Fixes: 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()")
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Benjamin Block <[email protected]>
Reviewed-by: Farhan Ali <[email protected]>
Reviewed-by: Julian Ruess <[email protected]>
Cc: [email protected]
Link: https://patch.msgid.link/[email protected]
blktests-ci Bot pushed a commit that referenced this pull request Oct 9, 2025
The test starts a workload and then opens events. If the events fail
to open, for example because of perf_event_paranoid, the gopipe of the
workload is leaked and the file descriptor leak check fails when the
test exits. To avoid this cancel the workload when opening the events
fails.

Before:
```
$ perf test -vv 7
  7: PERF_RECORD_* events & perf_sample fields:
 --- start ---
test child forked, pid 1189568
Using CPUID GenuineIntel-6-B7-1
 ------------------------------------------------------------
perf_event_attr:
  type                    	   0 (PERF_TYPE_HARDWARE)
  config                  	   0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                	   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
Attempt to add: software/cpu-clock/
..after resolving event: software/config=0/
cpu-clock -> software/cpu-clock/
 ------------------------------------------------------------
perf_event_attr:
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0x9 (PERF_COUNT_SW_DUMMY)
  sample_type                      IP|TID|TIME|CPU
  read_format                      ID|LOST
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  { wakeup_events, wakeup_watermark } 1
 ------------------------------------------------------------
sys_perf_event_open: pid 1189569  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
perf_evlist__open: Permission denied
 ---- end(-2) ----
Leak of file descriptor 6 that opened: 'pipe:[14200347]'
 ---- unexpected signal (6) ----
iFailed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
    #0 0x565358f6666e in child_test_sig_handler builtin-test.c:311
    #1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0
    #2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44
    #3 0x7f29ce849cc2 in raise raise.c:27
    #4 0x7f29ce8324ac in abort abort.c:81
    #5 0x565358f662d4 in check_leaks builtin-test.c:226
    #6 0x565358f6682e in run_test_child builtin-test.c:344
    #7 0x565358ef7121 in start_command run-command.c:128
    #8 0x565358f67273 in start_test builtin-test.c:545
    #9 0x565358f6771d in __cmd_test builtin-test.c:647
    #10 0x565358f682bd in cmd_test builtin-test.c:849
    #11 0x565358ee5ded in run_builtin perf.c:349
    #12 0x565358ee6085 in handle_internal_command perf.c:401
    #13 0x565358ee61de in run_argv perf.c:448
    #14 0x565358ee6527 in main perf.c:555
    #15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74
    #16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #17 0x565358e391c1 in _start perf[851c1]
  7: PERF_RECORD_* events & perf_sample fields                       : FAILED!
```

After:
```
$ perf test 7
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Fixes: 16d00fe ("perf tests: Move test__PERF_RECORD into separate object")
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Chun-Tse Shao <[email protected]>
Cc: Howard Chu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Jan 9, 2026
Initial rss_hdr allocation uses virtio_device->device,
but virtnet_set_queues() frees using net_device->device.
This device mismatch causes the below devres warning:

[ 3788.514041] ------------[ cut here ]------------
[ 3788.514044] WARNING: drivers/base/devres.c:1095 at devm_kfree+0x84/0x98, CPU#16: vdpa/1463
[ 3788.514054] Modules linked in: octep_vdpa virtio_net virtio_vdpa [last unloaded: virtio_vdpa]
[ 3788.514064] CPU: 16 UID: 0 PID: 1463 Comm: vdpa Tainted: G        W           6.18.0 #10 PREEMPT
[ 3788.514067] Tainted: [W]=WARN
[ 3788.514069] Hardware name: Marvell CN106XX board (DT)
[ 3788.514071] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 3788.514074] pc : devm_kfree+0x84/0x98
[ 3788.514076] lr : devm_kfree+0x54/0x98
[ 3788.514079] sp : ffff800084e2f220
[ 3788.514080] x29: ffff800084e2f220 x28: ffff0003b2366000 x27: 000000000000003f
[ 3788.514085] x26: 000000000000003f x25: ffff000106f17c10 x24: 0000000000000080
[ 3788.514089] x23: ffff00045bb8ab08 x22: ffff00045bb8a000 x21: 0000000000000018
[ 3788.514093] x20: ffff0004355c3080 x19: ffff00045bb8aa00 x18: 0000000000080000
[ 3788.514098] x17: 0000000000000040 x16: 000000000000001f x15: 000000000007ffff
[ 3788.514102] x14: 0000000000000488 x13: 0000000000000005 x12: 00000000000fffff
[ 3788.514106] x11: ffffffffffffffff x10: 0000000000000005 x9 : ffff800080c8c05c
[ 3788.514110] x8 : ffff800084e2eeb8 x7 : 0000000000000000 x6 : 000000000000003f
[ 3788.514115] x5 : ffff8000831bafe0 x4 : ffff800080c8b010 x3 : ffff0004355c3080
[ 3788.514119] x2 : ffff0004355c3080 x1 : 0000000000000000 x0 : 0000000000000000
[ 3788.514123] Call trace:
[ 3788.514125]  devm_kfree+0x84/0x98 (P)
[ 3788.514129]  virtnet_set_queues+0x134/0x2e8 [virtio_net]
[ 3788.514135]  virtnet_probe+0x9c0/0xe00 [virtio_net]
[ 3788.514139]  virtio_dev_probe+0x1e0/0x338
[ 3788.514144]  really_probe+0xc8/0x3a0
[ 3788.514149]  __driver_probe_device+0x84/0x170
[ 3788.514152]  driver_probe_device+0x44/0x120
[ 3788.514155]  __device_attach_driver+0xc4/0x168
[ 3788.514158]  bus_for_each_drv+0x8c/0xf0
[ 3788.514161]  __device_attach+0xa4/0x1c0
[ 3788.514164]  device_initial_probe+0x1c/0x30
[ 3788.514168]  bus_probe_device+0xb4/0xc0
[ 3788.514170]  device_add+0x614/0x828
[ 3788.514173]  register_virtio_device+0x214/0x258
[ 3788.514175]  virtio_vdpa_probe+0xa0/0x110 [virtio_vdpa]
[ 3788.514179]  vdpa_dev_probe+0xa8/0xd8
[ 3788.514183]  really_probe+0xc8/0x3a0
[ 3788.514186]  __driver_probe_device+0x84/0x170
[ 3788.514189]  driver_probe_device+0x44/0x120
[ 3788.514192]  __device_attach_driver+0xc4/0x168
[ 3788.514195]  bus_for_each_drv+0x8c/0xf0
[ 3788.514197]  __device_attach+0xa4/0x1c0
[ 3788.514200]  device_initial_probe+0x1c/0x30
[ 3788.514203]  bus_probe_device+0xb4/0xc0
[ 3788.514206]  device_add+0x614/0x828
[ 3788.514209]  _vdpa_register_device+0x58/0x88
[ 3788.514211]  octep_vdpa_dev_add+0x104/0x228 [octep_vdpa]
[ 3788.514215]  vdpa_nl_cmd_dev_add_set_doit+0x2d0/0x3c0
[ 3788.514218]  genl_family_rcv_msg_doit+0xe4/0x158
[ 3788.514222]  genl_rcv_msg+0x218/0x298
[ 3788.514225]  netlink_rcv_skb+0x64/0x138
[ 3788.514229]  genl_rcv+0x40/0x60
[ 3788.514233]  netlink_unicast+0x32c/0x3b0
[ 3788.514237]  netlink_sendmsg+0x170/0x3b8
[ 3788.514241]  __sys_sendto+0x12c/0x1c0
[ 3788.514246]  __arm64_sys_sendto+0x30/0x48
[ 3788.514249]  invoke_syscall.constprop.0+0x58/0xf8
[ 3788.514255]  do_el0_svc+0x48/0xd0
[ 3788.514259]  el0_svc+0x48/0x210
[ 3788.514264]  el0t_64_sync_handler+0xa0/0xe8
[ 3788.514268]  el0t_64_sync+0x198/0x1a0
[ 3788.514271] ---[ end trace 0000000000000000 ]---

Fix by using virtio_device->device consistently for
allocation and deallocation.

Fixes: 4944be2 ("virtio_net: Allocate rss_hdr with devres")
Signed-off-by: Kommula Shiva Shankar <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
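A constructed userspace model of the device mismatch this commit message describes (added as an example; it is not the kernel's devres implementation or the virtio_net driver). Each device tracks the managed allocations it owns, so releasing rss_hdr through a device that never allocated it is rejected the way devm_kfree() warns, and the buffer stays owned by the original device. `devm_kzalloc_model`, `devm_kfree_model` and the cycle functions are assumptions of the model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Constructed toy model of devres (not the kernel's implementation): every
 * device tracks the managed allocations it owns, and releasing a pointer
 * through a device that does not own it is the mismatch the warning reports. */
#define MAX_RES 8

struct device {
    const char *name;
    void *res[MAX_RES];
    int nres;
};

static void *devm_kzalloc_model(struct device *dev, size_t size)
{
    void *p;

    if (dev->nres >= MAX_RES)
        return NULL;
    p = calloc(1, size);
    if (p)
        dev->res[dev->nres++] = p;
    return p;
}

/* Returns true if p was owned by dev and has been released. Returning false
 * stands in for the WARN() in devm_kfree() when p is not one of dev's resources. */
static bool devm_kfree_model(struct device *dev, void *p)
{
    for (int i = 0; i < dev->nres; i++) {
        if (dev->res[i] == p) {
            dev->res[i] = dev->res[--dev->nres];
            free(p);
            return true;
        }
    }
    return false;
}

struct virtio_device { struct device dev; };
struct net_device    { struct device dev; };

struct virtnet_info {
    struct virtio_device *vdev;
    struct net_device *ndev;
    void *rss_hdr;
};

/* Before the fix: allocate rss_hdr under the virtio device, free it under the
 * net device. */
static bool rss_hdr_cycle_before(struct virtnet_info *vi)
{
    vi->rss_hdr = devm_kzalloc_model(&vi->vdev->dev, 64);
    return devm_kfree_model(&vi->ndev->dev, vi->rss_hdr);
}

/* After the fix: the same device for both operations. */
static bool rss_hdr_cycle_after(struct virtnet_info *vi)
{
    vi->rss_hdr = devm_kzalloc_model(&vi->vdev->dev, 64);
    return devm_kfree_model(&vi->vdev->dev, vi->rss_hdr);
}

/* Runs one allocate/free cycle on fresh devices. True means the free was
 * accepted and neither device still owns the buffer afterwards. */
static bool cycle_is_clean(bool fixed)
{
    struct virtio_device vdev = { .dev = { .name = "virtio0" } };
    struct net_device ndev = { .dev = { .name = "eth0" } };
    struct virtnet_info vi = { .vdev = &vdev, .ndev = &ndev, .rss_hdr = NULL };
    bool freed = fixed ? rss_hdr_cycle_after(&vi) : rss_hdr_cycle_before(&vi);
    bool leftover = vdev.dev.nres != 0 || ndev.dev.nres != 0;

    if (leftover)                /* tidy up what the buggy path left behind */
        devm_kfree_model(&vdev.dev, vi.rss_hdr);
    return freed && !leftover;
}

int main(void)
{
    printf("before: %s\n", cycle_is_clean(false) ? "clean" : "devm_kfree on wrong device");
    printf("after:  %s\n", cycle_is_clean(true) ? "clean" : "devm_kfree on wrong device");
    return 0;
}
```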
blktests-ci Bot pushed a commit that referenced this pull request Feb 22, 2026
The ETM decoder incorrectly assumed that auxtrace queue indices were
equivalent to CPU number. This assumption is used for inserting records
into the queue, and for fetching queues when given a CPU number. This
assumption held when Perf always opened a dummy event on every CPU, even
if the user provided a subset of CPUs on the commandline, resulting in
the indices aligning.

For example:

  # event : name = cs_etm//u, , id = { 2451, 2452 }, type = 11 (cs_etm), size = 136, config = 0x4010, { sample_period, samp>
  # event : name = dummy:u, , id = { 2453, 2454, 2455, 2456 }, type = 1 (PERF_TYPE_SOFTWARE), size = 136, config = 0x9 (PER>

  0 0 0x200 [0xd0]: PERF_RECORD_ID_INDEX nr: 6
  ... id: 2451  idx: 2  cpu: 2  tid: -1
  ... id: 2452  idx: 3  cpu: 3  tid: -1
  ... id: 2453  idx: 0  cpu: 0  tid: -1
  ... id: 2454  idx: 1  cpu: 1  tid: -1
  ... id: 2455  idx: 2  cpu: 2  tid: -1
  ... id: 2456  idx: 3  cpu: 3  tid: -1

Since commit 811082e ("perf parse-events: Support user CPUs mixed
with threads/processes") the dummy event no longer behaves in this way,
making the ETM event indices start from 0 on the first CPU recorded
regardless of its ID:

  # event : name = cs_etm//u, , id = { 771, 772 }, type = 11 (cs_etm), size = 144, config = 0x4010, { sample_period, sample>
  # event : name = dummy:u, , id = { 773, 774 }, type = 1 (PERF_TYPE_SOFTWARE), size = 144, config = 0x9 (PERF_COUNT_SW_DUM>

  0 0 0x200 [0x90]: PERF_RECORD_ID_INDEX nr: 4
  ... id: 771  idx: 0  cpu: 2  tid: -1
  ... id: 772  idx: 1  cpu: 3  tid: -1
  ... id: 773  idx: 0  cpu: 2  tid: -1
  ... id: 774  idx: 1  cpu: 3  tid: -1

This causes the following segfault when decoding:

  $ perf record -e cs_etm//u -C 2,3 -- true
  $ perf report

  perf: Segmentation fault
  -------- backtrace --------
  #0 0xaaaabf9fd020 in ui__signal_backtrace setup.c:110
  #1 0xffffab5c7930 in __kernel_rt_sigreturn [vdso][930]
  #2 0xaaaabfb68d30 in cs_etm_decoder__reset cs-etm-decoder.c:85
  #3 0xaaaabfb65930 in cs_etm__get_data_block cs-etm.c:2032
  #4 0xaaaabfb666fc in cs_etm__run_per_cpu_timeless_decoder cs-etm.c:2551
  #5 0xaaaabfb6692c in (cs_etm__process_timeless_queues cs-etm.c:2612
  #6 0xaaaabfb63390 in cs_etm__flush_events cs-etm.c:921
  #7 0xaaaabfb324c0 in auxtrace__flush_events auxtrace.c:2915
  #8 0xaaaabfaac378 in __perf_session__process_events session.c:2285
  #9 0xaaaabfaacc9c in perf_session__process_events session.c:2442
  #10 0xaaaabf8d3d90 in __cmd_report builtin-report.c:1085
  #11 0xaaaabf8d6944 in cmd_report builtin-report.c:1866
  #12 0xaaaabf95ebfc in run_builtin perf.c:351
  #13 0xaaaabf95eeb0 in handle_internal_command perf.c:404
  #14 0xaaaabf95f068 in run_argv perf.c:451
  #15 0xaaaabf95f390 in main perf.c:558
  #16 0xffffaab97400 in __libc_start_call_main libc_start_call_main.h:74
  #17 0xffffaab974d8 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
  #18 0xaaaabf8aa8f0 in _start perf[7a8f0]

Fix it by inserting into the queues based on CPU number, rather than
using the index.

Fixes: 811082e ("perf parse-events: Support user CPUs mixed with threads/processes")
Signed-off-by: James Clark <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: [email protected]
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Thomas Falcon <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Feb 22, 2026
When run on a kernel without BTF info, perf crashes:

    libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
    libbpf: failed to find valid kernel BTF

    Program received signal SIGSEGV, Segmentation fault.
    0x00005555556915b7 in btf.type_cnt ()
    (gdb) bt
    #0  0x00005555556915b7 in btf.type_cnt ()
    #1  0x0000555555691fbc in btf_find_by_name_kind ()
    #2  0x00005555556920d0 in btf.find_by_name_kind ()
    #3  0x00005555558a1b7c in init_numa_data (con=0x7fffffffd0a0) at util/bpf_lock_contention.c:125
    #4  0x00005555558a264b in lock_contention_prepare (con=0x7fffffffd0a0) at util/bpf_lock_contention.c:313
    #5  0x0000555555620702 in __cmd_contention (argc=0, argv=0x7fffffffea10) at builtin-lock.c:2084
    #6  0x0000555555622c8d in cmd_lock (argc=0, argv=0x7fffffffea10) at builtin-lock.c:2755
    #7  0x0000555555651451 in run_builtin (p=0x555556104f00 <commands+576>, argc=3, argv=0x7fffffffea10)
        at perf.c:349
    #8  0x00005555556516ed in handle_internal_command (argc=3, argv=0x7fffffffea10) at perf.c:401
    #9  0x000055555565184e in run_argv (argcp=0x7fffffffe7fc, argv=0x7fffffffe7f0) at perf.c:445
    #10 0x0000555555651b9f in main (argc=3, argv=0x7fffffffea10) at perf.c:553

Check if btf loading failed, and don't do anything with it in
init_numa_data(). This leads to the following error message, instead of
just a crash:

    libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
    libbpf: failed to find valid kernel BTF
    libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
    libbpf: failed to find valid kernel BTF
    libbpf: Error loading vmlinux BTF: -ESRCH
    libbpf: failed to load BPF skeleton 'lock_contention_bpf': -ESRCH
    Failed to load lock-contention BPF skeleton
    lock contention BPF setup failed

Signed-off-by: Tycho Andersen (AMD) <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Chun-Tse Shao <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 10, 2026
…ools()

The numa_node can be < 0 since NUMA_NO_NODE = -1. However,
struct blk_mq_hw_ctx{} defines numa_node as unsigned int. As a result,
numa_node is set to UINT_MAX for NUMA_NO_NODE in blk_mq_alloc_hctx().

Later, nvme_setup_descriptor_pools() accesses
descriptor_pools[numa_node]. Due to the above, it tries to access
descriptor_pools[UINT_MAX]. The address is garbage but accessible
because it is canonical and still within the slab memory range.
Therefore, no page fault occurs, and KASAN cannot detect this since it
is beyond the redzones.

Subsequently, normal I/O calls dma_pool_alloc() with the garbage pool
address. pool->next_block contains a wild pointer, causing a general
protection fault (GPF).

To fix this, this patch changes the type of numa_node to int and adds
a check for NUMA_NO_NODE.

Log:

Oops: general protection fault, probably for non-canonical address 0xe9803b040854d02c: 0000 [#1] SMP KASAN PTI
KASAN: maybe wild-memory-access in range [0x4c01f82042a68160-0x4c01f82042a68167][FEMU] Err: I/O cmd failed: opcode=0x2 status=0x4002
CPU: 0 UID: 0 PID: 112363 Comm: systemd-udevd Not tainted 6.19.0-dirty #10 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:pool_block_pop mm/dmapool.c:187 [inline]
RIP: 0010:dma_pool_alloc+0x110/0x990 mm/dmapool.c:417
Code: 00 0f 85 a4 07 00 00 4c 8b 63 58 4d 85 e4 0f 84 12 01 00 00 e8 41 1d 93 ff 4c 89 e2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 a7 07 00 00 49 8b 04 24 48 8d 7b 68 48 89 fa 48
RSP: 0018:ffffc90002b9efd0 EFLAGS: 00010003
RAX: dffffc0000000000 RBX: ffff888005466800 RCX: ffffffff94faab7f
RDX: 09803f040854d02c RSI: 6c9b26c9b26c9b27 RDI: ffff88800c725ea0
RBP: ffffc90002b9f060 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000000 R12: 4c01f82042a68164
R13: ffff888005466800 R14: 0000000000000820 R15: ffff888007b29000
FS:  00007f2abc4ff8c0(0000) GS:ffff8880d1ff7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056360eb89000 CR3: 000000000a480000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 nvme_pci_setup_data_prp drivers/nvme/host/pci.c:906 [inline]
 nvme_map_data drivers/nvme/host/pci.c:1114 [inline]
 nvme_prep_rq.part.0+0x17d3/0x3c90 drivers/nvme/host/pci.c:1243
 nvme_prep_rq drivers/nvme/host/pci.c:1239 [inline]
 nvme_prep_rq_batch drivers/nvme/host/pci.c:1321 [inline]
 nvme_queue_rqs+0x37b/0x8a0 drivers/nvme/host/pci.c:1336
 __blk_mq_flush_list block/blk-mq.c:2848 [inline]
 __blk_mq_flush_list+0xaa/0xe0 block/blk-mq.c:2844
 blk_mq_dispatch_queue_requests+0x4f5/0x990 block/blk-mq.c:2893
 blk_mq_flush_plug_list+0x232/0x650 block/blk-mq.c:2981
 __blk_flush_plug+0x2c3/0x510 block/blk-core.c:1225
 blk_finish_plug block/blk-core.c:1252 [inline]
 blk_finish_plug+0x64/0xc0 block/blk-core.c:1249
 read_pages+0x6bd/0x9d0 mm/readahead.c:176
 page_cache_ra_unbounded+0x659/0x950 mm/readahead.c:269
 do_page_cache_ra mm/readahead.c:332 [inline]
 force_page_cache_ra+0x282/0x3a0 mm/readahead.c:361
 page_cache_sync_ra+0x201/0xbf0 mm/readahead.c:579
 filemap_get_pages+0x3be/0x1990 mm/filemap.c:2690
 filemap_read+0x3ea/0xdf0 mm/filemap.c:2800
 blkdev_read_iter+0x1b8/0x520 block/fops.c:856
 new_sync_read fs/read_write.c:491 [inline]
 vfs_read+0x90f/0xd80 fs/read_write.c:572
 ksys_read+0x14e/0x280 fs/read_write.c:715
 __do_sys_read fs/read_write.c:724 [inline]
 __se_sys_read fs/read_write.c:722 [inline]
 __x64_sys_read+0x7b/0xc0 fs/read_write.c:722
 x64_sys_call+0x17ec/0x21b0 arch/x86/include/generated/asm/syscalls_64.h:1
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x8b/0x1200 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f2abc7b204e
Code: 0f 1f 40 00 48 8b 15 79 af 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
RSP: 002b:00007fff07113cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000056360eb6a528 RCX: 00007f2abc7b204e
RDX: 0000000000040000 RSI: 000056360eb6a538 RDI: 000000000000000f
RBP: 000056360e8d23d0 R08: 000056360eb6a510 R09: 00007f2abc79abe0
R10: 0000000000040050 R11: 0000000000000246 R12: 000000003ff80000
R13: 0000000000040000 R14: 000056360eb6a510 R15: 000056360e8d2420
 </TASK>
Modules linked in:

Fixes: 320ae51 ("blk-mq: new multi-queue block IO queueing mechanism")
Fixes: d977506 ("nvme-pci: make PRP list DMA pools per-NUMA-node")
Acked-by: Chao Shi <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Acked-by: Dave Tian <[email protected]>
Signed-off-by: Sungwoo Kim <[email protected]>
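The commit message above turns on one detail: NUMA_NO_NODE is -1, and storing it into an `unsigned int` field silently becomes UINT_MAX, an index that lands on a canonical address far outside the pool table but inside the direct map, so neither the MMU nor KASAN objects. The following is an added, self-contained C example written for this page (not kernel code and not the patch itself) that shows the conversion, the byte offset the wild index produces, and a hardened lookup. The struct and function names, the two-node layout and the fall-back to node 0 are this example's assumptions; the real fix's fallback policy is not shown on the page.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Same encoding the kernel uses: "no preferred node" is -1. */
#define NUMA_NO_NODE (-1)

/* Constructed two-node machine; pools[] stands in for the per-node
 * descriptor pool table the commit message describes. */
#define NR_NODES 2
struct pool { int node; };
static struct pool pools[NR_NODES] = { { 0 }, { 1 } };

/* Pre-fix field type of numa_node, per the commit message. */
struct hctx_unsigned { unsigned int numa_node; };

/* Post-fix field type: signed, so NUMA_NO_NODE survives the store. */
struct hctx_signed { int numa_node; };

/* What storing NUMA_NO_NODE into the unsigned field yields. */
unsigned int store_unsigned(int node)
{
    struct hctx_unsigned h;
    h.numa_node = node;            /* implicit conversion: -1 becomes UINT_MAX */
    return h.numa_node;
}

int store_signed(int node)
{
    struct hctx_signed h;
    h.numa_node = node;            /* -1 stays -1 and can be tested for */
    return h.numa_node;
}

/* Byte offset that pools[UINT_MAX] adds to the table base for a given
 * element size. With 8-byte pointers it is just under 32 GiB, which on
 * x86-64 is still a canonical address inside the direct map: no page
 * fault and no KASAN redzone, exactly as the commit describes. */
unsigned long long wild_index_offset(size_t elem_size)
{
    return (unsigned long long)UINT_MAX * elem_size;
}

/* Lookup shaped like the pre-fix code, plus the range check it lacked so
 * this example returns NULL instead of dereferencing a wild pointer. */
struct pool *lookup_unsigned(unsigned int idx)
{
    if (idx >= NR_NODES)
        return NULL;               /* UINT_MAX is rejected here */
    return &pools[idx];
}

/* Hardened lookup: signed node, explicit NUMA_NO_NODE handling, range
 * check. Falling back to node 0 is this example's choice (assumption). */
struct pool *lookup_hardened(int node)
{
    if (node == NUMA_NO_NODE)
        node = 0;
    if (node < 0 || node >= NR_NODES)
        return NULL;
    return &pools[node];
}

int main(void)
{
    unsigned int u = store_unsigned(NUMA_NO_NODE);
    printf("unsigned numa_node for NUMA_NO_NODE: %u (UINT_MAX=%u)\n", u, UINT_MAX);
    printf("offset of pools[UINT_MAX] with 8-byte elements: 0x%llx\n",
           wild_index_offset(8));
    printf("unsigned lookup -> %s, hardened lookup -> node %d\n",
           lookup_unsigned(u) ? "pool" : "NULL",
           lookup_hardened(NUMA_NO_NODE)->node);
    return 0;
}
```

The practical check this suggests for similar code: any field that can legitimately hold NUMA_NO_NODE must be signed, and any array indexed by it needs an explicit NUMA_NO_NODE test before the index is formed, because the out-of-range read is silent on a large direct map.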
blktests-ci Bot pushed a commit that referenced this pull request Mar 11, 2026
…ools()
blktests-ci Bot pushed a commit that referenced this pull request Mar 12, 2026
…ools()
blktests-ci Bot pushed a commit that referenced this pull request Mar 13, 2026
dev->online_queues is a count incremented in nvme_init_queue. Thus,
valid indices are 0 through dev->online_queues − 1.

This patch fixes the loop condition to ensure the index stays within the
valid range. Index 0 is excluded because it is the admin queue.

KASAN splat:

==================================================================
BUG: KASAN: slab-out-of-bounds in nvme_dbbuf_free drivers/nvme/host/pci.c:377 [inline]
BUG: KASAN: slab-out-of-bounds in nvme_dbbuf_set+0x39c/0x400 drivers/nvme/host/pci.c:404
Read of size 2 at addr ffff88800592a574 by task kworker/u8:5/74

CPU: 0 UID: 0 PID: 74 Comm: kworker/u8:5 Not tainted 6.19.0-dirty #10 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: nvme-reset-wq nvme_reset_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xea/0x150 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xce/0x5d0 mm/kasan/report.c:482
 kasan_report+0xdc/0x110 mm/kasan/report.c:595
 __asan_report_load2_noabort+0x18/0x20 mm/kasan/report_generic.c:379
 nvme_dbbuf_free drivers/nvme/host/pci.c:377 [inline]
 nvme_dbbuf_set+0x39c/0x400 drivers/nvme/host/pci.c:404
 nvme_reset_work+0x36b/0x8c0 drivers/nvme/host/pci.c:3252
 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x65c/0xe60 kernel/workqueue.c:3421
 kthread+0x41a/0x930 kernel/kthread.c:463
 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 </TASK>

Allocated by task 34 on cpu 1 at 4.241550s:
 kasan_save_stack+0x2c/0x60 mm/kasan/common.c:57
 kasan_save_track+0x1c/0x70 mm/kasan/common.c:78
 kasan_save_alloc_info+0x3c/0x50 mm/kasan/generic.c:570
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0xb5/0xc0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __do_kmalloc_node mm/slub.c:5657 [inline]
 __kmalloc_node_noprof+0x2bf/0x8d0 mm/slub.c:5663
 kmalloc_array_node_noprof include/linux/slab.h:1075 [inline]
 nvme_pci_alloc_dev drivers/nvme/host/pci.c:3479 [inline]
 nvme_probe+0x2f1/0x1820 drivers/nvme/host/pci.c:3534
 local_pci_probe+0xef/0x1c0 drivers/pci/pci-driver.c:324
 pci_call_probe drivers/pci/pci-driver.c:392 [inline]
 __pci_device_probe drivers/pci/pci-driver.c:417 [inline]
 pci_device_probe+0x743/0x920 drivers/pci/pci-driver.c:451
 call_driver_probe drivers/base/dd.c:583 [inline]
 really_probe+0x29b/0xb70 drivers/base/dd.c:661
 __driver_probe_device+0x3b0/0x4a0 drivers/base/dd.c:803
 driver_probe_device+0x56/0x1f0 drivers/base/dd.c:833
 __driver_attach_async_helper+0x155/0x340 drivers/base/dd.c:1159
 async_run_entry_fn+0xa6/0x4b0 kernel/async.c:129
 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x65c/0xe60 kernel/workqueue.c:3421
 kthread+0x41a/0x930 kernel/kthread.c:463
 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

The buggy address belongs to the object at ffff88800592a000
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 244 bytes to the right of
 allocated 1152-byte region [ffff88800592a000, ffff88800592a480)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5928
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0xfffffc0000040(head|node=0|zone=1|lastcpupid=0x1fffff)
page_type: f5(slab)
raw: 000fffffc0000040 ffff888001042000 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000
head: 000fffffc0000040 ffff888001042000 0000000000000000 dead000000000001
head: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000
head: 000fffffc0000003 ffffea0000164a01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88800592a400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88800592a480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88800592a500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                             ^
 ffff88800592a580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88800592a600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fixes: 0f0d2c8 (nvme: free sq/cq dbbuf pointers when dbbuf set fails)
Acked-by: Chao Shi <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Acked-by: Dave Tian <[email protected]>
Signed-off-by: Sungwoo Kim <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
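The off-by-one in the commit above is the classic `<=` on a count: online_queues is how many queues were initialised, so the last valid index is online_queues - 1, and index 0 is the admin queue and is skipped. The following is an added C example written for this page (a constructed model, not the driver's code) that runs both loop bounds over a queue array with a guard slot past the end, so the out-of-bounds visit can be observed without KASAN. The struct layout, the `<=` form of the pre-fix loop, and the four-queue scenario are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Constructed model of the nvme_dev / nvme_queue pair. queues[] holds
 * nr_allocated entries; online_queues counts those that were brought
 * up, so valid indices are 0 .. online_queues - 1 and index 0 is the
 * admin queue. */
struct nvme_queue {
    void *dbbuf_sq_db;
    void *dbbuf_cq_db;
    int   freed;          /* how many times the free path touched this slot */
};

struct nvme_dev {
    struct nvme_queue *queues;
    unsigned int nr_allocated;
    unsigned int online_queues;
};

/* One guard slot past the real array lets the out-of-bounds access be
 * observed without corrupting anything (KASAN's job in the real splat). */
struct nvme_dev *dev_create(unsigned int nr_queues, unsigned int online)
{
    struct nvme_dev *dev = calloc(1, sizeof(*dev));
    dev->queues = calloc(nr_queues + 1, sizeof(*dev->queues));
    dev->nr_allocated = nr_queues;
    dev->online_queues = online;
    return dev;
}

void dev_destroy(struct nvme_dev *dev)
{
    free(dev->queues);
    free(dev);
}

/* Stand-in for the per-queue free: drops the doorbell buffer pointers. */
static void dbbuf_free_one(struct nvme_queue *nvmeq)
{
    nvmeq->dbbuf_sq_db = NULL;
    nvmeq->dbbuf_cq_db = NULL;
    nvmeq->freed++;
}

/* Pre-fix loop condition as assumed from the commit: "<=" visits index
 * online_queues, one past the last valid queue. Returns slots visited. */
unsigned int dbbuf_free_all_buggy(struct nvme_dev *dev)
{
    unsigned int i, n = 0;
    for (i = 1; i <= dev->online_queues; i++) {
        dbbuf_free_one(&dev->queues[i]);
        n++;
    }
    return n;
}

/* Fixed loop: strictly below the count; index 0 (admin queue) skipped. */
unsigned int dbbuf_free_all_fixed(struct nvme_dev *dev)
{
    unsigned int i, n = 0;
    for (i = 1; i < dev->online_queues; i++) {
        dbbuf_free_one(&dev->queues[i]);
        n++;
    }
    return n;
}

/* True if the slot just past the allocation was written. */
bool guard_touched(const struct nvme_dev *dev)
{
    return dev->queues[dev->nr_allocated].freed != 0;
}

/* Scenario: every allocated queue came online, as in the splat where
 * the read ran past the kmalloc'd region. */
unsigned int scenario_visited(bool use_fixed)
{
    struct nvme_dev *dev = dev_create(4, 4);
    unsigned int n = use_fixed ? dbbuf_free_all_fixed(dev) : dbbuf_free_all_buggy(dev);
    dev_destroy(dev);
    return n;
}

bool scenario_touches_guard(bool use_fixed)
{
    struct nvme_dev *dev = dev_create(4, 4);
    if (use_fixed) dbbuf_free_all_fixed(dev); else dbbuf_free_all_buggy(dev);
    bool touched = guard_touched(dev);
    dev_destroy(dev);
    return touched;
}

int main(void)
{
    printf("buggy: visited %u slots, past-the-end touched: %s\n",
           scenario_visited(false), scenario_touches_guard(false) ? "yes" : "no");
    printf("fixed: visited %u slots, past-the-end touched: %s\n",
           scenario_visited(true), scenario_touches_guard(true) ? "yes" : "no");
    return 0;
}
```

With four queues (admin plus three I/O queues) the pre-fix loop frees four slots and lands on the guard; the fixed loop frees three and stays inside the array.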
blktests-ci Bot pushed a commit that referenced this pull request Mar 13, 2026
…ools()
blktests-ci Bot pushed a commit that referenced this pull request Mar 15, 2026
…ools()
blktests-ci Bot pushed a commit that referenced this pull request Mar 18, 2026
During ADSP stop and start, the kernel crashes due to the order in which
ASoC components are removed.

On ADSP stop, the q6apm-audio .remove callback unloads topology and removes
PCM runtimes during ASoC teardown. This deletes the RTDs that contain the
q6apm DAI components before their removal pass runs, leaving those
components still linked to the card and causing crashes on the next rebind.

Fix this by ensuring that all dependent (child) components are removed
first, and the q6apm component is removed last.

[   48.105720] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000d0
[   48.114763] Mem abort info:
[   48.117650]   ESR = 0x0000000096000004
[   48.121526]   EC = 0x25: DABT (current EL), IL = 32 bits
[   48.127010]   SET = 0, FnV = 0
[   48.130172]   EA = 0, S1PTW = 0
[   48.133415]   FSC = 0x04: level 0 translation fault
[   48.138446] Data abort info:
[   48.141422]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[   48.147079]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[   48.152354]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   48.157859] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001173cf000
[   48.164517] [00000000000000d0] pgd=0000000000000000, p4d=0000000000000000
[   48.171530] Internal error: Oops: 0000000096000004 [#1]  SMP
[   48.177348] Modules linked in: q6prm_clocks q6apm_lpass_dais q6apm_dai snd_q6dsp_common q6prm snd_q6apm 8021q garp mrp stp llc snd_soc_hdmi_codec apr pdr_interface phy_qcom_edp fastrpc qcom_pd_mapper rpmsg_ctrl qrtr_smd rpmsg_char qcom_pdr_msg qcom_iris v4l2_mem2mem videobuf2_dma_contig ath11k_pci msm ubwc_config at24 ath11k videobuf2_memops mac80211 ocmem videobuf2_v4l2 libarc4 drm_gpuvm mhi qrtr videodev drm_exec snd_soc_sc8280xp gpu_sched videobuf2_common nvmem_qcom_spmi_sdam snd_soc_qcom_sdw drm_dp_aux_bus qcom_q6v5_pas qcom_spmi_temp_alarm snd_soc_qcom_common rtc_pm8xxx qcom_pon drm_display_helper cec qcom_pil_info qcom_stats soundwire_bus drm_client_lib mc dispcc0_sa8775p videocc_sa8775p qcom_q6v5 camcc_sa8775p snd_soc_dmic phy_qcom_sgmii_eth snd_soc_max98357a i2c_qcom_geni snd_soc_core dwmac_qcom_ethqos llcc_qcom icc_bwmon qcom_sysmon snd_compress qcom_refgen_regulator coresight_stm stmmac_platform snd_pcm_dmaengine qcom_common coresight_tmc stmmac coresight_replicator qcom_glink_smem coresight_cti stm_core
[   48.177444]  coresight_funnel snd_pcm ufs_qcom phy_qcom_qmp_usb gpi phy_qcom_snps_femto_v2 coresight phy_qcom_qmp_ufs qcom_wdt gpucc_sa8775p pcs_xpcs mdt_loader qcom_ice icc_osm_l3 qmi_helpers snd_timer snd soundcore display_connector qcom_rng nvmem_reboot_mode drm_kms_helper phy_qcom_qmp_pcie sha256 cfg80211 rfkill socinfo fuse drm backlight ipv6
[   48.301059] CPU: 2 UID: 0 PID: 293 Comm: kworker/u32:2 Not tainted 6.19.0-rc6-dirty #10 PREEMPT
[   48.310081] Hardware name: Qualcomm Technologies, Inc. Lemans EVK (DT)
[   48.316782] Workqueue: pdr_notifier_wq pdr_notifier_work [pdr_interface]
[   48.323672] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   48.330825] pc : mutex_lock+0xc/0x54
[   48.334514] lr : soc_dapm_shutdown_dapm+0x44/0x174 [snd_soc_core]
[   48.340794] sp : ffff800084ddb7b0
[   48.344207] x29: ffff800084ddb7b0 x28: ffff00009cd9cf30 x27: ffff00009cd9cc00
[   48.351544] x26: ffff000099610190 x25: ffffa31d2f19c810 x24: ffffa31d2f185098
[   48.358869] x23: ffff800084ddb7f8 x22: 0000000000000000 x21: 00000000000000d0
[   48.366198] x20: ffff00009ba6c338 x19: ffff00009ba6c338 x18: 00000000ffffffff
[   48.373528] x17: 000000040044ffff x16: ffffa31d4ae6dca8 x15: 072007740775076f
[   48.380853] x14: 0765076d07690774 x13: 00313a323a656369 x12: 767265733a637673
[   48.388182] x11: 00000000000003f9 x10: ffffa31d4c7dea98 x9 : 0000000000000001
[   48.395519] x8 : ffff00009a2aadc0 x7 : 0000000000000003 x6 : 0000000000000000
[   48.402854] x5 : 0000000000000000 x4 : 0000000000000028 x3 : ffff000ef397a698
[   48.410180] x2 : ffff00009a2aadc0 x1 : 0000000000000000 x0 : 00000000000000d0
[   48.417506] Call trace:
[   48.420025]  mutex_lock+0xc/0x54 (P)
[   48.423712]  snd_soc_dapm_shutdown+0x44/0xbc [snd_soc_core]
[   48.429447]  soc_cleanup_card_resources+0x30/0x2c0 [snd_soc_core]
[   48.435719]  snd_soc_bind_card+0x4dc/0xcc0 [snd_soc_core]
[   48.441278]  snd_soc_add_component+0x27c/0x2c8 [snd_soc_core]
[   48.447192]  snd_soc_register_component+0x9c/0xf4 [snd_soc_core]
[   48.453371]  devm_snd_soc_register_component+0x64/0xc4 [snd_soc_core]
[   48.459994]  apm_probe+0xb4/0x110 [snd_q6apm]
[   48.464479]  apr_device_probe+0x24/0x40 [apr]
[   48.468964]  really_probe+0xbc/0x298
[   48.472651]  __driver_probe_device+0x78/0x12c
[   48.477132]  driver_probe_device+0x40/0x160
[   48.481435]  __device_attach_driver+0xb8/0x134
[   48.486011]  bus_for_each_drv+0x80/0xdc
[   48.489964]  __device_attach+0xa8/0x1b0
[   48.493916]  device_initial_probe+0x50/0x54
[   48.498219]  bus_probe_device+0x38/0xa0
[   48.502170]  device_add+0x590/0x760
[   48.505761]  device_register+0x20/0x30
[   48.509623]  of_register_apr_devices+0x1d8/0x318 [apr]
[   48.514905]  apr_pd_status+0x2c/0x54 [apr]
[   48.519114]  pdr_notifier_work+0x8c/0xe0 [pdr_interface]
[   48.524570]  process_one_work+0x150/0x294
[   48.528692]  worker_thread+0x2d8/0x3d8
[   48.532551]  kthread+0x130/0x204
[   48.535874]  ret_from_fork+0x10/0x20
[   48.539559] Code: d65f03c0 d5384102 d503201f d2800001 (c8e17c02)
[   48.545823] ---[ end trace 0000000000000000 ]---

Fixes: 5477518 ("ASoC: qdsp6: audioreach: add q6apm support")
Cc: [email protected]
Signed-off-by: Ravi Hothi <[email protected]>
Reviewed-by: Srinivas Kandagatla <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
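The ASoC commit above is a teardown-ordering bug: the provider component (q6apm) was removed while components that depend on it (the q6apm DAI components, visible as q6apm_dai and q6apm_lpass_dais in the module list) were still linked to the card, so the next bind dereferenced stale state. The following is an added C example written for this page (a constructed model, not ASoC code) that contrasts removal in registration order with a dependents-first teardown that only removes a component once nothing live still points at it. The component names are taken from the module list in the oops; the data structures, function names and the single-provider dependency model are this example's assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_COMPONENTS 8

/* Constructed model: each component may depend on one provider. */
struct component {
    const char *name;
    struct component *provider;   /* NULL for a root component */
    bool registered;
};

struct card {
    struct component *comps[MAX_COMPONENTS];
    int n;
};

/* Number of still-registered components that point at p as provider. */
static int live_dependents(const struct card *c, const struct component *p)
{
    int i, n = 0;
    for (i = 0; i < c->n; i++)
        if (c->comps[i]->registered && c->comps[i]->provider == p)
            n++;
    return n;
}

/* Buggy teardown: remove in registration order. The provider goes first
 * and its children stay registered with a dangling provider, which is
 * the state the commit describes crashing on the next rebind.
 * Returns how many children were orphaned this way. */
int teardown_in_registration_order(struct card *c)
{
    int i, orphaned = 0;
    for (i = 0; i < c->n; i++) {
        struct component *comp = c->comps[i];
        if (!comp->registered)
            continue;
        orphaned += live_dependents(c, comp);
        comp->registered = false;
    }
    return orphaned;
}

/* Fixed teardown: repeatedly remove only components with no live
 * dependents, so every child is gone before its provider. Records the
 * removal order and returns the number left registered (0 on success;
 * non-zero means a dependency cycle). */
int teardown_dependents_first(struct card *c, const char *order[], int *order_len)
{
    int i, remaining = 0, progress;
    *order_len = 0;
    do {
        progress = 0;
        for (i = 0; i < c->n; i++) {
            struct component *comp = c->comps[i];
            if (!comp->registered || live_dependents(c, comp) > 0)
                continue;
            comp->registered = false;
            order[(*order_len)++] = comp->name;
            progress = 1;
        }
    } while (progress);
    for (i = 0; i < c->n; i++)
        remaining += c->comps[i]->registered;
    return remaining;
}

/* Scenario from the oops: the q6apm component with two dependent DAI
 * components, registered provider-first. */
static void build_card(struct card *c, struct component comps[3])
{
    int i;
    comps[0] = (struct component){ "snd_q6apm", NULL, true };
    comps[1] = (struct component){ "q6apm_dai", &comps[0], true };
    comps[2] = (struct component){ "q6apm_lpass_dais", &comps[0], true };
    c->n = 3;
    for (i = 0; i < 3; i++)
        c->comps[i] = &comps[i];
}

int demo_buggy_orphans(void)
{
    struct card c; struct component comps[3];
    build_card(&c, comps);
    return teardown_in_registration_order(&c);
}

/* Position (0-based) at which the provider is removed in the fixed order. */
int demo_fixed_provider_position(void)
{
    struct card c; struct component comps[3];
    const char *order[MAX_COMPONENTS]; int len = 0, i;
    build_card(&c, comps);
    teardown_dependents_first(&c, order, &len);
    for (i = 0; i < len; i++)
        if (strcmp(order[i], "snd_q6apm") == 0)
            return i;
    return -1;
}

int demo_fixed_remaining(void)
{
    struct card c; struct component comps[3];
    const char *order[MAX_COMPONENTS]; int len = 0;
    build_card(&c, comps);
    return teardown_dependents_first(&c, order, &len);
}

int main(void)
{
    printf("registration-order teardown orphaned %d children\n", demo_buggy_orphans());
    printf("dependents-first teardown removed the provider at position %d, %d left\n",
           demo_fixed_provider_position(), demo_fixed_remaining());
    return 0;
}
```

The check a reviewer would apply to the real driver is the same one `live_dependents` encodes: a component's remove callback must not run while anything that references it is still bound to the card.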
blktests-ci Bot pushed a commit that referenced this pull request Mar 18, 2026
…ools()

The numa_node can be < 0 since NUMA_NO_NODE = -1. However,
struct blk_mq_hw_ctx{} defines numa_node as unsigned int. As a result,
numa_node is set to UINT_MAX for NUMA_NO_NODE in blk_mq_alloc_hctx().

Later, nvme_setup_descriptor_pools() accesses
descriptor_pools[numa_node]. Due to the above, it tries to access
descriptor_pools[UINT_MAX]. The address is garbage but accessible
because it is canonical and still within the slab memory range.
Therefore, no page fault occurs, and KASAN cannot detect this since it
is beyond the redzones.

Subsequently, normal I/O calls dma_pool_alloc() with the garbage pool
address. pool->next_block contains a wild pointer, causing a general
protection fault (GPF).

To fix this, this patch changes the type of numa_node to int and adds
a check for NUMA_NO_NODE.

Log:

Oops: general protection fault, probably for non-canonical address 0xe9803b040854d02c: 0000 [#1] SMP KASAN PTI
KASAN: maybe wild-memory-access in range [0x4c01f82042a68160-0x4c01f82042a68167][FEMU] Err: I/O cmd failed: opcode=0x2 status=0x4002
CPU: 0 UID: 0 PID: 112363 Comm: systemd-udevd Not tainted 6.19.0-dirty #10 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:pool_block_pop mm/dmapool.c:187 [inline]
RIP: 0010:dma_pool_alloc+0x110/0x990 mm/dmapool.c:417
Code: 00 0f 85 a4 07 00 00 4c 8b 63 58 4d 85 e4 0f 84 12 01 00 00 e8 41 1d 93 ff 4c 89 e2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 a7 07 00 00 49 8b 04 24 48 8d 7b 68 48 89 fa 48
RSP: 0018:ffffc90002b9efd0 EFLAGS: 00010003
RAX: dffffc0000000000 RBX: ffff888005466800 RCX: ffffffff94faab7f
RDX: 09803f040854d02c RSI: 6c9b26c9b26c9b27 RDI: ffff88800c725ea0
RBP: ffffc90002b9f060 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000000 R12: 4c01f82042a68164
R13: ffff888005466800 R14: 0000000000000820 R15: ffff888007b29000
FS:  00007f2abc4ff8c0(0000) GS:ffff8880d1ff7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056360eb89000 CR3: 000000000a480000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 nvme_pci_setup_data_prp drivers/nvme/host/pci.c:906 [inline]
 nvme_map_data drivers/nvme/host/pci.c:1114 [inline]
 nvme_prep_rq.part.0+0x17d3/0x3c90 drivers/nvme/host/pci.c:1243
 nvme_prep_rq drivers/nvme/host/pci.c:1239 [inline]
 nvme_prep_rq_batch drivers/nvme/host/pci.c:1321 [inline]
 nvme_queue_rqs+0x37b/0x8a0 drivers/nvme/host/pci.c:1336
 __blk_mq_flush_list block/blk-mq.c:2848 [inline]
 __blk_mq_flush_list+0xaa/0xe0 block/blk-mq.c:2844
 blk_mq_dispatch_queue_requests+0x4f5/0x990 block/blk-mq.c:2893
 blk_mq_flush_plug_list+0x232/0x650 block/blk-mq.c:2981
 __blk_flush_plug+0x2c3/0x510 block/blk-core.c:1225
 blk_finish_plug block/blk-core.c:1252 [inline]
 blk_finish_plug+0x64/0xc0 block/blk-core.c:1249
 read_pages+0x6bd/0x9d0 mm/readahead.c:176
 page_cache_ra_unbounded+0x659/0x950 mm/readahead.c:269
 do_page_cache_ra mm/readahead.c:332 [inline]
 force_page_cache_ra+0x282/0x3a0 mm/readahead.c:361
 page_cache_sync_ra+0x201/0xbf0 mm/readahead.c:579
 filemap_get_pages+0x3be/0x1990 mm/filemap.c:2690
 filemap_read+0x3ea/0xdf0 mm/filemap.c:2800
 blkdev_read_iter+0x1b8/0x520 block/fops.c:856
 new_sync_read fs/read_write.c:491 [inline]
 vfs_read+0x90f/0xd80 fs/read_write.c:572
 ksys_read+0x14e/0x280 fs/read_write.c:715
 __do_sys_read fs/read_write.c:724 [inline]
 __se_sys_read fs/read_write.c:722 [inline]
 __x64_sys_read+0x7b/0xc0 fs/read_write.c:722
 x64_sys_call+0x17ec/0x21b0 arch/x86/include/generated/asm/syscalls_64.h:1
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x8b/0x1200 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f2abc7b204e
Code: 0f 1f 40 00 48 8b 15 79 af 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
RSP: 002b:00007fff07113cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000056360eb6a528 RCX: 00007f2abc7b204e
RDX: 0000000000040000 RSI: 000056360eb6a538 RDI: 000000000000000f
RBP: 000056360e8d23d0 R08: 000056360eb6a510 R09: 00007f2abc79abe0
R10: 0000000000040050 R11: 0000000000000246 R12: 000000003ff80000
R13: 0000000000040000 R14: 000056360eb6a510 R15: 000056360e8d2420
 </TASK>
Modules linked in:

Fixes: 320ae51 ("blk-mq: new multi-queue block IO queueing mechanism")
Fixes: d977506 ("nvme-pci: make PRP list DMA pools per-NUMA-node")
Acked-by: Chao Shi <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Acked-by: Dave Tian <[email protected]>
Signed-off-by: Sungwoo Kim <[email protected]>
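The type mismatch described in this commit message, as an added userspace illustration (not kernel code, not part of the patch): an `int` sentinel of -1 stored in an `unsigned int` field becomes UINT_MAX, and a lookup that indexes a pool table with it silently reads far outside the table, while an `int` field plus an explicit NUMA_NO_NODE check does not. The struct names, the table size `MAX_NODES` and the fallback to node 0 are constructed for this example; only the NUMA_NO_NODE value matches the kernel.

```c
/* Added example, not from the kernel or the patch: models the signed-to-unsigned
 * sentinel bug and the typed + checked lookup that replaces it. */
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

#define NUMA_NO_NODE (-1)           /* same value as the kernel's sentinel */
#define MAX_NODES 4                 /* constructed: number of pool table entries */

/* Stand-in for a per-node DMA pool (the real one lives in mm/dmapool.c). */
struct pool { int node; };

/* Constructed pool table, one entry per NUMA node. */
static struct pool descriptor_pools[MAX_NODES] = { {0}, {1}, {2}, {3} };

/* Before: hypothetical hctx whose numa_node is unsigned, as the commit
 * message describes for struct blk_mq_hw_ctx. */
struct hctx_unsigned { unsigned int numa_node; };

/* After: numa_node is int, so the -1 sentinel survives the assignment. */
struct hctx_signed { int numa_node; };

/* Mirrors the allocation path storing the node it was handed. */
static void alloc_hctx_unsigned(struct hctx_unsigned *h, int node)
{
    h->numa_node = (unsigned int)node; /* -1 becomes UINT_MAX here */
}

static void alloc_hctx_signed(struct hctx_signed *h, int node)
{
    h->numa_node = node;               /* -1 stays -1 */
}

/* The index the unsigned field would use. The array is deliberately not
 * dereferenced: the value is far outside descriptor_pools, which is exactly
 * the silent wild read the commit message describes. */
static unsigned long index_unsigned(const struct hctx_unsigned *h)
{
    return (unsigned long)h->numa_node;
}

/* Checked lookup: map the sentinel to a fallback pool and refuse any
 * out-of-range node instead of indexing with it. The node-0 fallback is this
 * example's choice, not necessarily what the patch does. */
static struct pool *pool_for_node(int node)
{
    if (node == NUMA_NO_NODE)
        node = 0;
    if (node < 0 || node >= MAX_NODES)
        return NULL;
    return &descriptor_pools[node];
}

static struct pool *lookup_signed(const struct hctx_signed *h)
{
    return pool_for_node(h->numa_node);
}

/* Index that the unsigned variant computes for NUMA_NO_NODE. */
static unsigned long demo_unsigned_index(void)
{
    struct hctx_unsigned h;
    alloc_hctx_unsigned(&h, NUMA_NO_NODE);
    return index_unsigned(&h);
}

/* Pool node the checked signed variant selects for NUMA_NO_NODE. */
static int demo_signed_node(void)
{
    struct hctx_signed h;
    alloc_hctx_signed(&h, NUMA_NO_NODE);
    return lookup_signed(&h)->node;
}

int main(void)
{
    printf("unsigned field indexes entry %lu of a %d-entry table\n",
           demo_unsigned_index(), MAX_NODES);
    printf("signed field with check selects pool for node %d\n",
           demo_signed_node());
    return 0;
}
```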
blktests-ci Bot pushed a commit that referenced this pull request Mar 18, 2026
blktests-ci Bot pushed a commit that referenced this pull request Mar 22, 2026
When the i915 driver firmware binaries are not present, the
set_default_submission pointer is not set. This pointer is
dereferenced during suspend anyways.

Add a check to make sure it is set before dereferencing.

[   23.289926] PM: suspend entry (deep)
[   23.293558] Filesystems sync: 0.000 seconds
[   23.298010] Freezing user space processes
[   23.302771] Freezing user space processes completed (elapsed 0.000 seconds)
[   23.309766] OOM killer disabled.
[   23.313027] Freezing remaining freezable tasks
[   23.318540] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[   23.342038] serial 00:05: disabled
[   23.345719] serial 00:02: disabled
[   23.349342] serial 00:01: disabled
[   23.353782] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   23.358993] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[   23.361635] ata1.00: Entering standby power mode
[   23.368863] ata2.00: Entering standby power mode
[   23.445187] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   23.452194] #PF: supervisor instruction fetch in kernel mode
[   23.457896] #PF: error_code(0x0010) - not-present page
[   23.463065] PGD 0 P4D 0
[   23.465640] Oops: Oops: 0010 [#1] SMP NOPTI
[   23.469869] CPU: 8 UID: 0 PID: 211 Comm: kworker/u48:18 Tainted: G S      W           6.19.0-rc4-00020-gf0b9d8eb98df #10 PREEMPT(voluntary)
[   23.482512] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[   23.496511] Workqueue: async async_run_entry_fn
[   23.501087] RIP: 0010:0x0
[   23.503755] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[   23.510324] RSP: 0018:ffffb4a60065fca8 EFLAGS: 00010246
[   23.515592] RAX: 0000000000000000 RBX: ffff9f428290e000 RCX: 000000000000000f
[   23.522765] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff9f428290e000
[   23.529937] RBP: ffff9f4282907070 R08: ffff9f4281130428 R09: 00000000ffffffff
[   23.537111] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f42829070f8
[   23.544284] R13: ffff9f4282906028 R14: ffff9f4282900000 R15: ffff9f4282906b68
[   23.551457] FS:  0000000000000000(0000) GS:ffff9f466b2cf000(0000) knlGS:0000000000000000
[   23.559588] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.565365] CR2: ffffffffffffffd6 CR3: 000000031c230001 CR4: 0000000000f70ef0
[   23.572539] PKRU: 55555554
[   23.575281] Call Trace:
[   23.577770]  <TASK>
[   23.579905]  intel_engines_reset_default_submission+0x42/0x60
[   23.585695]  __intel_gt_unset_wedged+0x191/0x200
[   23.590360]  intel_gt_unset_wedged+0x20/0x40
[   23.594675]  gt_sanitize+0x15e/0x170
[   23.598290]  i915_gem_suspend_late+0x6b/0x180
[   23.602692]  i915_drm_suspend_late+0x35/0xf0
[   23.607008]  ? __pfx_pci_pm_suspend_late+0x10/0x10
[   23.611843]  dpm_run_callback+0x78/0x1c0
[   23.615817]  device_suspend_late+0xde/0x2e0
[   23.620037]  async_suspend_late+0x18/0x30
[   23.624082]  async_run_entry_fn+0x25/0xa0
[   23.628129]  process_one_work+0x15b/0x380
[   23.632182]  worker_thread+0x2a5/0x3c0
[   23.635973]  ? __pfx_worker_thread+0x10/0x10
[   23.640279]  kthread+0xf6/0x1f0
[   23.643464]  ? __pfx_kthread+0x10/0x10
[   23.647263]  ? __pfx_kthread+0x10/0x10
[   23.651045]  ret_from_fork+0x131/0x190
[   23.654837]  ? __pfx_kthread+0x10/0x10
[   23.658634]  ret_from_fork_asm+0x1a/0x30
[   23.662597]  </TASK>
[   23.664826] Modules linked in:
[   23.667914] CR2: 0000000000000000
[   23.671271] ------------[ cut here ]------------

Signed-off-by: Rahul Bukte <[email protected]>
Reviewed-by: Suraj Kandpal <[email protected]>
Signed-off-by: Suraj Kandpal <[email protected]>
Link: https://patch.msgid.link/[email protected]
(cherry picked from commit daa199a)
Fixes: ff44ad5 ("drm/i915: Move engine->submit_request selection to a vfunc")
Signed-off-by: Joonas Lahtinen <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 22, 2026
blktests-ci Bot pushed a commit that referenced this pull request Mar 23, 2026
blktests-ci Bot pushed a commit that referenced this pull request Mar 24, 2026
blktests-ci Bot pushed a commit that referenced this pull request Mar 25, 2026
blktests-ci Bot pushed a commit that referenced this pull request Mar 27, 2026
…ools()

blktests-ci Bot pushed a commit that referenced this pull request Mar 27, 2026
…ools()

blktests-ci Bot pushed a commit that referenced this pull request Mar 28, 2026
…ools()

blktests-ci Bot pushed a commit that referenced this pull request Mar 29, 2026
…ools()

blktests-ci Bot pushed a commit that referenced this pull request Mar 30, 2026
…ools()

blktests-ci Bot pushed a commit that referenced this pull request Mar 31, 2026
…ools()

blktests-ci Bot pushed a commit that referenced this pull request Apr 1, 2026
…ools()

blktests-ci Bot pushed a commit that referenced this pull request Apr 2, 2026
…ools()

The numa_node can be < 0 since NUMA_NO_NODE = -1. However,
struct blk_mq_hw_ctx{} defines numa_node as unsigned int. As a result,
numa_node is set to UINT_MAX for NUMA_NO_NODE in blk_mq_alloc_hctx().

Later, nvme_setup_descriptor_pools() accesses
descriptor_pools[numa_node]. Due to the above, it tries to access
descriptor_pools[UINT_MAX]. The address is garbage but accessible
because it is canonical and still within the slab memory range.
Therefore, no page fault occurs, and KASAN cannot detect this since it
is beyond the redzones.

Subsequently, normal I/O calls dma_pool_alloc() with the garbage pool
address. pool->next_block contains a wild pointer, causing a general
protection fault (GPF).

To fix this, this patch changes the type of numa_node to int and adds
a check for NUMA_NO_NODE.

Log:

Oops: general protection fault, probably for non-canonical address 0xe9803b040854d02c: 0000 [#1] SMP KASAN PTI
KASAN: maybe wild-memory-access in range [0x4c01f82042a68160-0x4c01f82042a68167][FEMU] Err: I/O cmd failed: opcode=0x2 status=0x4002
CPU: 0 UID: 0 PID: 112363 Comm: systemd-udevd Not tainted 6.19.0-dirty #10 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:pool_block_pop mm/dmapool.c:187 [inline]
RIP: 0010:dma_pool_alloc+0x110/0x990 mm/dmapool.c:417
Code: 00 0f 85 a4 07 00 00 4c 8b 63 58 4d 85 e4 0f 84 12 01 00 00 e8 41 1d 93 ff 4c 89 e2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 a7 07 00 00 49 8b 04 24 48 8d 7b 68 48 89 fa 48
RSP: 0018:ffffc90002b9efd0 EFLAGS: 00010003
RAX: dffffc0000000000 RBX: ffff888005466800 RCX: ffffffff94faab7f
RDX: 09803f040854d02c RSI: 6c9b26c9b26c9b27 RDI: ffff88800c725ea0
RBP: ffffc90002b9f060 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000000 R12: 4c01f82042a68164
R13: ffff888005466800 R14: 0000000000000820 R15: ffff888007b29000
FS:  00007f2abc4ff8c0(0000) GS:ffff8880d1ff7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056360eb89000 CR3: 000000000a480000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 nvme_pci_setup_data_prp drivers/nvme/host/pci.c:906 [inline]
 nvme_map_data drivers/nvme/host/pci.c:1114 [inline]
 nvme_prep_rq.part.0+0x17d3/0x3c90 drivers/nvme/host/pci.c:1243
 nvme_prep_rq drivers/nvme/host/pci.c:1239 [inline]
 nvme_prep_rq_batch drivers/nvme/host/pci.c:1321 [inline]
 nvme_queue_rqs+0x37b/0x8a0 drivers/nvme/host/pci.c:1336
 __blk_mq_flush_list block/blk-mq.c:2848 [inline]
 __blk_mq_flush_list+0xaa/0xe0 block/blk-mq.c:2844
 blk_mq_dispatch_queue_requests+0x4f5/0x990 block/blk-mq.c:2893
 blk_mq_flush_plug_list+0x232/0x650 block/blk-mq.c:2981
 __blk_flush_plug+0x2c3/0x510 block/blk-core.c:1225
 blk_finish_plug block/blk-core.c:1252 [inline]
 blk_finish_plug+0x64/0xc0 block/blk-core.c:1249
 read_pages+0x6bd/0x9d0 mm/readahead.c:176
 page_cache_ra_unbounded+0x659/0x950 mm/readahead.c:269
 do_page_cache_ra mm/readahead.c:332 [inline]
 force_page_cache_ra+0x282/0x3a0 mm/readahead.c:361
 page_cache_sync_ra+0x201/0xbf0 mm/readahead.c:579
 filemap_get_pages+0x3be/0x1990 mm/filemap.c:2690
 filemap_read+0x3ea/0xdf0 mm/filemap.c:2800
 blkdev_read_iter+0x1b8/0x520 block/fops.c:856
 new_sync_read fs/read_write.c:491 [inline]
 vfs_read+0x90f/0xd80 fs/read_write.c:572
 ksys_read+0x14e/0x280 fs/read_write.c:715
 __do_sys_read fs/read_write.c:724 [inline]
 __se_sys_read fs/read_write.c:722 [inline]
 __x64_sys_read+0x7b/0xc0 fs/read_write.c:722
 x64_sys_call+0x17ec/0x21b0 arch/x86/include/generated/asm/syscalls_64.h:1
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x8b/0x1200 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f2abc7b204e
Code: 0f 1f 40 00 48 8b 15 79 af 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
RSP: 002b:00007fff07113cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000056360eb6a528 RCX: 00007f2abc7b204e
RDX: 0000000000040000 RSI: 000056360eb6a538 RDI: 000000000000000f
RBP: 000056360e8d23d0 R08: 000056360eb6a510 R09: 00007f2abc79abe0
R10: 0000000000040050 R11: 0000000000000246 R12: 000000003ff80000
R13: 0000000000040000 R14: 000056360eb6a510 R15: 000056360e8d2420
 </TASK>
Modules linked in:

Fixes: 320ae51 ("blk-mq: new multi-queue block IO queueing mechanism")
Fixes: d977506 ("nvme-pci: make PRP list DMA pools per-NUMA-node")
Acked-by: Chao Shi <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Acked-by: Dave Tian <[email protected]>
Signed-off-by: Sungwoo Kim <[email protected]>
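The commit message above describes a signed-to-unsigned conversion: NUMA_NO_NODE (-1) stored into an `unsigned int` field becomes UINT_MAX, which then indexes a per-node pool table far outside the allocation, past any KASAN redzone. The following userspace model is an added example written for this page; it is not the upstream patch and not code from the nvme or blk-mq sources. It mirrors the buggy and fixed field types, computes the byte offset the bad index produces, and applies the NUMA_NO_NODE check the commit message calls for. `MAX_NUMNODES`, the toy `struct dma_pool`, the struct names and the fallback to node 0 for NUMA_NO_NODE are this example's assumptions, not details taken from the patch.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same value as NUMA_NO_NODE in <linux/numa.h>. */
#define NUMA_NO_NODE (-1)

/* Assumed small node count for this example's pool table. */
#define MAX_NUMNODES 4

/* Toy stand-in for the kernel's struct dma_pool (assumption). */
struct dma_pool { uintptr_t next_block; };

/* Buggy layout described in the commit message: unsigned node id. */
struct hctx_buggy { unsigned int numa_node; };

/* Fixed layout: signed, so NUMA_NO_NODE survives the store. */
struct hctx_fixed { int numa_node; };

/* Reproduces the implicit conversion: -1 stored into an unsigned field
 * reads back as UINT_MAX. */
unsigned int buggy_index(int numa_node)
{
	struct hctx_buggy hctx = { .numa_node = (unsigned int)numa_node };
	return hctx.numa_node;
}

/* Byte offset the buggy index adds to the pool table base. With 8-byte
 * pointers this is just under 32 GiB, far beyond any KASAN redzone, which
 * is why the access itself is silent and only the later wild pointer
 * dereference faults. */
uint64_t buggy_byte_offset(int numa_node)
{
	return (uint64_t)buggy_index(numa_node) * sizeof(struct dma_pool *);
}

/* Fixed lookup: signed field, explicit NUMA_NO_NODE handling, and a range
 * check. Falling back to node 0 is this example's assumption; the real
 * driver may choose a different default node. Returns -1 when the node id
 * cannot be used as an index. */
int fixed_index(int numa_node)
{
	struct hctx_fixed hctx = { .numa_node = numa_node };
	int node = hctx.numa_node;

	if (node == NUMA_NO_NODE)
		node = 0;
	if (node < 0 || node >= MAX_NUMNODES)
		return -1;
	return node;
}

/* Pool table lookup built on fixed_index(); NULL when the id is unusable,
 * so a bad node id can never become a garbage pool pointer. */
struct dma_pool **lookup_pool(struct dma_pool **pools, int numa_node)
{
	int idx = fixed_index(numa_node);

	if (idx < 0)
		return NULL;
	return &pools[idx];
}

int main(void)
{
	struct dma_pool *pools[MAX_NUMNODES] = { 0 };

	printf("buggy index for NUMA_NO_NODE: %u (UINT_MAX = %u)\n",
	       buggy_index(NUMA_NO_NODE), UINT_MAX);
	printf("buggy byte offset: 0x%llx\n",
	       (unsigned long long)buggy_byte_offset(NUMA_NO_NODE));
	printf("fixed index for NUMA_NO_NODE: %d\n", fixed_index(NUMA_NO_NODE));
	printf("lookup(NUMA_NO_NODE) = %p, lookup(7) = %p\n",
	       (void *)lookup_pool(pools, NUMA_NO_NODE),
	       (void *)lookup_pool(pools, 7));
	return 0;
}
```

The point of the range check after the NUMA_NO_NODE test is that changing the field to `int` alone only moves the problem: a negative index is still out of bounds, so the lookup has to reject anything outside `[0, MAX_NUMNODES)` rather than trust the caller.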
blktests-ci Bot pushed a commit that referenced this pull request Apr 3, 2026
…ools()

blktests-ci Bot pushed a commit that referenced this pull request Apr 3, 2026
…ools()

blktests-ci Bot pushed a commit that referenced this pull request Apr 4, 2026
…ools()

blktests-ci Bot pushed a commit that referenced this pull request Apr 4, 2026
…ools()
