block: introduce pi_size field in blk_integrity #6

Closed
blktests-ci[bot] wants to merge 10 commits into for-next_base from series/968998=>for-next
@blktests-ci blktests-ci Bot commented Jun 10, 2025

Pull request for series with
subject: block: introduce pi_size field in blk_integrity
version: 2
url: https://patchwork.kernel.org/project/linux-block/list/?series=968998

axboe and others added 10 commits June 2, 2025 12:00
* io_uring-6.16:
  MAINTAINERS: remove myself from io_uring
  io_uring/net: only consider msg_inq if larger than 1
  io_uring/zcrx: fix area release on registration failure
  io_uring/zcrx: init id for xa_find
* block-6.16:
  selftests: ublk: cover PER_IO_DAEMON in more stress tests
  Documentation: ublk: document UBLK_F_PER_IO_DAEMON
  selftests: ublk: add stress test for per io daemons
  selftests: ublk: add functional test for per io daemons
  selftests: ublk: kublk: decouple ublk_queues from ublk server threads
  selftests: ublk: kublk: move per-thread data out of ublk_queue
  selftests: ublk: kublk: lift queue initialization out of thread
  selftests: ublk: kublk: tie sqe allocation to io instead of queue
  selftests: ublk: kublk: plumb q_id in io_uring user_data
  ublk: have a per-io daemon instead of a per-queue daemon
  md/md-bitmap: remove parameter slot from bitmap_create()
  md/md-bitmap: cleanup bitmap_ops->startwrite()
  md/dm-raid: remove max_write_behind setting limit
  md/md-bitmap: fix dm-raid max_write_behind setting
  md/raid1,raid10: don't handle IO error for REQ_RAHEAD and REQ_NOWAIT
  loop: add file_start_write() and file_end_write()
  bcache: reserve more RESERVE_BTREE buckets to prevent allocator hang
  bcache: remove unused constants
  bcache: fix NULL pointer in cache_set_flush()
* io_uring-6.16:
  io_uring/kbuf: limit legacy provided buffer lists to USHRT_MAX
* block-6.16:
  block: drop direction param from bio_integrity_copy_user()
* block-6.16:
  selftests: ublk: kublk: improve behavior on init failure
  block: flip iter directions in blk_rq_integrity_map_user()
* io_uring-6.16:
  io_uring/futex: mark wait requests as inflight
  io_uring/futex: get rid of struct io_futex addr union
* block-6.16:
  nvme: spelling fixes
  nvme-tcp: fix I/O stalls on congested sockets
  nvme-tcp: sanitize request list handling
  nvme-tcp: remove tag set when second admin queue config fails
  nvme: enable vectored registered bufs for passthrough cmds
  nvme: fix implicit bool to flags conversion
  nvme: fix command limits status code
Introduce a new pi_size field in struct blk_integrity to explicitly
represent the size (in bytes) of the protection information (PI) tuple.
This is a prep patch.

Signed-off-by: Anuj Gupta <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Add a new ioctl, FS_IOC_GETPICAP, to query protection info (PI)
capabilities. This ioctl returns information about the file's integrity
profile. This is useful for userspace applications to understand a file's
end-to-end data protection support and configure the I/O accordingly.

For now this interface is only supported by block devices. However, the
design and placement of this ioctl in the generic FS ioctl space allows us
to extend it to work over files as well. This may be useful when
filesystems start supporting PI-aware layouts.

A new structure struct fs_pi_cap is introduced, which contains the
following fields:
1. fpc_flags: bitmask of capability flags.
2. fpc_interval: the data block interval (in bytes) for which the
protection information is generated.
3. fpc_csum_type: type of checksum used.
4. fpc_metadata_size: size (in bytes) of the metadata associated with each
interval.
5. fpc_pi_size: size (in bytes) of the PI associated with each interval.
6. fpc_tag_size: size (in bytes) of tag information.
7. pi_offset: offset of protection information tuple within the
metadata.
8. fpc_ref_tag_size: size in bytes of the reference tag.
9. fpc_storage_tag_size: size in bytes of the storage tag.
10. fpc_rsvd: reserved for future use.

The internal logic to fetch the capability is encapsulated in a helper
function blk_get_pi_cap(), which uses the blk_integrity profile
associated with the device. The ioctl returns -EOPNOTSUPP if
CONFIG_BLK_DEV_INTEGRITY is not enabled.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
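
To make the field list in the commit message above more concrete, here is a minimal userspace sketch of how an application might use such capability values. The struct name `pi_cap_sketch`, the field widths, and both helper functions are assumptions made for illustration; this is not the kernel's UAPI layout, and the ioctl call itself is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the capability fields named in the commit
 * message. Field widths and ordering are assumptions, not the real
 * struct fs_pi_cap layout.
 */
struct pi_cap_sketch {
	uint32_t flags;            /* capability flag bitmask */
	uint16_t interval;         /* data bytes covered by one PI tuple */
	uint8_t  csum_type;        /* checksum type */
	uint8_t  metadata_size;    /* metadata bytes per interval */
	uint8_t  pi_size;          /* PI bytes per interval */
	uint8_t  tag_size;         /* tag bytes */
	uint8_t  pi_offset;        /* offset of the PI tuple in the metadata */
	uint8_t  ref_tag_size;     /* reference tag bytes */
	uint8_t  storage_tag_size; /* storage tag bytes */
};

/*
 * Metadata buffer size needed for data_len bytes of I/O.
 * Returns 0 when the interval is unset or data_len is not a
 * whole number of intervals.
 */
static size_t pi_metadata_bytes(const struct pi_cap_sketch *cap,
				size_t data_len)
{
	if (cap->interval == 0 || data_len % cap->interval != 0)
		return 0;
	return (data_len / cap->interval) * cap->metadata_size;
}

/* Sanity check: the PI tuple has to fit inside the per-interval metadata. */
static int pi_layout_is_sane(const struct pi_cap_sketch *cap)
{
	return (size_t)cap->pi_offset + cap->pi_size <= cap->metadata_size;
}
```

For example, with a 512-byte interval and 8 bytes of metadata per interval, a 4096-byte I/O spans 8 intervals and needs a 64-byte metadata buffer.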

blktests-ci Bot commented Jun 10, 2025

Upstream branch: 38f4878
series: https://patchwork.kernel.org/project/linux-block/list/?series=968998
version: 2


blktests-ci Bot commented Jul 10, 2025

Upstream branch: f4ca523
series: https://patchwork.kernel.org/project/linux-block/list/?series=970433
version: 3

@blktests-ci blktests-ci Bot added V3 and removed V2 labels Jul 10, 2025

blktests-ci Bot commented Jul 10, 2025

GitHub failed to update this PR after the force push. Closing it.

@blktests-ci blktests-ci Bot closed this Jul 10, 2025
blktests-ci Bot pushed a commit that referenced this pull request Jul 23, 2025
…/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.16, take #6

- Fix use of u64_replace_bits() in adjusting the guest's view of
  MDCR_EL2.HPMN.
@blktests-ci blktests-ci Bot deleted the series/968998=>for-next branch July 23, 2025 02:12
blktests-ci Bot pushed a commit that referenced this pull request Aug 2, 2025
The perf script tests fail with a segmentation fault, as shown below:

  92: perf script tests:
  --- start ---
  test child forked, pid 103769
  DB test
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB /tmp/perf-test-script.7rbftEpOzX/perf.data (9 samples) ]
  /usr/libexec/perf-core/tests/shell/script.sh: line 35:
  103780 Segmentation fault      (core dumped)
  perf script -i "${perfdatafile}" -s "${db_test}"
  --- Cleaning up ---
  ---- end(-1) ----
  92: perf script tests                                               : FAILED!

Backtrace pointed to :
	#0  0x0000000010247dd0 in maps.machine ()
	#1  0x00000000101d178c in db_export.sample ()
	#2  0x00000000103412c8 in python_process_event ()
	#3  0x000000001004eb28 in process_sample_event ()
	#4  0x000000001024fcd0 in machines.deliver_event ()
	#5  0x000000001025005c in perf_session.deliver_event ()
	#6  0x00000000102568b0 in __ordered_events__flush.part.0 ()
	#7  0x0000000010251618 in perf_session.process_events ()
	#8  0x0000000010053620 in cmd_script ()
	#9  0x00000000100b5a28 in run_builtin ()
	#10 0x00000000100b5f94 in handle_internal_command ()
	#11 0x0000000010011114 in main ()

Further investigation reveals that this occurs in the `perf script tests`
because they use the `db_test.py` script, which sets `perf_db_export_mode = True`.

With `perf_db_export_mode` enabled, if a sample originates from a hypervisor,
perf doesn't set maps for the "[H]" sample. Consequently, `al->maps` remains NULL
when `maps__machine(al->maps)` is called from `db_export__sample`.

Because al->maps can be NULL for hypervisor samples, use thread->maps instead:
even for a hypervisor sample, the machine should exist.
If there is no machine for some reason, return -1 to avoid the segmentation fault.

Reported-by: Disha Goel <[email protected]>
Signed-off-by: Aditya Bodkhe <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Tested-by: Disha Goel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Adrian Hunter <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
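
The fallback described in the commit message above can be sketched with toy types. Everything here (`struct machine`, `struct maps`, `struct thread`, `struct addr_location`, `maps_machine()`, `export_sample()`) is a simplified stand-in invented for illustration and does not reproduce perf's real definitions or its actual patch.

```c
#include <stddef.h>

/* Toy stand-ins for perf's types; the real definitions differ. */
struct machine { int pid; };
struct maps { struct machine *machine; };
struct thread { struct maps *maps; };
struct addr_location {
	struct maps *maps;     /* may be NULL for hypervisor samples */
	struct thread *thread;
};

/* NULL-tolerant lookup of the machine that owns a set of maps. */
static struct machine *maps_machine(const struct maps *maps)
{
	return maps ? maps->machine : NULL;
}

/*
 * Resolve the machine through the thread's maps rather than al->maps.
 * Returns 0 and stores the machine in *out on success, or -1 when no
 * machine can be found, instead of dereferencing a NULL pointer.
 */
static int export_sample(const struct addr_location *al, struct machine **out)
{
	struct machine *m = NULL;

	if (al->thread)
		m = maps_machine(al->thread->maps);
	if (!m)
		return -1;

	*out = m;
	return 0;
}
```

The design point is that the lookup never assumes `al->maps` is populated and that the missing-machine case becomes an error return rather than a crash.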
blktests-ci Bot pushed a commit that referenced this pull request Aug 2, 2025
Without the change, `perf` hangs on character devices. On my system
it's enough to run a system-wide sampler for a few seconds to get the
hang:

    $ perf record -a -g --call-graph=dwarf
    $ perf report
    # hung

`strace` shows that the hang happens while reading a character device,
`/dev/dri/renderD128`:

    $ strace -y -f -p 2780484
    strace: Process 2780484 attached
    pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached

Its call trace descends into `elfutils`:

    $ gdb -p 2780484
    (gdb) bt
    #0  0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
        at ../sysdeps/unix/sysv/linux/pread64.c:25
    #1  0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
    #2  0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #3  0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #4  0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
       from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #5  0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
        at util/dso.h:537
    #6  0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
    #7  frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
    #8  0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #9  0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
        thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
        best_effort=best_effort@entry=false) at util/thread.h:152
    #13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
        sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
    #14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
        max_stack=127, symbols=true) at util/machine.c:2920
    #15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
        sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
        at util/machine.c:2970
    #16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
        sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
    #17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
        evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
    #18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
        arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
    #19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
        evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
    #20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
        file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
    #21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
    #22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
    #23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
        file_path=0x105ffbf0 "perf.data") at util/session.c:1419
    #24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
        prog=prog@entry=0x7fff9df81220) at util/session.c:2132
    #25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
        at util/session.c:2181
    #26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
    #27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
    #28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
    #29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
    #30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
        at perf.c:351
    #31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
    #32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
    #33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556

The hang happens because nothing in `perf` or `elfutils` checks whether a
mapped file is easily readable.

The change conservatively skips all non-regular files.

Signed-off-by: Sergei Trofimovich <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
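
A conservative "regular files only" check like the one described above can be sketched with POSIX `stat()` and `S_ISREG()`. The helper name `is_regular_file()` is hypothetical, and the source does not show how perf actually implements the check, so treat this as one possible approach.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Returns true only for regular files. Character devices, directories,
 * FIFOs, sockets, and paths that cannot be stat()ed are all rejected,
 * so a caller never tries to read from something that may block.
 */
static bool is_regular_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;
	return S_ISREG(st.st_mode);
}
```

A caller would run this on the mapped file's path before handing it to an ELF reader and skip the mapping when it returns false.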
blktests-ci Bot pushed a commit that referenced this pull request Aug 2, 2025
Symbolize stack traces by creating a live machine. Add this
functionality to dump_stack and switch dump_stack users to use
it. Switch TUI to use it. Add stack traces to the child test function
which can be useful to diagnose blocked code.

Example output:
```
$ perf test -vv PERF_RECORD_
...
  7: PERF_RECORD_* events & perf_sample fields:
  7: PERF_RECORD_* events & perf_sample fields                       : Running (1 active)
^C
Signal (2) while running tests.
Terminating tests with the same signal
Internal test harness failure. Completing any started tests:
:  7: PERF_RECORD_* events & perf_sample fields:

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    #3 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    #4 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    #5 0x7fc12ff02d68 in __sleep sleep.c:55
    #6 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    #7 0x55788c620fb0 in run_test_child builtin-test.c:0
    #8 0x55788c5bd18d in start_command run-command.c:127
    #9 0x55788c621ef3 in __cmd_test builtin-test.c:0
    #10 0x55788c6225bf in cmd_test ??:0
    #11 0x55788c5afbd0 in run_builtin perf.c:0
    #12 0x55788c5afeeb in handle_internal_command perf.c:0
    #13 0x55788c52b383 in main ??:0
    #14 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    #15 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #16 0x55788c52b9d1 in _start ??:0

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fea3a14 in pthread_sigmask@GLIBC_2.2.5 pthread_sigmask.c:45
    #3 0x7fc12fe49fd9 in __GI___sigprocmask sigprocmask.c:26
    #4 0x7fc12ff2601b in __longjmp_chk longjmp.c:36
    #5 0x55788c6210c0 in print_test_result.isra.0 builtin-test.c:0
    #6 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #7 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    #8 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    #9 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    #10 0x7fc12ff02d68 in __sleep sleep.c:55
    #11 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    #12 0x55788c620fb0 in run_test_child builtin-test.c:0
    #13 0x55788c5bd18d in start_command run-command.c:127
    #14 0x55788c621ef3 in __cmd_test builtin-test.c:0
    #15 0x55788c6225bf in cmd_test ??:0
    #16 0x55788c5afbd0 in run_builtin perf.c:0
    #17 0x55788c5afeeb in handle_internal_command perf.c:0
    #18 0x55788c52b383 in main ??:0
    #19 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    #20 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #21 0x55788c52b9d1 in _start ??:0
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Signed-off-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Aug 2, 2025
Calling perf top with branch filters enabled on Intel CPUs
with branch counter logging (a.k.a. LBR event logging [1]) support
results in a segfault.

$ perf top  -e '{cpu_core/cpu-cycles/,cpu_core/event=0xc6,umask=0x3,frontend=0x11,name=frontend_retired_dsb_miss/}' -j any,counter
...
Thread 27 "perf" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffafff76c0 (LWP 949003)]
perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
653			*width = env->cpu_pmu_caps ? env->br_cntr_width :
(gdb) bt
 #0  perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
 #1  0x00000000005b1599 in symbol__account_br_cntr (branch=0x7fffcc3db580, evsel=0xfea2d0, offset=12, br_cntr=8) at util/annotate.c:345
 #2  0x00000000005b17fb in symbol__account_cycles (addr=5658172, start=5658160, sym=0x7fffcc0ee420, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:389
 #3  0x00000000005b1976 in addr_map_symbol__account_cycles (ams=0x7fffcd7b01d0, start=0x7fffcd7b02b0, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:422
 #4  0x000000000068d57f in hist__account_cycles (bs=0x110d288, al=0x7fffafff6540, sample=0x7fffafff6760, nonany_branch_mode=false, total_cycles=0x0, evsel=0xfea2d0) at util/hist.c:2850
 #5  0x0000000000446216 in hist_iter__top_callback (iter=0x7fffafff6590, al=0x7fffafff6540, single=true, arg=0x7fffffff9e00) at builtin-top.c:737
 #6  0x0000000000689787 in hist_entry_iter__add (iter=0x7fffafff6590, al=0x7fffafff6540, max_stack_depth=127, arg=0x7fffffff9e00) at util/hist.c:1359
 #7  0x0000000000446710 in perf_event__process_sample (tool=0x7fffffff9e00, event=0x110d250, evsel=0xfea2d0, sample=0x7fffafff6760, machine=0x108c968) at builtin-top.c:845
 #8  0x0000000000447735 in deliver_event (qe=0x7fffffffa120, qevent=0x10fc200) at builtin-top.c:1211
 #9  0x000000000064ccae in do_flush (oe=0x7fffffffa120, show_progress=false) at util/ordered-events.c:245
 #10 0x000000000064d005 in __ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP, timestamp=0) at util/ordered-events.c:324
 #11 0x000000000064d0ef in ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP) at util/ordered-events.c:342
 #12 0x00000000004472a9 in process_thread (arg=0x7fffffff9e00) at builtin-top.c:1120
 #13 0x00007ffff6e7dba8 in start_thread (arg=<optimized out>) at pthread_create.c:448
 #14 0x00007ffff6f01b8c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The cause is that perf_env__find_br_cntr_info tries to access a
null pointer pmu_caps in the perf_env struct. A similar issue exists
for homogeneous core systems which use the cpu_pmu_caps structure.

Fix this by populating cpu_pmu_caps and pmu_caps structures with
values from sysfs when calling perf top with branch stack sampling
enabled.

[1] LBR event logging was introduced here:
https://lore.kernel.org/all/[email protected]/

Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Thomas Falcon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Aug 31, 2025
These iterations require the read lock, otherwise RCU
lockdep will splat:

=============================
WARNING: suspicious RCU usage
6.17.0-rc3-00014-g31419c045d64 #6 Tainted: G           O
-----------------------------
drivers/base/power/main.c:1333 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
5 locks held by rtcwake/547:
 #0: 00000000643ab418 (sb_writers#6){.+.+}-{0:0}, at: file_start_write+0x2b/0x3a
 #1: 0000000067a0ca88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x181/0x24b
 #2: 00000000631eac40 (kn->active#3){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x191/0x24b
 #3: 00000000609a1308 (system_transition_mutex){+.+.}-{4:4}, at: pm_suspend+0xaf/0x30b
 #4: 0000000060c0fdb0 (device_links_srcu){.+.+}-{0:0}, at: device_links_read_lock+0x75/0x98

stack backtrace:
CPU: 0 UID: 0 PID: 547 Comm: rtcwake Tainted: G           O        6.17.0-rc3-00014-g31419c045d64 #6 VOLUNTARY
Tainted: [O]=OOT_MODULE
Stack:
 223721b3a80 6089eac6 00000001 00000001
 ffffff00 6089eac6 00000535 6086e528
 721b3ac0 6003c294 00000000 60031fc0
Call Trace:
 [<600407ed>] show_stack+0x10e/0x127
 [<6003c294>] dump_stack_lvl+0x77/0xc6
 [<6003c2fd>] dump_stack+0x1a/0x20
 [<600bc2f8>] lockdep_rcu_suspicious+0x116/0x13e
 [<603d8ea1>] dpm_async_suspend_superior+0x117/0x17e
 [<603d980f>] device_suspend+0x528/0x541
 [<603da24b>] dpm_suspend+0x1a2/0x267
 [<603da837>] dpm_suspend_start+0x5d/0x72
 [<600ca0c9>] suspend_devices_and_enter+0xab/0x736
 [...]

Add the fourth argument to the iteration to annotate
this and avoid the splat.

Fixes: 0679963 ("PM: sleep: Make async suspend handle suppliers like parents")
Fixes: ed18738 ("PM: sleep: Make async resume handle consumers like children")
Signed-off-by: Johannes Berg <[email protected]>
Link: https://patch.msgid.link/20250826134348.aba79f6e6299.I9ecf55da46ccf33778f2c018a82e1819d815b348@changeid
Signed-off-by: Rafael J. Wysocki <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Sep 16, 2025
Commit 0e2f80a ("fs/dax: ensure all pages are idle prior to
filesystem unmount") introduced a WARN_ON_ONCE to check whether
the filesystem has removed all DAX entries, and applied the
fix to xfs and ext4.

Apply the missing fix to erofs to fix the runtime warning:

[  5.266254] ------------[ cut here ]------------
[  5.266274] WARNING: CPU: 6 PID: 3109 at mm/truncate.c:89 truncate_folio_batch_exceptionals+0xff/0x260
[  5.266294] Modules linked in:
[  5.266999] CPU: 6 UID: 0 PID: 3109 Comm: umount Tainted: G S                  6.16.0+ #6 PREEMPT(voluntary)
[  5.267012] Tainted: [S]=CPU_OUT_OF_SPEC
[  5.267017] Hardware name: Dell Inc. OptiPlex 5000/05WXFV, BIOS 1.5.1 08/24/2022
[  5.267024] RIP: 0010:truncate_folio_batch_exceptionals+0xff/0x260
[  5.267076] Code: 00 00 41 39 df 7f 11 eb 78 83 c3 01 49 83 c4 08 41 39 df 74 6c 48 63 f3 48 83 fe 1f 0f 83 3c 01 00 00 43 f6 44 26 08 01 74 df <0f> 0b 4a 8b 34 22 4c 89 ef 48 89 55 90 e8 ff 54 1f 00 48 8b 55 90
[  5.267083] RSP: 0018:ffffc900013f36c8 EFLAGS: 00010202
[  5.267095] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  5.267101] RDX: ffffc900013f3790 RSI: 0000000000000000 RDI: ffff8882a1407898
[  5.267108] RBP: ffffc900013f3740 R08: 0000000000000000 R09: 0000000000000000
[  5.267113] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  5.267119] R13: ffff8882a1407ab8 R14: ffffc900013f3888 R15: 0000000000000001
[  5.267125] FS:  00007aaa8b437800(0000) GS:ffff88850025b000(0000) knlGS:0000000000000000
[  5.267132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  5.267138] CR2: 00007aaa8b3aac10 CR3: 000000024f764000 CR4: 0000000000f52ef0
[  5.267144] PKRU: 55555554
[  5.267150] Call Trace:
[  5.267154]  <TASK>
[  5.267181]  truncate_inode_pages_range+0x118/0x5e0
[  5.267193]  ? save_trace+0x54/0x390
[  5.267296]  truncate_inode_pages_final+0x43/0x60
[  5.267309]  evict+0x2a4/0x2c0
[  5.267339]  dispose_list+0x39/0x80
[  5.267352]  evict_inodes+0x150/0x1b0
[  5.267376]  generic_shutdown_super+0x41/0x180
[  5.267390]  kill_block_super+0x1b/0x50
[  5.267402]  erofs_kill_sb+0x81/0x90 [erofs]
[  5.267436]  deactivate_locked_super+0x32/0xb0
[  5.267450]  deactivate_super+0x46/0x60
[  5.267460]  cleanup_mnt+0xc3/0x170
[  5.267475]  __cleanup_mnt+0x12/0x20
[  5.267485]  task_work_run+0x5d/0xb0
[  5.267499]  exit_to_user_mode_loop+0x144/0x170
[  5.267512]  do_syscall_64+0x2b9/0x7c0
[  5.267523]  ? __lock_acquire+0x665/0x2ce0
[  5.267535]  ? __lock_acquire+0x665/0x2ce0
[  5.267560]  ? lock_acquire+0xcd/0x300
[  5.267573]  ? find_held_lock+0x31/0x90
[  5.267582]  ? mntput_no_expire+0x97/0x4e0
[  5.267606]  ? mntput_no_expire+0xa1/0x4e0
[  5.267625]  ? mntput+0x24/0x50
[  5.267634]  ? path_put+0x1e/0x30
[  5.267647]  ? do_faccessat+0x120/0x2f0
[  5.267677]  ? do_syscall_64+0x1a2/0x7c0
[  5.267686]  ? from_kgid_munged+0x17/0x30
[  5.267703]  ? from_kuid_munged+0x13/0x30
[  5.267711]  ? __do_sys_getuid+0x3d/0x50
[  5.267724]  ? do_syscall_64+0x1a2/0x7c0
[  5.267732]  ? irqentry_exit+0x77/0xb0
[  5.267743]  ? clear_bhb_loop+0x30/0x80
[  5.267752]  ? clear_bhb_loop+0x30/0x80
[  5.267765]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  5.267772] RIP: 0033:0x7aaa8b32a9fb
[  5.267781] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 83 0d 00 f7 d8
[  5.267787] RSP: 002b:00007ffd7c4c9468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[  5.267796] RAX: 0000000000000000 RBX: 00005a61592a8b00 RCX: 00007aaa8b32a9fb
[  5.267802] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005a61592b2080
[  5.267806] RBP: 00007ffd7c4c9540 R08: 00007aaa8b403b20 R09: 0000000000000020
[  5.267812] R10: 0000000000000001 R11: 0000000000000246 R12: 00005a61592a8c00
[  5.267817] R13: 0000000000000000 R14: 00005a61592b2080 R15: 00005a61592a8f10
[  5.267849]  </TASK>
[  5.267854] irq event stamp: 4721
[  5.267859] hardirqs last  enabled at (4727): [<ffffffff814abf50>] __up_console_sem+0x90/0xa0
[  5.267873] hardirqs last disabled at (4732): [<ffffffff814abf35>] __up_console_sem+0x75/0xa0
[  5.267884] softirqs last  enabled at (3044): [<ffffffff8132adb3>] kernel_fpu_end+0x53/0x70
[  5.267895] softirqs last disabled at (3042): [<ffffffff8132b5f4>] kernel_fpu_begin_mask+0xc4/0x120
[  5.267905] ---[ end trace 0000000000000000 ]---

Fixes: bde708f ("fs/dax: always remove DAX page-cache entries when breaking layouts")
Signed-off-by: Yuezhang Mo <[email protected]>
Reviewed-by: Friendy Su <[email protected]>
Reviewed-by: Daniel Palmer <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Sep 25, 2025
Currently, if CCW request creation fails with -EINVAL, the DASD driver
returns BLK_STS_IOERR to the block layer.

This can happen, for example, when a user-space application such as QEMU
passes a misaligned buffer, but the original cause of the error is
masked as a generic I/O error.

This patch changes the behavior so that -EINVAL is returned as
BLK_STS_INVAL, allowing user space to properly detect alignment issues
instead of interpreting them as I/O errors.

Reviewed-by: Stefan Haberland <[email protected]>
Cc: [email protected] #6.11+
Signed-off-by: Jaehoon Kim <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
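
The behavior change described above amounts to an errno-to-status mapping. The sketch below uses a hypothetical enum and function name rather than the kernel's `blk_status_t` and the DASD driver's real code path. Collapsing every other error into a generic I/O error is a simplification made for this example.

```c
#include <errno.h>

/* Hypothetical stand-in for the block layer status codes. */
enum sketch_blk_status {
	SKETCH_STS_OK,
	SKETCH_STS_IOERR,
	SKETCH_STS_INVAL,
};

/*
 * Map the result of building a request to a block status.
 * -EINVAL (for example, a misaligned buffer) gets its own status so
 * that user space can tell it apart from a genuine I/O error.
 */
static enum sketch_blk_status build_error_to_status(int err)
{
	switch (err) {
	case 0:
		return SKETCH_STS_OK;
	case -EINVAL:
		return SKETCH_STS_INVAL;
	default:
		return SKETCH_STS_IOERR;
	}
}
```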
blktests-ci Bot pushed a commit that referenced this pull request Mar 10, 2026
This leak will cause a hang when tearing down the SCSI host. For example,
iscsid hangs with the following call trace:

[130120.652718] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

PID: 2528     TASK: ffff9d0408974e00  CPU: 3    COMMAND: "iscsid"
 #0 [ffffb5b9c134b9e0] __schedule at ffffffff860657d4
 #1 [ffffb5b9c134ba28] schedule at ffffffff86065c6f
 #2 [ffffb5b9c134ba40] schedule_timeout at ffffffff86069fb0
 #3 [ffffb5b9c134bab0] __wait_for_common at ffffffff8606674f
 #4 [ffffb5b9c134bb10] scsi_remove_host at ffffffff85bfe84b
 #5 [ffffb5b9c134bb30] iscsi_sw_tcp_session_destroy at ffffffffc03031c4 [iscsi_tcp]
 #6 [ffffb5b9c134bb48] iscsi_if_recv_msg at ffffffffc0292692 [scsi_transport_iscsi]
 #7 [ffffb5b9c134bb98] iscsi_if_rx at ffffffffc02929c2 [scsi_transport_iscsi]
 #8 [ffffb5b9c134bbf0] netlink_unicast at ffffffff85e551d6
 #9 [ffffb5b9c134bc38] netlink_sendmsg at ffffffff85e554ef

Fixes: 8fe4ce5 ("scsi: core: Fix a use-after-free")
Cc: [email protected]
Signed-off-by: Junxiao Bi <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 10, 2026
A malicious user program can request large user-memory pinning via
ioctl with a large metadata_len. However, this does not guarantee
that all requested memory will be pinned. Pinning may partially succeed
and return the number of bytes that were actually pinned, which may not
match the requested size. In this case, only the addresses of the
pinned pages are valid.

The current implementation does not handle partial pinning and
incorrectly assumes that all pages in the range [0, nr_vecs) are valid.
This can lead to a null-pointer dereference because pages[n] may refer
to an unpinned memory range.

To fix this, add a check to verify that all requested pages are
successfully pinned. Pinning all pages is required to copy user data.

KASAN splat:

Syzkaller hit 'general protection fault in bio_integrity_map_user' bug.

nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: 2/0/0 default/read/poll queues
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 UID: 0 PID: 280 Comm: syz-executor294 Not tainted 6.11.0-dirty #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:_compound_head home/wukong/fuzznvme/linux/./include/linux/page-flags.h:240 [inline]
RIP: 0010:bvec_from_pages home/wukong/fuzznvme/linux/block/bio-integrity.c:290 [inline]
RIP: 0010:bio_integrity_map_user+0x5a3/0x11e0 home/wukong/fuzznvme/linux/block/bio-integrity.c:345
Code: 4c 89 e0 48 c1 e8 03 80 3c 30 00 0f 85 4b 0a 00 00 48 be 00 00 00 00 00 fc ff df 49 8b 1c 24 48 8d 7b 08 48 89 f8 48 c1 e8 03 <80> 3c 30 00 0f 85 35 0a 00 00 48 8b 43 08 31 ff 49 89 c5 48 89 44
RSP: 0018:ffffc900010cf4f0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000f761
RDX: ffff888006cae600 RSI: dffffc0000000000 RDI: 0000000000000008
RBP: ffffc900010cf7d0 R08: ffff888006cae600 R09: ffffed1000e0db95
R10: ffff88800706dcaf R11: ffff888006a31a00 R12: ffff888006a31a08
R13: 0000000000000740 R14: ffff888006a31a00 R15: 0000000000000001
FS:  0000555587e483c0(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002002f8c0 CR3: 000000000719a000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 nvme_map_user_request+0x4b6/0x5e0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:149
 nvme_submit_user_cmd+0x2e8/0x3c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:185
 nvme_user_cmd.constprop.0+0x35b/0x540 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:325
 nvme_ns_ioctl+0x11e/0x1c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:570
 nvme_ioctl+0x147/0x1d0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:605
 blkdev_ioctl+0x28c/0x6c0 home/wukong/fuzznvme/linux/block/ioctl.c:676
 vfs_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:51 [inline]
 __do_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:907 [inline]
 __se_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:893 [inline]
 __x64_sys_ioctl+0x1bc/0x230 home/wukong/fuzznvme/linux/fs/ioctl.c:893
 x64_sys_call+0x1209/0x20d0 home/wukong/fuzznvme/linux/./arch/x86/include/generated/asm/syscalls_64.h:17
 do_syscall_x64 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x6f/0x110 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f5b3d8e98bd
Code: c3 e8 a7 1f 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd1c0e988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f5b3d8e98bd
RDX: 000000002003f680 RSI: 00000000c0484e43 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007f5b3d93eb4d R09: 00007f5b3d93eb4d
R10: 00007f5b3d93eb4d R11: 0000000000000246 R12: 0000000000000001
R13: 00007fffd1c0ebe8 R14: 00007fffd1c0e9b0 R15: 00007fffd1c0e9a0
 </TASK>
Modules linked in:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#2] PREEMPT SMP KASAN PTI

Fixes: 492c5d4 (block: bio-integrity: directly map user buffers)
Acked-by: Chao Shi <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Acked-by: Dave Tian <[email protected]>
Signed-off-by: Sungwoo Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 10, 2026
Quiesce and resume is a mechanism to suspend operations on DASD devices.
In the context of a controlled copy pair swap operation, the quiesce
operation is usually issued before the actual swap and a resume
afterwards.

During the swap operation, the underlying device is exchanged. Therefore,
the quiesce flag must be moved to the secondary device to ensure a
consistent quiesce state after the swap.

The secondary device itself cannot be suspended separately because there
is no separate block device representation for it.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 10, 2026
During online processing for a DASD device an IO operation is started to
determine the format of the device. CDL format contains specifically
sized blocks at the beginning of the disk.

For a PPRC secondary device, no real IO operation is possible, so this IO
request cannot be started and this step is skipped during online
processing of secondary devices. This is generally fine since the
secondary is a copy of the primary device.

If an additional partition detection is run after a swap operation, the
format information is needed to properly drive partition detection IO.

Currently, this information is not passed, leading to IO errors during
partition detection and a wrongly detected partition table, which in turn
might lead to data corruption on the disk with the wrong partition table.

Fix by passing the format information from primary to secondary device.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Acked-by: Eduard Shishkin <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 10, 2026
Quiesce and resume is a mechanism to suspend operations on DASD devices.
In the context of a controlled copy pair swap operation, the quiesce
operation is usually issued before the actual swap and a resume
afterwards.

During the swap operation, the underlying device is exchanged. Therefore,
the quiesce flag must be moved to the secondary device to ensure a
consistent quiesce state after the swap.

The secondary device itself cannot be suspended separately because there
is no separate block device representation for it.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 10, 2026
During online processing for a DASD device an IO operation is started to
determine the format of the device. CDL format contains specifically
sized blocks at the beginning of the disk.

For a PPRC secondary device, no real IO operation is possible, so this IO
request cannot be started and this step is skipped during online
processing of secondary devices. This is generally fine since the
secondary is a copy of the primary device.

If an additional partition detection is run after a swap operation, the
format information is needed to properly drive partition detection IO.

Currently, this information is not passed, leading to IO errors during
partition detection and a wrongly detected partition table, which in turn
might lead to data corruption on the disk with the wrong partition table.

Fix by passing the format information from primary to secondary device.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Acked-by: Eduard Shishkin <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 11, 2026
A malicious user program can request large user-memory pinning via
ioctl with a large metadata_len. However, this does not guarantee
that all requested memory will be pinned. Pinning may partially succeed
and return the number of bytes that were actually pinned, which may not
match the requested size. In this case, only the addresses of the
pinned pages are valid.

The current implementation does not handle partial pinning and
incorrectly assumes that all pages in the range [0, nr_vecs) are valid.
This can lead to a null-pointer dereference because pages[n] may refer
to an unpinned memory range.

To fix this, add a check to verify that all requested pages are
successfully pinned. Pinning all pages is required to copy user data.
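
The check described above can be illustrated with a small user-space
sketch. It is not the kernel patch: demo_pin_pages(), demo_unpin_pages(),
demo_map_user(), the `avail` parameter and the -EFAULT return value are
all hypothetical names and choices made for illustration. The only point
it shows is that a short pin must be detected and undone before any
pages[] entry is dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_MAX_PAGES 8u

static char demo_backing[DEMO_MAX_PAGES][DEMO_PAGE_SIZE];

/*
 * Hypothetical pinning primitive (not a kernel API): fills pages[] for at
 * most `avail` pages and returns the number of bytes actually pinned,
 * which may be less than what the caller asked for.
 */
static size_t demo_pin_pages(void **pages, size_t nr_pages, size_t avail)
{
	size_t n = nr_pages < avail ? nr_pages : avail;

	for (size_t i = 0; i < n; i++)
		pages[i] = demo_backing[i];
	return n * DEMO_PAGE_SIZE;
}

/* Release only the entries that were really pinned. */
static void demo_unpin_pages(void **pages, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++)
		pages[i] = NULL;
}

/*
 * Map nr_pages pages. Entries of pages[] beyond the pinned count are never
 * filled in, so a short pin is treated as a failure instead of walking the
 * whole [0, nr_pages) range.
 */
static int demo_map_user(void **pages, size_t nr_pages, size_t avail)
{
	size_t want, pinned;

	assert(pages != NULL);
	if (nr_pages == 0 || nr_pages > DEMO_MAX_PAGES)
		return -EINVAL;

	want = nr_pages * DEMO_PAGE_SIZE;
	pinned = demo_pin_pages(pages, nr_pages, avail);
	if (pinned != want) {
		/* Partial pin: undo what was pinned and bail out. */
		demo_unpin_pages(pages, pinned / DEMO_PAGE_SIZE);
		return -EFAULT; /* error code chosen for this sketch only */
	}
	return 0;
}
```

In this sketch, a caller that requests four pages while only two can be
pinned gets an error back and never touches pages[2] or pages[3].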

KASAN splat:

Syzkaller hit 'general protection fault in bio_integrity_map_user' bug.

nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: 2/0/0 default/read/poll queues
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 UID: 0 PID: 280 Comm: syz-executor294 Not tainted 6.11.0-dirty #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:_compound_head home/wukong/fuzznvme/linux/./include/linux/page-flags.h:240 [inline]
RIP: 0010:bvec_from_pages home/wukong/fuzznvme/linux/block/bio-integrity.c:290 [inline]
RIP: 0010:bio_integrity_map_user+0x5a3/0x11e0 home/wukong/fuzznvme/linux/block/bio-integrity.c:345
Code: 4c 89 e0 48 c1 e8 03 80 3c 30 00 0f 85 4b 0a 00 00 48 be 00 00 00 00 00 fc ff df 49 8b 1c 24 48 8d 7b 08 48 89 f8 48 c1 e8 03 <80> 3c 30 00 0f 85 35 0a 00 00 48 8b 43 08 31 ff 49 89 c5 48 89 44
RSP: 0018:ffffc900010cf4f0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000f761
RDX: ffff888006cae600 RSI: dffffc0000000000 RDI: 0000000000000008
RBP: ffffc900010cf7d0 R08: ffff888006cae600 R09: ffffed1000e0db95
R10: ffff88800706dcaf R11: ffff888006a31a00 R12: ffff888006a31a08
R13: 0000000000000740 R14: ffff888006a31a00 R15: 0000000000000001
FS:  0000555587e483c0(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002002f8c0 CR3: 000000000719a000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 nvme_map_user_request+0x4b6/0x5e0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:149
 nvme_submit_user_cmd+0x2e8/0x3c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:185
 nvme_user_cmd.constprop.0+0x35b/0x540 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:325
 nvme_ns_ioctl+0x11e/0x1c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:570
 nvme_ioctl+0x147/0x1d0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:605
 blkdev_ioctl+0x28c/0x6c0 home/wukong/fuzznvme/linux/block/ioctl.c:676
 vfs_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:51 [inline]
 __do_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:907 [inline]
 __se_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:893 [inline]
 __x64_sys_ioctl+0x1bc/0x230 home/wukong/fuzznvme/linux/fs/ioctl.c:893
 x64_sys_call+0x1209/0x20d0 home/wukong/fuzznvme/linux/./arch/x86/include/generated/asm/syscalls_64.h:17
 do_syscall_x64 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x6f/0x110 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f5b3d8e98bd
Code: c3 e8 a7 1f 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd1c0e988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f5b3d8e98bd
RDX: 000000002003f680 RSI: 00000000c0484e43 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007f5b3d93eb4d R09: 00007f5b3d93eb4d
R10: 00007f5b3d93eb4d R11: 0000000000000246 R12: 0000000000000001
R13: 00007fffd1c0ebe8 R14: 00007fffd1c0e9b0 R15: 00007fffd1c0e9a0
 </TASK>
Modules linked in:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#2] PREEMPT SMP KASAN PTI

Fixes: 492c5d4 (block: bio-integrity: directly map user buffers)
Acked-by: Chao Shi <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Acked-by: Dave Tian <[email protected]>
Signed-off-by: Sungwoo Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 11, 2026
Quiesce and resume is a mechanism to suspend operations on DASD devices.
In the context of a controlled copy pair swap operation, the quiesce
operation is usually issued before the actual swap and a resume
afterwards.

During the swap operation, the underlying device is exchanged. Therefore,
the quiesce flag must be moved to the secondary device to ensure a
consistent quiesce state after the swap.

The secondary device itself cannot be suspended separately because there
is no separate block device representation for it.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 11, 2026
During online processing for a DASD device an IO operation is started to
determine the format of the device. CDL format contains specifically
sized blocks at the beginning of the disk.

For a PPRC secondary device, no real IO operation is possible, so this IO
request cannot be started and this step is skipped during online
processing of secondary devices. This is generally fine since the
secondary is a copy of the primary device.

If an additional partition detection is run after a swap operation, the
format information is needed to properly drive partition detection IO.

Currently, this information is not passed, leading to IO errors during
partition detection and a wrongly detected partition table, which in turn
might lead to data corruption on the disk with the wrong partition table.

Fix by passing the format information from primary to secondary device.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Acked-by: Eduard Shishkin <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 12, 2026
A malicious user program can request large user-memory pinning via
ioctl with a large metadata_len. However, this does not guarantee
that all requested memory will be pinned. Pinning may partially succeed
and return the number of bytes that were actually pinned, which may not
match the requested size. In this case, only the addresses of the
pinned pages are valid.

The current implementation does not handle partial pinning and
incorrectly assumes that all pages in the range [0, nr_vecs) are valid.
This can lead to a null-pointer dereference because pages[n] may refer
to an unpinned memory range.

To fix this, add a check to verify that all requested pages are
successfully pinned. Pinning all pages is required to copy user data.

KASAN splat:

Syzkaller hit 'general protection fault in bio_integrity_map_user' bug.

nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: 2/0/0 default/read/poll queues
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 UID: 0 PID: 280 Comm: syz-executor294 Not tainted 6.11.0-dirty #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:_compound_head home/wukong/fuzznvme/linux/./include/linux/page-flags.h:240 [inline]
RIP: 0010:bvec_from_pages home/wukong/fuzznvme/linux/block/bio-integrity.c:290 [inline]
RIP: 0010:bio_integrity_map_user+0x5a3/0x11e0 home/wukong/fuzznvme/linux/block/bio-integrity.c:345
Code: 4c 89 e0 48 c1 e8 03 80 3c 30 00 0f 85 4b 0a 00 00 48 be 00 00 00 00 00 fc ff df 49 8b 1c 24 48 8d 7b 08 48 89 f8 48 c1 e8 03 <80> 3c 30 00 0f 85 35 0a 00 00 48 8b 43 08 31 ff 49 89 c5 48 89 44
RSP: 0018:ffffc900010cf4f0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000f761
RDX: ffff888006cae600 RSI: dffffc0000000000 RDI: 0000000000000008
RBP: ffffc900010cf7d0 R08: ffff888006cae600 R09: ffffed1000e0db95
R10: ffff88800706dcaf R11: ffff888006a31a00 R12: ffff888006a31a08
R13: 0000000000000740 R14: ffff888006a31a00 R15: 0000000000000001
FS:  0000555587e483c0(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002002f8c0 CR3: 000000000719a000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 nvme_map_user_request+0x4b6/0x5e0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:149
 nvme_submit_user_cmd+0x2e8/0x3c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:185
 nvme_user_cmd.constprop.0+0x35b/0x540 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:325
 nvme_ns_ioctl+0x11e/0x1c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:570
 nvme_ioctl+0x147/0x1d0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:605
 blkdev_ioctl+0x28c/0x6c0 home/wukong/fuzznvme/linux/block/ioctl.c:676
 vfs_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:51 [inline]
 __do_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:907 [inline]
 __se_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:893 [inline]
 __x64_sys_ioctl+0x1bc/0x230 home/wukong/fuzznvme/linux/fs/ioctl.c:893
 x64_sys_call+0x1209/0x20d0 home/wukong/fuzznvme/linux/./arch/x86/include/generated/asm/syscalls_64.h:17
 do_syscall_x64 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x6f/0x110 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f5b3d8e98bd
Code: c3 e8 a7 1f 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd1c0e988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f5b3d8e98bd
RDX: 000000002003f680 RSI: 00000000c0484e43 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007f5b3d93eb4d R09: 00007f5b3d93eb4d
R10: 00007f5b3d93eb4d R11: 0000000000000246 R12: 0000000000000001
R13: 00007fffd1c0ebe8 R14: 00007fffd1c0e9b0 R15: 00007fffd1c0e9a0
 </TASK>
Modules linked in:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#2] PREEMPT SMP KASAN PTI

Fixes: 492c5d4 (block: bio-integrity: directly map user buffers)
Acked-by: Chao Shi <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Acked-by: Dave Tian <[email protected]>
Signed-off-by: Sungwoo Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 12, 2026
Quiesce and resume is a mechanism to suspend operations on DASD devices.
In the context of a controlled copy pair swap operation, the quiesce
operation is usually issued before the actual swap and a resume
afterwards.

During the swap operation, the underlying device is exchanged. Therefore,
the quiesce flag must be moved to the secondary device to ensure a
consistent quiesce state after the swap.

The secondary device itself cannot be suspended separately because there
is no separate block device representation for it.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 12, 2026
During online processing for a DASD device an IO operation is started to
determine the format of the device. CDL format contains specifically
sized blocks at the beginning of the disk.

For a PPRC secondary device, no real IO operation is possible, so this IO
request cannot be started and this step is skipped during online
processing of secondary devices. This is generally fine since the
secondary is a copy of the primary device.

If an additional partition detection is run after a swap operation, the
format information is needed to properly drive partition detection IO.

Currently, this information is not passed, leading to IO errors during
partition detection and a wrongly detected partition table, which in turn
might lead to data corruption on the disk with the wrong partition table.

Fix by passing the format information from primary to secondary device.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Acked-by: Eduard Shishkin <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 13, 2026
A malicious user program can request large user-memory pinning via
ioctl with a large metadata_len. However, this does not guarantee
that all requested memory will be pinned. Pinning may partially succeed
and return the number of bytes that were actually pinned, which may not
match the requested size. In this case, only the addresses of the
pinned pages are valid.

The current implementation does not handle partial pinning and
incorrectly assumes that all pages in the range [0, nr_vecs) are valid.
This can lead to a null-pointer dereference because pages[n] may refer
to an unpinned memory range.

To fix this, add a check to verify that all requested pages are
successfully pinned. Pinning all pages is required to copy user data.

KASAN splat:

Syzkaller hit 'general protection fault in bio_integrity_map_user' bug.

nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: 2/0/0 default/read/poll queues
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 UID: 0 PID: 280 Comm: syz-executor294 Not tainted 6.11.0-dirty #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:_compound_head home/wukong/fuzznvme/linux/./include/linux/page-flags.h:240 [inline]
RIP: 0010:bvec_from_pages home/wukong/fuzznvme/linux/block/bio-integrity.c:290 [inline]
RIP: 0010:bio_integrity_map_user+0x5a3/0x11e0 home/wukong/fuzznvme/linux/block/bio-integrity.c:345
Code: 4c 89 e0 48 c1 e8 03 80 3c 30 00 0f 85 4b 0a 00 00 48 be 00 00 00 00 00 fc ff df 49 8b 1c 24 48 8d 7b 08 48 89 f8 48 c1 e8 03 <80> 3c 30 00 0f 85 35 0a 00 00 48 8b 43 08 31 ff 49 89 c5 48 89 44
RSP: 0018:ffffc900010cf4f0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000f761
RDX: ffff888006cae600 RSI: dffffc0000000000 RDI: 0000000000000008
RBP: ffffc900010cf7d0 R08: ffff888006cae600 R09: ffffed1000e0db95
R10: ffff88800706dcaf R11: ffff888006a31a00 R12: ffff888006a31a08
R13: 0000000000000740 R14: ffff888006a31a00 R15: 0000000000000001
FS:  0000555587e483c0(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002002f8c0 CR3: 000000000719a000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 nvme_map_user_request+0x4b6/0x5e0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:149
 nvme_submit_user_cmd+0x2e8/0x3c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:185
 nvme_user_cmd.constprop.0+0x35b/0x540 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:325
 nvme_ns_ioctl+0x11e/0x1c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:570
 nvme_ioctl+0x147/0x1d0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:605
 blkdev_ioctl+0x28c/0x6c0 home/wukong/fuzznvme/linux/block/ioctl.c:676
 vfs_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:51 [inline]
 __do_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:907 [inline]
 __se_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:893 [inline]
 __x64_sys_ioctl+0x1bc/0x230 home/wukong/fuzznvme/linux/fs/ioctl.c:893
 x64_sys_call+0x1209/0x20d0 home/wukong/fuzznvme/linux/./arch/x86/include/generated/asm/syscalls_64.h:17
 do_syscall_x64 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x6f/0x110 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f5b3d8e98bd
Code: c3 e8 a7 1f 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd1c0e988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f5b3d8e98bd
RDX: 000000002003f680 RSI: 00000000c0484e43 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007f5b3d93eb4d R09: 00007f5b3d93eb4d
R10: 00007f5b3d93eb4d R11: 0000000000000246 R12: 0000000000000001
R13: 00007fffd1c0ebe8 R14: 00007fffd1c0e9b0 R15: 00007fffd1c0e9a0
 </TASK>
Modules linked in:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#2] PREEMPT SMP KASAN PTI

Fixes: 492c5d4 (block: bio-integrity: directly map user buffers)
Acked-by: Chao Shi <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Acked-by: Dave Tian <[email protected]>
Signed-off-by: Sungwoo Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 13, 2026
Quiesce and resume is a mechanism to suspend operations on DASD devices.
In the context of a controlled copy pair swap operation, the quiesce
operation is usually issued before the actual swap and a resume
afterwards.

During the swap operation, the underlying device is exchanged. Therefore,
the quiesce flag must be moved to the secondary device to ensure a
consistent quiesce state after the swap.

The secondary device itself cannot be suspended separately because there
is no separate block device representation for it.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 13, 2026
During online processing for a DASD device an IO operation is started to
determine the format of the device. CDL format contains specifically
sized blocks at the beginning of the disk.

For a PPRC secondary device, no real IO operation is possible, so this IO
request cannot be started and this step is skipped during online
processing of secondary devices. This is generally fine since the
secondary is a copy of the primary device.

If an additional partition detection is run after a swap operation, the
format information is needed to properly drive partition detection IO.

Currently, this information is not passed, leading to IO errors during
partition detection and a wrongly detected partition table, which in turn
might lead to data corruption on the disk with the wrong partition table.

Fix by passing the format information from primary to secondary device.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Acked-by: Eduard Shishkin <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 15, 2026
A malicious user program can request large user-memory pinning via
ioctl with a large metadata_len. However, this does not guarantee
that all requested memory will be pinned. Pinning may partially succeed
and return the number of bytes that were actually pinned, which may not
match the requested size. In this case, only the addresses of the
pinned pages are valid.

The current implementation does not handle partial pinning and
incorrectly assumes that all pages in the range [0, nr_vecs) are valid.
This can lead to a null-pointer dereference because pages[n] may refer
to an unpinned memory range.

To fix this, add a check to verify that all requested pages are
successfully pinned. Pinning all pages is required to copy user data.

KASAN splat:

Syzkaller hit 'general protection fault in bio_integrity_map_user' bug.

nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: 2/0/0 default/read/poll queues
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 UID: 0 PID: 280 Comm: syz-executor294 Not tainted 6.11.0-dirty #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:_compound_head home/wukong/fuzznvme/linux/./include/linux/page-flags.h:240 [inline]
RIP: 0010:bvec_from_pages home/wukong/fuzznvme/linux/block/bio-integrity.c:290 [inline]
RIP: 0010:bio_integrity_map_user+0x5a3/0x11e0 home/wukong/fuzznvme/linux/block/bio-integrity.c:345
Code: 4c 89 e0 48 c1 e8 03 80 3c 30 00 0f 85 4b 0a 00 00 48 be 00 00 00 00 00 fc ff df 49 8b 1c 24 48 8d 7b 08 48 89 f8 48 c1 e8 03 <80> 3c 30 00 0f 85 35 0a 00 00 48 8b 43 08 31 ff 49 89 c5 48 89 44
RSP: 0018:ffffc900010cf4f0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000f761
RDX: ffff888006cae600 RSI: dffffc0000000000 RDI: 0000000000000008
RBP: ffffc900010cf7d0 R08: ffff888006cae600 R09: ffffed1000e0db95
R10: ffff88800706dcaf R11: ffff888006a31a00 R12: ffff888006a31a08
R13: 0000000000000740 R14: ffff888006a31a00 R15: 0000000000000001
FS:  0000555587e483c0(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002002f8c0 CR3: 000000000719a000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 nvme_map_user_request+0x4b6/0x5e0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:149
 nvme_submit_user_cmd+0x2e8/0x3c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:185
 nvme_user_cmd.constprop.0+0x35b/0x540 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:325
 nvme_ns_ioctl+0x11e/0x1c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:570
 nvme_ioctl+0x147/0x1d0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:605
 blkdev_ioctl+0x28c/0x6c0 home/wukong/fuzznvme/linux/block/ioctl.c:676
 vfs_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:51 [inline]
 __do_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:907 [inline]
 __se_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:893 [inline]
 __x64_sys_ioctl+0x1bc/0x230 home/wukong/fuzznvme/linux/fs/ioctl.c:893
 x64_sys_call+0x1209/0x20d0 home/wukong/fuzznvme/linux/./arch/x86/include/generated/asm/syscalls_64.h:17
 do_syscall_x64 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x6f/0x110 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f5b3d8e98bd
Code: c3 e8 a7 1f 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd1c0e988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f5b3d8e98bd
RDX: 000000002003f680 RSI: 00000000c0484e43 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007f5b3d93eb4d R09: 00007f5b3d93eb4d
R10: 00007f5b3d93eb4d R11: 0000000000000246 R12: 0000000000000001
R13: 00007fffd1c0ebe8 R14: 00007fffd1c0e9b0 R15: 00007fffd1c0e9a0
 </TASK>
Modules linked in:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#2] PREEMPT SMP KASAN PTI

Fixes: 492c5d4 (block: bio-integrity: directly map user buffers)
Acked-by: Chao Shi <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Acked-by: Dave Tian <[email protected]>
Signed-off-by: Sungwoo Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 15, 2026
Quiesce and resume is a mechanism to suspend operations on DASD devices.
In the context of a controlled copy pair swap operation, the quiesce
operation is usually issued before the actual swap and a resume
afterwards.

During the swap operation, the underlying device is exchanged. Therefore,
the quiesce flag must be moved to the secondary device to ensure a
consistent quiesce state after the swap.

The secondary device itself cannot be suspended separately because there
is no separate block device representation for it.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 15, 2026
During online processing for a DASD device an IO operation is started to
determine the format of the device. CDL format contains specifically
sized blocks at the beginning of the disk.

For a PPRC secondary device no real IO operation is possible, so this
IO request cannot be started and the step is skipped during online
processing of secondary devices. This is generally fine since the
secondary is a copy of the primary device.

If an additional partition detection is run after a swap operation,
the format information is needed to properly drive the partition
detection IO.

Currently the information is not passed, which leads to IO errors during
partition detection and a wrongly detected partition table. That in turn
might lead to data corruption on the disk with the wrong partition table.

Fix by passing the format information from primary to secondary device.

Fixes: 413862c ("s390/dasd: add copy pair swap capability")
Cc: [email protected] #6.1
Reviewed-by: Jan Hoeppner <[email protected]>
Acked-by: Eduard Shishkin <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 18, 2026
SMB2_write() places write payload in iov[1..n] as part of rq_iov.
smb3_init_transform_rq() pointer-shares rq_iov, so crypt_message()
encrypts iov[1] in-place, replacing the original plaintext with
ciphertext. On a replayable error, the retry sends the same iov[1]
which now contains ciphertext instead of the original data,
resulting in corruption.

The corruption is most likely to be observed when connections are
unstable, as reconnects trigger write retries that re-send the
already-encrypted data.

This affects SFU mknod, MF symlinks, etc. On kernels before
6.10 (prior to the netfs conversion), sync writes also used
this path and were similarly affected. The async write path
was unaffected as it uses rq_iter, which gets deep-copied.

Fix by moving the write payload into rq_iter via iov_iter_kvec(),
so smb3_init_transform_rq() deep-copies it before encryption.
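
The difference between the two paths can be reproduced in a toy userspace sketch. It does not use any SMB or kernel code: `toy_encrypt()` is a plain XOR standing in for real encryption, and `send_shared()`, `send_copied()`, and `retry_is_consistent()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for encryption: XOR every byte with a fixed key. */
static void toy_encrypt(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= 0x5a;
}

/*
 * Shared-buffer send: encrypts the caller's payload in place, as the
 * commit describes for iov[1]. Afterwards the caller's buffer no longer
 * holds the original plaintext.
 */
static void send_shared(unsigned char *payload, size_t len,
			unsigned char *wire)
{
	assert(payload != NULL && wire != NULL);
	toy_encrypt(payload, len);
	memcpy(wire, payload, len);
}

/*
 * Copying send: encrypts a private copy, as the commit describes for
 * rq_iter. The caller's buffer is untouched, so a retry starts from the
 * same plaintext.
 */
static void send_copied(const unsigned char *payload, size_t len,
			unsigned char *wire)
{
	assert(payload != NULL && wire != NULL);
	memcpy(wire, payload, len);
	toy_encrypt(wire, len);
}

/* Returns 1 if a retry puts the same bytes on the wire as the first send. */
static int retry_is_consistent(int use_copy)
{
	unsigned char payload[] = "plaintext";
	unsigned char first[sizeof(payload)], second[sizeof(payload)];
	size_t len = sizeof(payload);

	if (use_copy) {
		send_copied(payload, len, first);
		send_copied(payload, len, second);	/* simulated retry */
	} else {
		send_shared(payload, len, first);
		send_shared(payload, len, second);	/* simulated retry */
	}
	return memcmp(first, second, len) == 0;
}
```

With the shared buffer the retry transforms data that was already transformed, so the two sends differ. XOR is its own inverse, so this toy retry happens to emit the plaintext again; a real cipher would instead emit doubly encrypted data, which the receiver decrypts to garbage.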

Cc: [email protected] #6.3+
Acked-by: Henrique Carvalho <[email protected]>
Acked-by: Shyam Prasad N <[email protected]>
Acked-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Bharath SM <[email protected]>
Signed-off-by: Steve French <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 18, 2026
A malicious user program can request large user-memory pinning via
ioctl with a large metadata_len. However, this does not guarantee
that all requested memory will be pinned. Pinning may partially succeed
and return the number of bytes that were actually pinned, which may not
match the requested size. In this case, only the addresses of the
pinned pages are valid.

The current implementation does not handle partial pinning and
incorrectly assumes that all pages in the range [0, nr_vecs) are valid.
This can lead to a null-pointer dereference because pages[n] may refer
to an unpinned memory range.

To fix this, add a check to verify that all requested pages are
successfully pinned. Pinning all pages is required to copy user data.
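
The shape of such a check can be sketched in userspace C. This is not the kernel patch: `sketch_pin()` is a hypothetical stand-in for the pinning helper that can be told to stop early, `sketch_map_user()` is a hypothetical caller, and returning `-EFAULT` on a short pin is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_MAX_PAGES 16

/*
 * Hypothetical pinning helper: asked for 'bytes' bytes, it pins at most
 * 'limit_pages' pages, fills pages[] for those pages only, and returns
 * the number of bytes it actually pinned.
 */
static long sketch_pin(void **pages, size_t bytes, size_t limit_pages)
{
	static char backing;	/* dummy non-NULL "page" address */
	size_t want = (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	size_t got = want < limit_pages ? want : limit_pages;

	assert(pages != NULL);
	for (size_t i = 0; i < got; i++)
		pages[i] = &backing;

	return got == want ? (long)bytes : (long)(got * SKETCH_PAGE_SIZE);
}

/*
 * Hypothetical caller: a short pin is treated as a failure instead of
 * walking pages[] as if every slot had been filled.
 */
static int sketch_map_user(size_t bytes, size_t limit_pages)
{
	void *pages[SKETCH_MAX_PAGES] = { NULL };
	size_t want = (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	long pinned;

	if (want > SKETCH_MAX_PAGES)
		return -EINVAL;

	pinned = sketch_pin(pages, bytes, limit_pages);
	if (pinned < 0)
		return (int)pinned;
	if ((size_t)pinned != bytes) {
		/* Partial pin: slots past the pinned pages are still NULL.
		 * Real code would also release the pages it did get. */
		return -EFAULT;
	}

	/* Only now is every slot in [0, want) known to be valid. */
	for (size_t i = 0; i < want; i++)
		if (pages[i] == NULL)
			return -EFAULT;
	return 0;
}
```

Without the comparison against `bytes`, the final loop is the point where a NULL slot would be dereferenced in the scenario the splat below shows.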

KASAN splat:

Syzkaller hit 'general protection fault in bio_integrity_map_user' bug.

nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: 2/0/0 default/read/poll queues
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 UID: 0 PID: 280 Comm: syz-executor294 Not tainted 6.11.0-dirty #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:_compound_head home/wukong/fuzznvme/linux/./include/linux/page-flags.h:240 [inline]
RIP: 0010:bvec_from_pages home/wukong/fuzznvme/linux/block/bio-integrity.c:290 [inline]
RIP: 0010:bio_integrity_map_user+0x5a3/0x11e0 home/wukong/fuzznvme/linux/block/bio-integrity.c:345
Code: 4c 89 e0 48 c1 e8 03 80 3c 30 00 0f 85 4b 0a 00 00 48 be 00 00 00 00 00 fc ff df 49 8b 1c 24 48 8d 7b 08 48 89 f8 48 c1 e8 03 <80> 3c 30 00 0f 85 35 0a 00 00 48 8b 43 08 31 ff 49 89 c5 48 89 44
RSP: 0018:ffffc900010cf4f0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000f761
RDX: ffff888006cae600 RSI: dffffc0000000000 RDI: 0000000000000008
RBP: ffffc900010cf7d0 R08: ffff888006cae600 R09: ffffed1000e0db95
R10: ffff88800706dcaf R11: ffff888006a31a00 R12: ffff888006a31a08
R13: 0000000000000740 R14: ffff888006a31a00 R15: 0000000000000001
FS:  0000555587e483c0(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002002f8c0 CR3: 000000000719a000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 nvme_map_user_request+0x4b6/0x5e0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:149
 nvme_submit_user_cmd+0x2e8/0x3c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:185
 nvme_user_cmd.constprop.0+0x35b/0x540 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:325
 nvme_ns_ioctl+0x11e/0x1c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:570
 nvme_ioctl+0x147/0x1d0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:605
 blkdev_ioctl+0x28c/0x6c0 home/wukong/fuzznvme/linux/block/ioctl.c:676
 vfs_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:51 [inline]
 __do_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:907 [inline]
 __se_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:893 [inline]
 __x64_sys_ioctl+0x1bc/0x230 home/wukong/fuzznvme/linux/fs/ioctl.c:893
 x64_sys_call+0x1209/0x20d0 home/wukong/fuzznvme/linux/./arch/x86/include/generated/asm/syscalls_64.h:17
 do_syscall_x64 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x6f/0x110 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f5b3d8e98bd
Code: c3 e8 a7 1f 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd1c0e988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f5b3d8e98bd
RDX: 000000002003f680 RSI: 00000000c0484e43 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007f5b3d93eb4d R09: 00007f5b3d93eb4d
R10: 00007f5b3d93eb4d R11: 0000000000000246 R12: 0000000000000001
R13: 00007fffd1c0ebe8 R14: 00007fffd1c0e9b0 R15: 00007fffd1c0e9a0
 </TASK>
Modules linked in:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#2] PREEMPT SMP KASAN PTI

Fixes: 492c5d4 (block: bio-integrity: directly map user buffers)
Acked-by: Chao Shi <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Acked-by: Dave Tian <[email protected]>
Signed-off-by: Sungwoo Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 18, 2026
blktests-ci Bot pushed a commit that referenced this pull request Mar 22, 2026
blktests-ci Bot pushed a commit that referenced this pull request Mar 23, 2026
blktests-ci Bot pushed a commit that referenced this pull request Mar 24, 2026
blktests-ci Bot pushed a commit that referenced this pull request Mar 25, 2026
As reported by syzbot [0], NBD can trigger a deadlock during
memory reclaim.

This occurs when a process holds lock_sock() on a backend TCP
socket and triggers a memory allocation that leads to fs reclaim.
If it eventually calls into NBD to send data or shut down the
socket, NBD will attempt to acquire the same lock_sock(),
resulting in the deadlock.

While NBD sets sk->sk_allocation to GFP_NOIO before calling
sendmsg(), this does not prevent the issue in some paths where
GFP_KERNEL is used directly under lock_sock().

To resolve this, let's use lock_sock_try() for TCP sendmsg() and
shutdown().

For sock_sendmsg(), if lock_sock_try() fails, -ERESTARTSYS is
returned, allowing the request to be retried later (e.g., via
was_interrupted() logic).
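
The try-lock-or-retry pattern for the send path can be sketched in userspace C. The sketch uses a C11 `atomic_flag` rather than a socket lock, and every name in it (`struct sketch_sock`, `sketch_lock_try()`, `sketch_send()`, `SKETCH_ERESTARTSYS`) is hypothetical. The value 512 mirrors the kernel-internal ERESTARTSYS, which userspace `errno.h` does not define.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal ERESTARTSYS value, redefined here for the sketch. */
#define SKETCH_ERESTARTSYS 512

/* Hypothetical socket: a lock bit plus a counter of bytes "sent". */
struct sketch_sock {
	atomic_flag locked;
	int sent;
};

static void sketch_sock_init(struct sketch_sock *sk)
{
	assert(sk != NULL);
	atomic_flag_clear(&sk->locked);
	sk->sent = 0;
}

/* Try to take the lock without waiting; true means we now own it. */
static bool sketch_lock_try(struct sketch_sock *sk)
{
	return !atomic_flag_test_and_set(&sk->locked);
}

static void sketch_unlock(struct sketch_sock *sk)
{
	atomic_flag_clear(&sk->locked);
}

/*
 * Send path: never block on the lock. If someone else holds it, report a
 * restartable error so the caller can requeue the request and try later.
 */
static int sketch_send(struct sketch_sock *sk, int len)
{
	if (!sketch_lock_try(sk))
		return -SKETCH_ERESTARTSYS;

	sk->sent += len;
	sketch_unlock(sk);
	return len;
}
```

Because the sender never waits for the lock, it cannot become the blocked side of the circular dependency; the cost is that the caller must be prepared to retry.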

For sock_sendmsg() for NBD_CMD_DISC and kernel_sock_shutdown(),
the operation might be skipped if the lock cannot be acquired.
However, this is not expected to occur in practice because the
backend TCP socket should not be touched by userspace once it is
handed over to NBD.

Note that sock_recvmsg() does not require this special handling
because it is only called from the workqueue context.

Also note that AF_UNIX sockets continue to use sock_sendmsg()
and kernel_sock_shutdown() because unix_stream_sendmsg() and
unix_shutdown() do not acquire lock_sock().

[0]:
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G             L
syz.7.2282/12353 is trying to acquire lock:
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: might_alloc include/linux/sched/mm.h:317 [inline]
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: slab_pre_alloc_hook mm/slub.c:4489 [inline]
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: slab_alloc_node mm/slub.c:4843 [inline]
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: kmem_cache_alloc_node_noprof+0x53/0x6f0 mm/slub.c:4918

but task is already holding lock:
ffff88806f972a20 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1709 [inline]
ffff88806f972a20 (sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_close+0x1d/0x110 net/ipv4/tcp.c:3349

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (sk_lock-AF_INET6){+.+.}-{0:0}:
       lock_sock_nested+0x41/0xf0 net/core/sock.c:3780
       lock_sock include/net/sock.h:1709 [inline]
       inet_shutdown+0x67/0x410 net/ipv4/af_inet.c:919
       nbd_mark_nsock_dead+0xae/0x5c0 drivers/block/nbd.c:318
       sock_shutdown+0x16b/0x200 drivers/block/nbd.c:411
       nbd_clear_sock drivers/block/nbd.c:1427 [inline]
       nbd_config_put+0x1eb/0x750 drivers/block/nbd.c:1451
       nbd_genl_connect+0xaf8/0x1a40 drivers/block/nbd.c:2248
       genl_family_rcv_msg_doit+0x214/0x300 net/netlink/genetlink.c:1114
       genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
       genl_rcv_msg+0x560/0x800 net/netlink/genetlink.c:1209
       netlink_rcv_skb+0x159/0x420 net/netlink/af_netlink.c:2550
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
       netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
       netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344
       netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
       sock_sendmsg_nosec net/socket.c:727 [inline]
       __sock_sendmsg net/socket.c:742 [inline]
       ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2592
       ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
       __sys_sendmsg+0x170/0x220 net/socket.c:2678
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #5 (&nsock->tx_lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       nbd_handle_cmd drivers/block/nbd.c:1143 [inline]
       nbd_queue_rq+0x428/0x1080 drivers/block/nbd.c:1207
       blk_mq_dispatch_rq_list+0x422/0x1e70 block/blk-mq.c:2148
       __blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
       blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
       __blk_mq_sched_dispatch_requests+0xcea/0x1620 block/blk-mq-sched.c:307
       blk_mq_sched_dispatch_requests+0xd7/0x1c0 block/blk-mq-sched.c:329
       blk_mq_run_hw_queue+0x23c/0x670 block/blk-mq.c:2386
       blk_mq_dispatch_list+0x51d/0x1360 block/blk-mq.c:2949
       blk_mq_flush_plug_list block/blk-mq.c:2997 [inline]
       blk_mq_flush_plug_list+0x130/0x600 block/blk-mq.c:2969
       __blk_flush_plug+0x2c4/0x4b0 block/blk-core.c:1230
       blk_finish_plug block/blk-core.c:1257 [inline]
       __submit_bio+0x584/0x6c0 block/blk-core.c:649
       __submit_bio_noacct_mq block/blk-core.c:722 [inline]
       submit_bio_noacct_nocheck+0x562/0xc10 block/blk-core.c:753
       submit_bio_noacct+0xd17/0x2010 block/blk-core.c:884
       blk_crypto_submit_bio include/linux/blk-crypto.h:203 [inline]
       submit_bh_wbc+0x59c/0x770 fs/buffer.c:2821
       submit_bh fs/buffer.c:2826 [inline]
       block_read_full_folio+0x264/0x8e0 fs/buffer.c:2444
       filemap_read_folio+0xfc/0x3b0 mm/filemap.c:2501
       do_read_cache_folio+0x2d7/0x6b0 mm/filemap.c:4101
       read_mapping_folio include/linux/pagemap.h:1028 [inline]
       read_part_sector+0xd1/0x370 block/partitions/core.c:723
       adfspart_check_ICS+0x93/0x910 block/partitions/acorn.c:360
       check_partition block/partitions/core.c:142 [inline]
       blk_add_partitions block/partitions/core.c:590 [inline]
       bdev_disk_changed+0x7f8/0xc80 block/partitions/core.c:694
       blkdev_get_whole+0x187/0x290 block/bdev.c:764
       bdev_open+0x2c7/0xe40 block/bdev.c:973
       blkdev_open+0x34e/0x4f0 block/fops.c:697
       do_dentry_open+0x6d8/0x1660 fs/open.c:949
       vfs_open+0x82/0x3f0 fs/open.c:1081
       do_open fs/namei.c:4671 [inline]
       path_openat+0x208c/0x31a0 fs/namei.c:4830
       do_file_open+0x20e/0x430 fs/namei.c:4859
       do_sys_openat2+0x10d/0x1e0 fs/open.c:1366
       do_sys_open fs/open.c:1372 [inline]
       __do_sys_openat fs/open.c:1388 [inline]
       __se_sys_openat fs/open.c:1383 [inline]
       __x64_sys_openat+0x12d/0x210 fs/open.c:1383
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #4 (&cmd->lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       nbd_queue_rq+0xba/0x1080 drivers/block/nbd.c:1199
       blk_mq_dispatch_rq_list+0x422/0x1e70 block/blk-mq.c:2148
       __blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
       blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
       __blk_mq_sched_dispatch_requests+0xcea/0x1620 block/blk-mq-sched.c:307
       blk_mq_sched_dispatch_requests+0xd7/0x1c0 block/blk-mq-sched.c:329
       blk_mq_run_hw_queue+0x23c/0x670 block/blk-mq.c:2386
       blk_mq_dispatch_list+0x51d/0x1360 block/blk-mq.c:2949
       blk_mq_flush_plug_list block/blk-mq.c:2997 [inline]
       blk_mq_flush_plug_list+0x130/0x600 block/blk-mq.c:2969
       __blk_flush_plug+0x2c4/0x4b0 block/blk-core.c:1230
       blk_finish_plug block/blk-core.c:1257 [inline]
       __submit_bio+0x584/0x6c0 block/blk-core.c:649
       __submit_bio_noacct_mq block/blk-core.c:722 [inline]
       submit_bio_noacct_nocheck+0x562/0xc10 block/blk-core.c:753
       submit_bio_noacct+0xd17/0x2010 block/blk-core.c:884
       blk_crypto_submit_bio include/linux/blk-crypto.h:203 [inline]
       submit_bh_wbc+0x59c/0x770 fs/buffer.c:2821
       submit_bh fs/buffer.c:2826 [inline]
       block_read_full_folio+0x264/0x8e0 fs/buffer.c:2444
       filemap_read_folio+0xfc/0x3b0 mm/filemap.c:2501
       do_read_cache_folio+0x2d7/0x6b0 mm/filemap.c:4101
       read_mapping_folio include/linux/pagemap.h:1028 [inline]
       read_part_sector+0xd1/0x370 block/partitions/core.c:723
       adfspart_check_ICS+0x93/0x910 block/partitions/acorn.c:360
       check_partition block/partitions/core.c:142 [inline]
       blk_add_partitions block/partitions/core.c:590 [inline]
       bdev_disk_changed+0x7f8/0xc80 block/partitions/core.c:694
       blkdev_get_whole+0x187/0x290 block/bdev.c:764
       bdev_open+0x2c7/0xe40 block/bdev.c:973
       blkdev_open+0x34e/0x4f0 block/fops.c:697
       do_dentry_open+0x6d8/0x1660 fs/open.c:949
       vfs_open+0x82/0x3f0 fs/open.c:1081
       do_open fs/namei.c:4671 [inline]
       path_openat+0x208c/0x31a0 fs/namei.c:4830
       do_file_open+0x20e/0x430 fs/namei.c:4859
       do_sys_openat2+0x10d/0x1e0 fs/open.c:1366
       do_sys_open fs/open.c:1372 [inline]
       __do_sys_openat fs/open.c:1388 [inline]
       __se_sys_openat fs/open.c:1383 [inline]
       __x64_sys_openat+0x12d/0x210 fs/open.c:1383
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #3 (set->srcu){.+.+}-{0:0}:
       srcu_lock_sync include/linux/srcu.h:199 [inline]
       __synchronize_srcu+0xa1/0x2a0 kernel/rcu/srcutree.c:1505
       blk_mq_wait_quiesce_done block/blk-mq.c:284 [inline]
       blk_mq_wait_quiesce_done block/blk-mq.c:281 [inline]
       blk_mq_quiesce_queue block/blk-mq.c:304 [inline]
       blk_mq_quiesce_queue+0x149/0x1c0 block/blk-mq.c:299
       elevator_switch+0x17b/0x7e0 block/elevator.c:576
       elevator_change+0x352/0x530 block/elevator.c:681
       elevator_set_default+0x29e/0x360 block/elevator.c:754
       blk_register_queue+0x412/0x590 block/blk-sysfs.c:946
       __add_disk+0x73f/0xe40 block/genhd.c:528
       add_disk_fwnode+0x118/0x5c0 block/genhd.c:597
       add_disk include/linux/blkdev.h:785 [inline]
       nbd_dev_add+0x77a/0xb10 drivers/block/nbd.c:1984
       nbd_init+0x291/0x2b0 drivers/block/nbd.c:2692
       do_one_initcall+0x11d/0x760 init/main.c:1382
       do_initcall_level init/main.c:1444 [inline]
       do_initcalls init/main.c:1460 [inline]
       do_basic_setup init/main.c:1479 [inline]
       kernel_init_freeable+0x6e5/0x7a0 init/main.c:1692
       kernel_init+0x1f/0x1e0 init/main.c:1582
       ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #2 (&q->elevator_lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       elevator_change+0x1bc/0x530 block/elevator.c:679
       elevator_set_none+0x92/0xf0 block/elevator.c:769
       blk_mq_elv_switch_none block/blk-mq.c:5110 [inline]
       __blk_mq_update_nr_hw_queues block/blk-mq.c:5155 [inline]
       blk_mq_update_nr_hw_queues+0x4c1/0x15f0 block/blk-mq.c:5220
       nbd_start_device+0x1a6/0xbd0 drivers/block/nbd.c:1489
       nbd_genl_connect+0xff2/0x1a40 drivers/block/nbd.c:2239
       genl_family_rcv_msg_doit+0x214/0x300 net/netlink/genetlink.c:1114
       genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
       genl_rcv_msg+0x560/0x800 net/netlink/genetlink.c:1209
       netlink_rcv_skb+0x159/0x420 net/netlink/af_netlink.c:2550
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
       netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
       netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344
       netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
       sock_sendmsg_nosec net/socket.c:727 [inline]
       __sock_sendmsg net/socket.c:742 [inline]
       ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2592
       ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
       __sys_sendmsg+0x170/0x220 net/socket.c:2678
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 (&q->q_usage_counter(io)#49){++++}-{0:0}:
       blk_alloc_queue+0x610/0x790 block/blk-core.c:461
       blk_mq_alloc_queue+0x174/0x290 block/blk-mq.c:4429
       __blk_mq_alloc_disk+0x29/0x120 block/blk-mq.c:4476
       nbd_dev_add+0x492/0xb10 drivers/block/nbd.c:1954
       nbd_init+0x291/0x2b0 drivers/block/nbd.c:2692
       do_one_initcall+0x11d/0x760 init/main.c:1382
       do_initcall_level init/main.c:1444 [inline]
       do_initcalls init/main.c:1460 [inline]
       do_basic_setup init/main.c:1479 [inline]
       kernel_init_freeable+0x6e5/0x7a0 init/main.c:1692
       kernel_init+0x1f/0x1e0 init/main.c:1582
       ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #0 (fs_reclaim){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3165 [inline]
       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
       validate_chain kernel/locking/lockdep.c:3908 [inline]
       __lock_acquire+0x14b8/0x2630 kernel/locking/lockdep.c:5237
       lock_acquire kernel/locking/lockdep.c:5868 [inline]
       lock_acquire+0x1cf/0x380 kernel/locking/lockdep.c:5825
       __fs_reclaim_acquire mm/page_alloc.c:4348 [inline]
       fs_reclaim_acquire+0xc4/0x100 mm/page_alloc.c:4362
       might_alloc include/linux/sched/mm.h:317 [inline]
       slab_pre_alloc_hook mm/slub.c:4489 [inline]
       slab_alloc_node mm/slub.c:4843 [inline]
       kmem_cache_alloc_node_noprof+0x53/0x6f0 mm/slub.c:4918
       __alloc_skb+0x140/0x710 net/core/skbuff.c:702
       alloc_skb include/linux/skbuff.h:1383 [inline]
       tcp_send_active_reset+0x8b/0xa60 net/ipv4/tcp_output.c:3862
       __tcp_close+0x41e/0x1110 net/ipv4/tcp.c:3223
       tcp_close+0x28/0x110 net/ipv4/tcp.c:3350
       inet_release+0xed/0x200 net/ipv4/af_inet.c:443
       inet6_release+0x4f/0x70 net/ipv6/af_inet6.c:479
       __sock_release+0xb3/0x260 net/socket.c:662
       sock_close+0x1c/0x30 net/socket.c:1455
       __fput+0x3ff/0xb40 fs/file_table.c:469
       task_work_run+0x150/0x240 kernel/task_work.c:233
       resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
       __exit_to_user_mode_loop kernel/entry/common.c:67 [inline]
       exit_to_user_mode_loop+0x100/0x4a0 kernel/entry/common.c:98
       __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
       syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
       syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
       do_syscall_64+0x67c/0xf80 arch/x86/entry/syscall_64.c:100
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  fs_reclaim --> &nsock->tx_lock --> sk_lock-AF_INET6

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_INET6);
                               lock(&nsock->tx_lock);
                               lock(sk_lock-AF_INET6);
  lock(fs_reclaim);

 *** DEADLOCK ***

Fixes: fd8383f ("nbd: convert to blkmq")
Reported-by: [email protected]
Closes: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Kuniyuki Iwashima <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 25, 2026
As reported by syzbot [0], NBD can trigger a deadlock during
memory reclaim.

This occurs when a process holds lock_sock() on a backend TCP
socket and triggers a memory allocation that leads to fs reclaim.
If it eventually calls into NBD to send data or shut down the
socket, NBD will attempt to acquire the same lock_sock(),
resulting in a deadlock.

While NBD sets sk->sk_allocation to GFP_NOIO before calling
sendmsg(), this does not prevent the issue in some paths where
GFP_KERNEL is used directly under lock_sock().

To resolve this, let's use lock_sock_try() for TCP sendmsg() and
shutdown().

For sock_sendmsg(), if lock_sock_try() fails, -ERESTARTSYS is
returned, allowing the request to be retried later (e.g., via
was_interrupted() logic).

For sock_sendmsg() for NBD_CMD_DISC and kernel_sock_shutdown(),
the operation might be skipped if the lock cannot be acquired.
However, this is not expected to occur in practice because the
backend TCP socket should not be touched by userspace once it is
handed over to NBD.

Note that sock_recvmsg() does not require this special handling
because it is only called from the workqueue context.

Also note that AF_UNIX sockets continue to use sock_sendmsg()
and kernel_sock_shutdown() because unix_stream_sendmsg() and
unix_shutdown() do not acquire lock_sock().
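
The back-off behaviour described above can be illustrated with a small user-space analogy. This is only a sketch, not NBD code: `sketch_sock`, `sketch_trylock()`, and `sketch_send_nowait()` are hypothetical names, a C11 `atomic_flag` stands in for the socket lock, and `SKETCH_ERESTARTSYS` is a local copy of the kernel-internal value, which is assumed here to be 512.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumption: mirrors the kernel-internal ERESTARTSYS value. */
#define SKETCH_ERESTARTSYS 512

/* Hypothetical stand-in for a backend socket and its lock. */
struct sketch_sock {
	atomic_flag locked;
	int bytes_sent;
};

/* Returns 1 if the lock was taken, 0 if somebody already holds it. */
static int sketch_trylock(struct sketch_sock *sk)
{
	return !atomic_flag_test_and_set(&sk->locked);
}

static void sketch_unlock(struct sketch_sock *sk)
{
	atomic_flag_clear(&sk->locked);
}

/*
 * Send without ever waiting for the socket lock. If the lock is held
 * (possibly by the task that entered reclaim), report a restartable
 * error so the caller can requeue the request instead of blocking.
 */
static int sketch_send_nowait(struct sketch_sock *sk, int len)
{
	if (!sketch_trylock(sk))
		return -SKETCH_ERESTARTSYS;

	sk->bytes_sent += len;
	sketch_unlock(sk);
	return len;
}
```

A caller would treat the negative return value as "retry later" rather than as a hard I/O error, which is the role the commit message assigns to the was_interrupted() logic.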

[0]:
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G             L
syz.7.2282/12353 is trying to acquire lock:
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: might_alloc include/linux/sched/mm.h:317 [inline]
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: slab_pre_alloc_hook mm/slub.c:4489 [inline]
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: slab_alloc_node mm/slub.c:4843 [inline]
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: kmem_cache_alloc_node_noprof+0x53/0x6f0 mm/slub.c:4918

but task is already holding lock:
ffff88806f972a20 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1709 [inline]
ffff88806f972a20 (sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_close+0x1d/0x110 net/ipv4/tcp.c:3349

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (sk_lock-AF_INET6){+.+.}-{0:0}:
       lock_sock_nested+0x41/0xf0 net/core/sock.c:3780
       lock_sock include/net/sock.h:1709 [inline]
       inet_shutdown+0x67/0x410 net/ipv4/af_inet.c:919
       nbd_mark_nsock_dead+0xae/0x5c0 drivers/block/nbd.c:318
       sock_shutdown+0x16b/0x200 drivers/block/nbd.c:411
       nbd_clear_sock drivers/block/nbd.c:1427 [inline]
       nbd_config_put+0x1eb/0x750 drivers/block/nbd.c:1451
       nbd_genl_connect+0xaf8/0x1a40 drivers/block/nbd.c:2248
       genl_family_rcv_msg_doit+0x214/0x300 net/netlink/genetlink.c:1114
       genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
       genl_rcv_msg+0x560/0x800 net/netlink/genetlink.c:1209
       netlink_rcv_skb+0x159/0x420 net/netlink/af_netlink.c:2550
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
       netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
       netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344
       netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
       sock_sendmsg_nosec net/socket.c:727 [inline]
       __sock_sendmsg net/socket.c:742 [inline]
       ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2592
       ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
       __sys_sendmsg+0x170/0x220 net/socket.c:2678
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #5 (&nsock->tx_lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       nbd_handle_cmd drivers/block/nbd.c:1143 [inline]
       nbd_queue_rq+0x428/0x1080 drivers/block/nbd.c:1207
       blk_mq_dispatch_rq_list+0x422/0x1e70 block/blk-mq.c:2148
       __blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
       blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
       __blk_mq_sched_dispatch_requests+0xcea/0x1620 block/blk-mq-sched.c:307
       blk_mq_sched_dispatch_requests+0xd7/0x1c0 block/blk-mq-sched.c:329
       blk_mq_run_hw_queue+0x23c/0x670 block/blk-mq.c:2386
       blk_mq_dispatch_list+0x51d/0x1360 block/blk-mq.c:2949
       blk_mq_flush_plug_list block/blk-mq.c:2997 [inline]
       blk_mq_flush_plug_list+0x130/0x600 block/blk-mq.c:2969
       __blk_flush_plug+0x2c4/0x4b0 block/blk-core.c:1230
       blk_finish_plug block/blk-core.c:1257 [inline]
       __submit_bio+0x584/0x6c0 block/blk-core.c:649
       __submit_bio_noacct_mq block/blk-core.c:722 [inline]
       submit_bio_noacct_nocheck+0x562/0xc10 block/blk-core.c:753
       submit_bio_noacct+0xd17/0x2010 block/blk-core.c:884
       blk_crypto_submit_bio include/linux/blk-crypto.h:203 [inline]
       submit_bh_wbc+0x59c/0x770 fs/buffer.c:2821
       submit_bh fs/buffer.c:2826 [inline]
       block_read_full_folio+0x264/0x8e0 fs/buffer.c:2444
       filemap_read_folio+0xfc/0x3b0 mm/filemap.c:2501
       do_read_cache_folio+0x2d7/0x6b0 mm/filemap.c:4101
       read_mapping_folio include/linux/pagemap.h:1028 [inline]
       read_part_sector+0xd1/0x370 block/partitions/core.c:723
       adfspart_check_ICS+0x93/0x910 block/partitions/acorn.c:360
       check_partition block/partitions/core.c:142 [inline]
       blk_add_partitions block/partitions/core.c:590 [inline]
       bdev_disk_changed+0x7f8/0xc80 block/partitions/core.c:694
       blkdev_get_whole+0x187/0x290 block/bdev.c:764
       bdev_open+0x2c7/0xe40 block/bdev.c:973
       blkdev_open+0x34e/0x4f0 block/fops.c:697
       do_dentry_open+0x6d8/0x1660 fs/open.c:949
       vfs_open+0x82/0x3f0 fs/open.c:1081
       do_open fs/namei.c:4671 [inline]
       path_openat+0x208c/0x31a0 fs/namei.c:4830
       do_file_open+0x20e/0x430 fs/namei.c:4859
       do_sys_openat2+0x10d/0x1e0 fs/open.c:1366
       do_sys_open fs/open.c:1372 [inline]
       __do_sys_openat fs/open.c:1388 [inline]
       __se_sys_openat fs/open.c:1383 [inline]
       __x64_sys_openat+0x12d/0x210 fs/open.c:1383
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #4 (&cmd->lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       nbd_queue_rq+0xba/0x1080 drivers/block/nbd.c:1199
       blk_mq_dispatch_rq_list+0x422/0x1e70 block/blk-mq.c:2148
       __blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
       blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
       __blk_mq_sched_dispatch_requests+0xcea/0x1620 block/blk-mq-sched.c:307
       blk_mq_sched_dispatch_requests+0xd7/0x1c0 block/blk-mq-sched.c:329
       blk_mq_run_hw_queue+0x23c/0x670 block/blk-mq.c:2386
       blk_mq_dispatch_list+0x51d/0x1360 block/blk-mq.c:2949
       blk_mq_flush_plug_list block/blk-mq.c:2997 [inline]
       blk_mq_flush_plug_list+0x130/0x600 block/blk-mq.c:2969
       __blk_flush_plug+0x2c4/0x4b0 block/blk-core.c:1230
       blk_finish_plug block/blk-core.c:1257 [inline]
       __submit_bio+0x584/0x6c0 block/blk-core.c:649
       __submit_bio_noacct_mq block/blk-core.c:722 [inline]
       submit_bio_noacct_nocheck+0x562/0xc10 block/blk-core.c:753
       submit_bio_noacct+0xd17/0x2010 block/blk-core.c:884
       blk_crypto_submit_bio include/linux/blk-crypto.h:203 [inline]
       submit_bh_wbc+0x59c/0x770 fs/buffer.c:2821
       submit_bh fs/buffer.c:2826 [inline]
       block_read_full_folio+0x264/0x8e0 fs/buffer.c:2444
       filemap_read_folio+0xfc/0x3b0 mm/filemap.c:2501
       do_read_cache_folio+0x2d7/0x6b0 mm/filemap.c:4101
       read_mapping_folio include/linux/pagemap.h:1028 [inline]
       read_part_sector+0xd1/0x370 block/partitions/core.c:723
       adfspart_check_ICS+0x93/0x910 block/partitions/acorn.c:360
       check_partition block/partitions/core.c:142 [inline]
       blk_add_partitions block/partitions/core.c:590 [inline]
       bdev_disk_changed+0x7f8/0xc80 block/partitions/core.c:694
       blkdev_get_whole+0x187/0x290 block/bdev.c:764
       bdev_open+0x2c7/0xe40 block/bdev.c:973
       blkdev_open+0x34e/0x4f0 block/fops.c:697
       do_dentry_open+0x6d8/0x1660 fs/open.c:949
       vfs_open+0x82/0x3f0 fs/open.c:1081
       do_open fs/namei.c:4671 [inline]
       path_openat+0x208c/0x31a0 fs/namei.c:4830
       do_file_open+0x20e/0x430 fs/namei.c:4859
       do_sys_openat2+0x10d/0x1e0 fs/open.c:1366
       do_sys_open fs/open.c:1372 [inline]
       __do_sys_openat fs/open.c:1388 [inline]
       __se_sys_openat fs/open.c:1383 [inline]
       __x64_sys_openat+0x12d/0x210 fs/open.c:1383
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #3 (set->srcu){.+.+}-{0:0}:
       srcu_lock_sync include/linux/srcu.h:199 [inline]
       __synchronize_srcu+0xa1/0x2a0 kernel/rcu/srcutree.c:1505
       blk_mq_wait_quiesce_done block/blk-mq.c:284 [inline]
       blk_mq_wait_quiesce_done block/blk-mq.c:281 [inline]
       blk_mq_quiesce_queue block/blk-mq.c:304 [inline]
       blk_mq_quiesce_queue+0x149/0x1c0 block/blk-mq.c:299
       elevator_switch+0x17b/0x7e0 block/elevator.c:576
       elevator_change+0x352/0x530 block/elevator.c:681
       elevator_set_default+0x29e/0x360 block/elevator.c:754
       blk_register_queue+0x412/0x590 block/blk-sysfs.c:946
       __add_disk+0x73f/0xe40 block/genhd.c:528
       add_disk_fwnode+0x118/0x5c0 block/genhd.c:597
       add_disk include/linux/blkdev.h:785 [inline]
       nbd_dev_add+0x77a/0xb10 drivers/block/nbd.c:1984
       nbd_init+0x291/0x2b0 drivers/block/nbd.c:2692
       do_one_initcall+0x11d/0x760 init/main.c:1382
       do_initcall_level init/main.c:1444 [inline]
       do_initcalls init/main.c:1460 [inline]
       do_basic_setup init/main.c:1479 [inline]
       kernel_init_freeable+0x6e5/0x7a0 init/main.c:1692
       kernel_init+0x1f/0x1e0 init/main.c:1582
       ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #2 (&q->elevator_lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       elevator_change+0x1bc/0x530 block/elevator.c:679
       elevator_set_none+0x92/0xf0 block/elevator.c:769
       blk_mq_elv_switch_none block/blk-mq.c:5110 [inline]
       __blk_mq_update_nr_hw_queues block/blk-mq.c:5155 [inline]
       blk_mq_update_nr_hw_queues+0x4c1/0x15f0 block/blk-mq.c:5220
       nbd_start_device+0x1a6/0xbd0 drivers/block/nbd.c:1489
       nbd_genl_connect+0xff2/0x1a40 drivers/block/nbd.c:2239
       genl_family_rcv_msg_doit+0x214/0x300 net/netlink/genetlink.c:1114
       genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
       genl_rcv_msg+0x560/0x800 net/netlink/genetlink.c:1209
       netlink_rcv_skb+0x159/0x420 net/netlink/af_netlink.c:2550
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
       netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
       netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344
       netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
       sock_sendmsg_nosec net/socket.c:727 [inline]
       __sock_sendmsg net/socket.c:742 [inline]
       ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2592
       ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
       __sys_sendmsg+0x170/0x220 net/socket.c:2678
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 (&q->q_usage_counter(io)#49){++++}-{0:0}:
       blk_alloc_queue+0x610/0x790 block/blk-core.c:461
       blk_mq_alloc_queue+0x174/0x290 block/blk-mq.c:4429
       __blk_mq_alloc_disk+0x29/0x120 block/blk-mq.c:4476
       nbd_dev_add+0x492/0xb10 drivers/block/nbd.c:1954
       nbd_init+0x291/0x2b0 drivers/block/nbd.c:2692
       do_one_initcall+0x11d/0x760 init/main.c:1382
       do_initcall_level init/main.c:1444 [inline]
       do_initcalls init/main.c:1460 [inline]
       do_basic_setup init/main.c:1479 [inline]
       kernel_init_freeable+0x6e5/0x7a0 init/main.c:1692
       kernel_init+0x1f/0x1e0 init/main.c:1582
       ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #0 (fs_reclaim){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3165 [inline]
       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
       validate_chain kernel/locking/lockdep.c:3908 [inline]
       __lock_acquire+0x14b8/0x2630 kernel/locking/lockdep.c:5237
       lock_acquire kernel/locking/lockdep.c:5868 [inline]
       lock_acquire+0x1cf/0x380 kernel/locking/lockdep.c:5825
       __fs_reclaim_acquire mm/page_alloc.c:4348 [inline]
       fs_reclaim_acquire+0xc4/0x100 mm/page_alloc.c:4362
       might_alloc include/linux/sched/mm.h:317 [inline]
       slab_pre_alloc_hook mm/slub.c:4489 [inline]
       slab_alloc_node mm/slub.c:4843 [inline]
       kmem_cache_alloc_node_noprof+0x53/0x6f0 mm/slub.c:4918
       __alloc_skb+0x140/0x710 net/core/skbuff.c:702
       alloc_skb include/linux/skbuff.h:1383 [inline]
       tcp_send_active_reset+0x8b/0xa60 net/ipv4/tcp_output.c:3862
       __tcp_close+0x41e/0x1110 net/ipv4/tcp.c:3223
       tcp_close+0x28/0x110 net/ipv4/tcp.c:3350
       inet_release+0xed/0x200 net/ipv4/af_inet.c:443
       inet6_release+0x4f/0x70 net/ipv6/af_inet6.c:479
       __sock_release+0xb3/0x260 net/socket.c:662
       sock_close+0x1c/0x30 net/socket.c:1455
       __fput+0x3ff/0xb40 fs/file_table.c:469
       task_work_run+0x150/0x240 kernel/task_work.c:233
       resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
       __exit_to_user_mode_loop kernel/entry/common.c:67 [inline]
       exit_to_user_mode_loop+0x100/0x4a0 kernel/entry/common.c:98
       __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
       syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
       syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
       do_syscall_64+0x67c/0xf80 arch/x86/entry/syscall_64.c:100
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  fs_reclaim --> &nsock->tx_lock --> sk_lock-AF_INET6

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_INET6);
                               lock(&nsock->tx_lock);
                               lock(sk_lock-AF_INET6);
  lock(fs_reclaim);

 *** DEADLOCK ***

Fixes: fd8383f ("nbd: convert to blkmq")
Reported-by: [email protected]
Closes: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Kuniyuki Iwashima <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 25, 2026
A malicious user program can request large user-memory pinning via
ioctl with a large metadata_len. However, this does not guarantee
that all requested memory will be pinned. Pinning may partially succeed
and return the number of bytes that were actually pinned, which may not
match the requested size. In this case, only the addresses of the
pinned pages are valid.

The current implementation does not handle partial pinning and
incorrectly assumes that all pages in the range [0, nr_vecs) are valid.
This can lead to a null-pointer dereference because pages[n] may refer
to an unpinned memory range.

To fix this, add a check to verify that all requested pages are
successfully pinned. Pinning all pages is required to copy user data.
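
The check described above can be sketched in user-space C. Everything here is illustrative: `sketch_pin()` is a hypothetical mock that pins only as many pages as are "available", and returning `-EFAULT` on a short pin is an assumption about the error code, not something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_PAGES 8u

/* One dummy byte per page so that pinned entries get a non-NULL address. */
static char sketch_backing[SKETCH_MAX_PAGES];

/*
 * Hypothetical pinning mock: fills pages[0..n) for the pages that are
 * available and returns the number of bytes pinned, which may be less
 * than what was requested.
 */
static size_t sketch_pin(void **pages, size_t nr_pages, size_t available)
{
	size_t n = nr_pages < available ? nr_pages : available;

	if (n > SKETCH_MAX_PAGES)
		n = SKETCH_MAX_PAGES;
	for (size_t i = 0; i < n; i++)
		pages[i] = &sketch_backing[i];
	return n * SKETCH_PAGE_SIZE;
}

/* Release the entries that were actually pinned. */
static void sketch_unpin(void **pages, size_t nr_pinned)
{
	for (size_t i = 0; i < nr_pinned; i++)
		pages[i] = NULL;
}

/*
 * Map a user buffer. A short pin is treated as a failure, so later
 * code never walks entries that were not filled in.
 */
static int sketch_map_user(void **pages, size_t nr_pages, size_t available)
{
	size_t want = nr_pages * SKETCH_PAGE_SIZE;
	size_t got = sketch_pin(pages, nr_pages, available);

	if (got != want) {
		sketch_unpin(pages, got / SKETCH_PAGE_SIZE);
		return -EFAULT; /* assumed error code */
	}
	return 0;
}
```

The important part is the comparison of `got` against `want` before any entry of `pages` is dereferenced; without it, the loop over `[0, nr_pages)` would read slots the pin call never wrote.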

KASAN splat:

Syzkaller hit 'general protection fault in bio_integrity_map_user' bug.

nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: Command: 80f60320000000000300000000c9ffffb38ab5410000000070693aa0ffffffffb00e619dffffffff80f6032000000000b38ab5410000000070fc38a0ffffffff
nvme nvme0: 2/0/0 default/read/poll queues
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 UID: 0 PID: 280 Comm: syz-executor294 Not tainted 6.11.0-dirty #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:_compound_head home/wukong/fuzznvme/linux/./include/linux/page-flags.h:240 [inline]
RIP: 0010:bvec_from_pages home/wukong/fuzznvme/linux/block/bio-integrity.c:290 [inline]
RIP: 0010:bio_integrity_map_user+0x5a3/0x11e0 home/wukong/fuzznvme/linux/block/bio-integrity.c:345
Code: 4c 89 e0 48 c1 e8 03 80 3c 30 00 0f 85 4b 0a 00 00 48 be 00 00 00 00 00 fc ff df 49 8b 1c 24 48 8d 7b 08 48 89 f8 48 c1 e8 03 <80> 3c 30 00 0f 85 35 0a 00 00 48 8b 43 08 31 ff 49 89 c5 48 89 44
RSP: 0018:ffffc900010cf4f0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000f761
RDX: ffff888006cae600 RSI: dffffc0000000000 RDI: 0000000000000008
RBP: ffffc900010cf7d0 R08: ffff888006cae600 R09: ffffed1000e0db95
R10: ffff88800706dcaf R11: ffff888006a31a00 R12: ffff888006a31a08
R13: 0000000000000740 R14: ffff888006a31a00 R15: 0000000000000001
FS:  0000555587e483c0(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002002f8c0 CR3: 000000000719a000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 nvme_map_user_request+0x4b6/0x5e0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:149
 nvme_submit_user_cmd+0x2e8/0x3c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:185
 nvme_user_cmd.constprop.0+0x35b/0x540 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:325
 nvme_ns_ioctl+0x11e/0x1c0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:570
 nvme_ioctl+0x147/0x1d0 home/wukong/fuzznvme/linux/drivers/nvme/host/ioctl.c:605
 blkdev_ioctl+0x28c/0x6c0 home/wukong/fuzznvme/linux/block/ioctl.c:676
 vfs_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:51 [inline]
 __do_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:907 [inline]
 __se_sys_ioctl home/wukong/fuzznvme/linux/fs/ioctl.c:893 [inline]
 __x64_sys_ioctl+0x1bc/0x230 home/wukong/fuzznvme/linux/fs/ioctl.c:893
 x64_sys_call+0x1209/0x20d0 home/wukong/fuzznvme/linux/./arch/x86/include/generated/asm/syscalls_64.h:17
 do_syscall_x64 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x6f/0x110 home/wukong/fuzznvme/linux/arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f5b3d8e98bd
Code: c3 e8 a7 1f 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd1c0e988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f5b3d8e98bd
RDX: 000000002003f680 RSI: 00000000c0484e43 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007f5b3d93eb4d R09: 00007f5b3d93eb4d
R10: 00007f5b3d93eb4d R11: 0000000000000246 R12: 0000000000000001
R13: 00007fffd1c0ebe8 R14: 00007fffd1c0e9b0 R15: 00007fffd1c0e9a0
 </TASK>
Modules linked in:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#2] PREEMPT SMP KASAN PTI

Fixes: 492c5d4 ("block: bio-integrity: directly map user buffers")
Acked-by: Chao Shi <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Acked-by: Dave Tian <[email protected]>
Signed-off-by: Sungwoo Kim <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 27, 2026
The devm_free_irq() and devm_request_irq() functions should not be
executed in an atomic context.

During device suspend, all userspace processes and most kernel threads
are frozen. Additionally, we flush all tx/rx status, disable all macb
interrupts, and halt rx operations. Therefore, it is safe to split the
region protected by bp->lock into two independent sections, allowing
devm_free_irq() and devm_request_irq() to run in a non-atomic context.
This modification resolves the following lockdep warning:
  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 501, name: rtcwake
  preempt_count: 1, expected: 0
  RCU nest depth: 1, expected: 0
  7 locks held by rtcwake/501:
   #0: ffff0008038c3408 (sb_writers#5){.+.+}-{0:0}, at: vfs_write+0xf8/0x368
   #1: ffff0008049a5e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xbc/0x1c8
   #2: ffff00080098d588 (kn->active#70){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xcc/0x1c8
   #3: ffff800081c84888 (system_transition_mutex){+.+.}-{4:4}, at: pm_suspend+0x1ec/0x290
   #4: ffff0008009ba0f8 (&dev->mutex){....}-{4:4}, at: device_suspend+0x118/0x4f0
   #5: ffff800081d00458 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
   #6: ffff0008031fb9e0 (&bp->lock){-.-.}-{3:3}, at: macb_suspend+0x144/0x558
  irq event stamp: 8682
  hardirqs last  enabled at (8681): [<ffff8000813c7d7c>] _raw_spin_unlock_irqrestore+0x44/0x88
  hardirqs last disabled at (8682): [<ffff8000813c7b58>] _raw_spin_lock_irqsave+0x38/0x98
  softirqs last  enabled at (7322): [<ffff8000800f1b4c>] handle_softirqs+0x52c/0x588
  softirqs last disabled at (7317): [<ffff800080010310>] __do_softirq+0x20/0x2c
  CPU: 1 UID: 0 PID: 501 Comm: rtcwake Not tainted 7.0.0-rc3-next-20260310-yocto-standard+ #125 PREEMPT
  Hardware name: ZynqMP ZCU102 Rev1.1 (DT)
  Call trace:
   show_stack+0x24/0x38 (C)
   __dump_stack+0x28/0x38
   dump_stack_lvl+0x64/0x88
   dump_stack+0x18/0x24
   __might_resched+0x200/0x218
   __might_sleep+0x38/0x98
   __mutex_lock_common+0x7c/0x1378
   mutex_lock_nested+0x38/0x50
   free_irq+0x68/0x2b0
   devm_irq_release+0x24/0x38
   devres_release+0x40/0x80
   devm_free_irq+0x48/0x88
   macb_suspend+0x298/0x558
   device_suspend+0x218/0x4f0
   dpm_suspend+0x244/0x3a0
   dpm_suspend_start+0x50/0x78
   suspend_devices_and_enter+0xec/0x560
   pm_suspend+0x194/0x290
   state_store+0x110/0x158
   kobj_attr_store+0x1c/0x30
   sysfs_kf_write+0xa8/0xd0
   kernfs_fop_write_iter+0x11c/0x1c8
   vfs_write+0x248/0x368
   ksys_write+0x7c/0xf8
   __arm64_sys_write+0x28/0x40
   invoke_syscall+0x4c/0xe8
   el0_svc_common+0x98/0xf0
   do_el0_svc+0x28/0x40
   el0_svc+0x54/0x1e0
   el0t_64_sync_handler+0x84/0x130
   el0t_64_sync+0x198/0x1a0

Fixes: 558e35c ("net: macb: WoL support for GEM type of Ethernet controller")
Cc: [email protected]
Reviewed-by: Théo Lebrun <[email protected]>
Signed-off-by: Kevin Hao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 27, 2026
…nd napi_tx is false

A UAF issue occurs when the virtio_net driver is configured with napi_tx=N
and the device's IFF_XMIT_DST_RELEASE flag is cleared
(e.g., during the configuration of tc route filter rules).

When IFF_XMIT_DST_RELEASE is removed from the net_device, the network stack
expects the driver to hold the reference to skb->dst until the packet
is fully transmitted and freed. In virtio_net with napi_tx=N,
skbs may remain in the virtio transmit ring for an extended period.

If the network namespace is destroyed while these skbs are still pending,
the corresponding dst_ops structure has already been freed. When a subsequent
packet is transmitted, free_old_xmit() is triggered to clean up old skbs.
It then calls dst_release() on the stale dst_entry still attached to the skb.
Since the dst_ops (referenced by the dst_entry) has already been freed,
a UAF kernel paging request occurs.

Fix it by adding skb_dst_drop(skb) in start_xmit() to explicitly release
the dst reference before the skb is queued in virtio_net.
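
The ordering problem described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: every
toy_* name below is hypothetical and none of it is the real kernel code.
It assumes, as the trace suggests, that releasing a dst touches its
dst_ops, and it models a freed dst_ops with an "alive" flag instead of
actually freeing memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; these are NOT the real kernel types. */
struct toy_dst_ops { bool alive; };              /* owned by a network namespace */
struct toy_dst { struct toy_dst_ops *ops; int refcnt; };
struct toy_skb { struct toy_dst *dst; };

/* Model assumption: releasing a dst touches dst->ops, so ops must be alive.
 * Returns false where the real kernel would hit a use-after-free. */
static bool toy_dst_release(struct toy_dst *dst)
{
    if (!dst->ops->alive)
        return false;
    assert(dst->refcnt > 0);
    dst->refcnt--;
    return true;
}

/* Analogue of skb_dst_drop(): release the reference and clear the pointer. */
static bool toy_skb_dst_drop(struct toy_skb *skb)
{
    bool ok = true;

    if (skb->dst) {
        ok = toy_dst_release(skb->dst);
        skb->dst = NULL;
    }
    return ok;
}

/* Transmit path: optionally drop the dst before the skb sits in the ring. */
static void toy_start_xmit(struct toy_skb *skb, bool drop_early)
{
    if (drop_early)
        toy_skb_dst_drop(skb);
    /* ... the skb would be queued in the transmit ring here ... */
}

/* Delayed cleanup of a transmitted skb, in the role of free_old_xmit(). */
static bool toy_free_old_xmit(struct toy_skb *skb)
{
    return toy_skb_dst_drop(skb);
}

/* Scenario: transmit, destroy the namespace, then run the delayed cleanup.
 * Returns true if the cleanup never touched freed dst_ops. */
static bool toy_scenario(bool drop_early)
{
    struct toy_dst_ops ops = { .alive = true };
    struct toy_dst dst = { .ops = &ops, .refcnt = 1 };
    struct toy_skb skb = { .dst = &dst };

    toy_start_xmit(&skb, drop_early);
    ops.alive = false;               /* netns destroyed, dst_ops freed */
    return toy_free_old_xmit(&skb);
}
```

In this model, toy_scenario(true) stays safe because the reference is gone
before the namespace is torn down, while toy_scenario(false) reaches the
stale ops during cleanup.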

Call Trace:
 Unable to handle kernel paging request at virtual address ffff80007e150000
 CPU: 2 UID: 0 PID: 6236 Comm: ping Kdump: loaded Not tainted 7.0.0-rc1+ #6 PREEMPT
  ...
  percpu_counter_add_batch+0x3c/0x158 lib/percpu_counter.c:98 (P)
  dst_release+0xe0/0x110  net/core/dst.c:177
  skb_release_head_state+0xe8/0x108 net/core/skbuff.c:1177
  sk_skb_reason_drop+0x54/0x2d8 net/core/skbuff.c:1255
  dev_kfree_skb_any_reason+0x64/0x78 net/core/dev.c:3469
  napi_consume_skb+0x1c4/0x3a0 net/core/skbuff.c:1527
  __free_old_xmit+0x164/0x230  drivers/net/virtio_net.c:611 [virtio_net]
  free_old_xmit drivers/net/virtio_net.c:1081 [virtio_net]
  start_xmit+0x7c/0x530 drivers/net/virtio_net.c:3329 [virtio_net]
  ...

Reproduction Steps:
NETDEV="enp3s0"

config_qdisc_route_filter() {
    tc qdisc del dev $NETDEV root
    tc qdisc add dev $NETDEV root handle 1: prio
    tc filter add dev $NETDEV parent 1:0 \
	protocol ip prio 100 route to 100 flowid 1:1
    ip route add 192.168.1.100/32 dev $NETDEV realm 100
}

test_ns() {
    ip netns add testns
    ip link set $NETDEV netns testns
    ip netns exec testns ifconfig $NETDEV  10.0.32.46/24
    ip netns exec testns ping -c 1 10.0.32.1
    ip netns del testns
}

config_qdisc_route_filter

test_ns
sleep 2
test_ns

Fixes: f2fc6a5 ("[NETNS][IPV6] route6 - move ip6_dst_ops inside the network namespace")
Cc: [email protected]
Signed-off-by: xietangxin <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>
Fixes: 0287587 ("net: better IFF_XMIT_DST_RELEASE support")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
blktests-ci Bot pushed a commit that referenced this pull request Mar 27, 2026
As reported by syzbot [0], NBD can trigger a deadlock during
memory reclaim.

This occurs when a process holds lock_sock() on a backend TCP
socket and triggers a memory allocation that leads to fs reclaim.
If it eventually calls into NBD to send data or shut down the
socket, NBD will attempt to acquire the same lock_sock(),
resulting in the deadlock.

While NBD sets sk->sk_allocation to GFP_NOIO before calling
sendmsg(), this does not prevent the issue in some paths where
GFP_KERNEL is used directly under lock_sock().

To resolve this, let's use lock_sock_try() for TCP sendmsg() and
shutdown().

For sock_sendmsg(), if lock_sock_try() fails, -ERESTARTSYS is
returned, allowing the request to be retried later (e.g., via
was_interrupted() logic).

For the NBD_CMD_DISC sock_sendmsg() and for kernel_sock_shutdown(),
the operation might be skipped if the lock cannot be acquired.
However, this is not expected to occur in practice because the
backend TCP socket should not be touched by userspace once it is
handed over to NBD.

Note that sock_recvmsg() does not require this special handling
because it is only called from the workqueue context.

Also note that AF_UNIX sockets continue to use sock_sendmsg()
and kernel_sock_shutdown() because unix_stream_sendmsg() and
unix_shutdown() do not acquire lock_sock().
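
The try-lock behaviour described above can be sketched in userspace C.
This is a rough illustration, not the patch itself: the toy_* names are
hypothetical, a C11 atomic_flag stands in for the socket lock, and
ERESTARTSYS is defined locally because it is a kernel-internal errno that
userspace headers do not provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Kernel-internal errno; defined here only for the sketch. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Toy socket: the flag plays the role of the socket lock. */
struct toy_sock {
    atomic_flag locked;
    int bytes_sent;
};

static void toy_sock_init(struct toy_sock *sk)
{
    atomic_flag_clear(&sk->locked);
    sk->bytes_sent = 0;
}

/* Non-blocking acquire: true on success, false if someone holds the lock. */
static bool toy_lock_try(struct toy_sock *sk)
{
    return !atomic_flag_test_and_set_explicit(&sk->locked,
                                              memory_order_acquire);
}

static void toy_unlock(struct toy_sock *sk)
{
    atomic_flag_clear_explicit(&sk->locked, memory_order_release);
}

/* Send path: never wait for the lock; ask the caller to retry instead. */
static int toy_sendmsg_try(struct toy_sock *sk, int len)
{
    assert(len >= 0);
    if (!toy_lock_try(sk))
        return -ERESTARTSYS;
    sk->bytes_sent += len;
    toy_unlock(sk);
    return len;
}

/* Shutdown path: skipped when the lock is contended.
 * Returns true only if the shutdown work actually ran. */
static bool toy_shutdown_try(struct toy_sock *sk)
{
    if (!toy_lock_try(sk))
        return false;
    /* ... shutdown work would go here ... */
    toy_unlock(sk);
    return true;
}
```

The point of the design is that neither path ever sleeps on the lock, so a
holder that enters reclaim and re-enters the driver cannot deadlock against
itself; the cost is that the caller must handle the retry or the skipped
shutdown.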

[0]:
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G             L
syz.7.2282/12353 is trying to acquire lock:
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: might_alloc include/linux/sched/mm.h:317 [inline]
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: slab_pre_alloc_hook mm/slub.c:4489 [inline]
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: slab_alloc_node mm/slub.c:4843 [inline]
ffffffff8e9aa700 (fs_reclaim){+.+.}-{0:0}, at: kmem_cache_alloc_node_noprof+0x53/0x6f0 mm/slub.c:4918

but task is already holding lock:
ffff88806f972a20 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1709 [inline]
ffff88806f972a20 (sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_close+0x1d/0x110 net/ipv4/tcp.c:3349

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (sk_lock-AF_INET6){+.+.}-{0:0}:
       lock_sock_nested+0x41/0xf0 net/core/sock.c:3780
       lock_sock include/net/sock.h:1709 [inline]
       inet_shutdown+0x67/0x410 net/ipv4/af_inet.c:919
       nbd_mark_nsock_dead+0xae/0x5c0 drivers/block/nbd.c:318
       sock_shutdown+0x16b/0x200 drivers/block/nbd.c:411
       nbd_clear_sock drivers/block/nbd.c:1427 [inline]
       nbd_config_put+0x1eb/0x750 drivers/block/nbd.c:1451
       nbd_genl_connect+0xaf8/0x1a40 drivers/block/nbd.c:2248
       genl_family_rcv_msg_doit+0x214/0x300 net/netlink/genetlink.c:1114
       genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
       genl_rcv_msg+0x560/0x800 net/netlink/genetlink.c:1209
       netlink_rcv_skb+0x159/0x420 net/netlink/af_netlink.c:2550
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
       netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
       netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344
       netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
       sock_sendmsg_nosec net/socket.c:727 [inline]
       __sock_sendmsg net/socket.c:742 [inline]
       ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2592
       ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
       __sys_sendmsg+0x170/0x220 net/socket.c:2678
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #5 (&nsock->tx_lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       nbd_handle_cmd drivers/block/nbd.c:1143 [inline]
       nbd_queue_rq+0x428/0x1080 drivers/block/nbd.c:1207
       blk_mq_dispatch_rq_list+0x422/0x1e70 block/blk-mq.c:2148
       __blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
       blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
       __blk_mq_sched_dispatch_requests+0xcea/0x1620 block/blk-mq-sched.c:307
       blk_mq_sched_dispatch_requests+0xd7/0x1c0 block/blk-mq-sched.c:329
       blk_mq_run_hw_queue+0x23c/0x670 block/blk-mq.c:2386
       blk_mq_dispatch_list+0x51d/0x1360 block/blk-mq.c:2949
       blk_mq_flush_plug_list block/blk-mq.c:2997 [inline]
       blk_mq_flush_plug_list+0x130/0x600 block/blk-mq.c:2969
       __blk_flush_plug+0x2c4/0x4b0 block/blk-core.c:1230
       blk_finish_plug block/blk-core.c:1257 [inline]
       __submit_bio+0x584/0x6c0 block/blk-core.c:649
       __submit_bio_noacct_mq block/blk-core.c:722 [inline]
       submit_bio_noacct_nocheck+0x562/0xc10 block/blk-core.c:753
       submit_bio_noacct+0xd17/0x2010 block/blk-core.c:884
       blk_crypto_submit_bio include/linux/blk-crypto.h:203 [inline]
       submit_bh_wbc+0x59c/0x770 fs/buffer.c:2821
       submit_bh fs/buffer.c:2826 [inline]
       block_read_full_folio+0x264/0x8e0 fs/buffer.c:2444
       filemap_read_folio+0xfc/0x3b0 mm/filemap.c:2501
       do_read_cache_folio+0x2d7/0x6b0 mm/filemap.c:4101
       read_mapping_folio include/linux/pagemap.h:1028 [inline]
       read_part_sector+0xd1/0x370 block/partitions/core.c:723
       adfspart_check_ICS+0x93/0x910 block/partitions/acorn.c:360
       check_partition block/partitions/core.c:142 [inline]
       blk_add_partitions block/partitions/core.c:590 [inline]
       bdev_disk_changed+0x7f8/0xc80 block/partitions/core.c:694
       blkdev_get_whole+0x187/0x290 block/bdev.c:764
       bdev_open+0x2c7/0xe40 block/bdev.c:973
       blkdev_open+0x34e/0x4f0 block/fops.c:697
       do_dentry_open+0x6d8/0x1660 fs/open.c:949
       vfs_open+0x82/0x3f0 fs/open.c:1081
       do_open fs/namei.c:4671 [inline]
       path_openat+0x208c/0x31a0 fs/namei.c:4830
       do_file_open+0x20e/0x430 fs/namei.c:4859
       do_sys_openat2+0x10d/0x1e0 fs/open.c:1366
       do_sys_open fs/open.c:1372 [inline]
       __do_sys_openat fs/open.c:1388 [inline]
       __se_sys_openat fs/open.c:1383 [inline]
       __x64_sys_openat+0x12d/0x210 fs/open.c:1383
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #4 (&cmd->lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       nbd_queue_rq+0xba/0x1080 drivers/block/nbd.c:1199
       blk_mq_dispatch_rq_list+0x422/0x1e70 block/blk-mq.c:2148
       __blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
       blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
       __blk_mq_sched_dispatch_requests+0xcea/0x1620 block/blk-mq-sched.c:307
       blk_mq_sched_dispatch_requests+0xd7/0x1c0 block/blk-mq-sched.c:329
       blk_mq_run_hw_queue+0x23c/0x670 block/blk-mq.c:2386
       blk_mq_dispatch_list+0x51d/0x1360 block/blk-mq.c:2949
       blk_mq_flush_plug_list block/blk-mq.c:2997 [inline]
       blk_mq_flush_plug_list+0x130/0x600 block/blk-mq.c:2969
       __blk_flush_plug+0x2c4/0x4b0 block/blk-core.c:1230
       blk_finish_plug block/blk-core.c:1257 [inline]
       __submit_bio+0x584/0x6c0 block/blk-core.c:649
       __submit_bio_noacct_mq block/blk-core.c:722 [inline]
       submit_bio_noacct_nocheck+0x562/0xc10 block/blk-core.c:753
       submit_bio_noacct+0xd17/0x2010 block/blk-core.c:884
       blk_crypto_submit_bio include/linux/blk-crypto.h:203 [inline]
       submit_bh_wbc+0x59c/0x770 fs/buffer.c:2821
       submit_bh fs/buffer.c:2826 [inline]
       block_read_full_folio+0x264/0x8e0 fs/buffer.c:2444
       filemap_read_folio+0xfc/0x3b0 mm/filemap.c:2501
       do_read_cache_folio+0x2d7/0x6b0 mm/filemap.c:4101
       read_mapping_folio include/linux/pagemap.h:1028 [inline]
       read_part_sector+0xd1/0x370 block/partitions/core.c:723
       adfspart_check_ICS+0x93/0x910 block/partitions/acorn.c:360
       check_partition block/partitions/core.c:142 [inline]
       blk_add_partitions block/partitions/core.c:590 [inline]
       bdev_disk_changed+0x7f8/0xc80 block/partitions/core.c:694
       blkdev_get_whole+0x187/0x290 block/bdev.c:764
       bdev_open+0x2c7/0xe40 block/bdev.c:973
       blkdev_open+0x34e/0x4f0 block/fops.c:697
       do_dentry_open+0x6d8/0x1660 fs/open.c:949
       vfs_open+0x82/0x3f0 fs/open.c:1081
       do_open fs/namei.c:4671 [inline]
       path_openat+0x208c/0x31a0 fs/namei.c:4830
       do_file_open+0x20e/0x430 fs/namei.c:4859
       do_sys_openat2+0x10d/0x1e0 fs/open.c:1366
       do_sys_open fs/open.c:1372 [inline]
       __do_sys_openat fs/open.c:1388 [inline]
       __se_sys_openat fs/open.c:1383 [inline]
       __x64_sys_openat+0x12d/0x210 fs/open.c:1383
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #3 (set->srcu){.+.+}-{0:0}:
       srcu_lock_sync include/linux/srcu.h:199 [inline]
       __synchronize_srcu+0xa1/0x2a0 kernel/rcu/srcutree.c:1505
       blk_mq_wait_quiesce_done block/blk-mq.c:284 [inline]
       blk_mq_wait_quiesce_done block/blk-mq.c:281 [inline]
       blk_mq_quiesce_queue block/blk-mq.c:304 [inline]
       blk_mq_quiesce_queue+0x149/0x1c0 block/blk-mq.c:299
       elevator_switch+0x17b/0x7e0 block/elevator.c:576
       elevator_change+0x352/0x530 block/elevator.c:681
       elevator_set_default+0x29e/0x360 block/elevator.c:754
       blk_register_queue+0x412/0x590 block/blk-sysfs.c:946
       __add_disk+0x73f/0xe40 block/genhd.c:528
       add_disk_fwnode+0x118/0x5c0 block/genhd.c:597
       add_disk include/linux/blkdev.h:785 [inline]
       nbd_dev_add+0x77a/0xb10 drivers/block/nbd.c:1984
       nbd_init+0x291/0x2b0 drivers/block/nbd.c:2692
       do_one_initcall+0x11d/0x760 init/main.c:1382
       do_initcall_level init/main.c:1444 [inline]
       do_initcalls init/main.c:1460 [inline]
       do_basic_setup init/main.c:1479 [inline]
       kernel_init_freeable+0x6e5/0x7a0 init/main.c:1692
       kernel_init+0x1f/0x1e0 init/main.c:1582
       ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #2 (&q->elevator_lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       elevator_change+0x1bc/0x530 block/elevator.c:679
       elevator_set_none+0x92/0xf0 block/elevator.c:769
       blk_mq_elv_switch_none block/blk-mq.c:5110 [inline]
       __blk_mq_update_nr_hw_queues block/blk-mq.c:5155 [inline]
       blk_mq_update_nr_hw_queues+0x4c1/0x15f0 block/blk-mq.c:5220
       nbd_start_device+0x1a6/0xbd0 drivers/block/nbd.c:1489
       nbd_genl_connect+0xff2/0x1a40 drivers/block/nbd.c:2239
       genl_family_rcv_msg_doit+0x214/0x300 net/netlink/genetlink.c:1114
       genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
       genl_rcv_msg+0x560/0x800 net/netlink/genetlink.c:1209
       netlink_rcv_skb+0x159/0x420 net/netlink/af_netlink.c:2550
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
       netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
       netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344
       netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
       sock_sendmsg_nosec net/socket.c:727 [inline]
       __sock_sendmsg net/socket.c:742 [inline]
       ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2592
       ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
       __sys_sendmsg+0x170/0x220 net/socket.c:2678
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 (&q->q_usage_counter(io)#49){++++}-{0:0}:
       blk_alloc_queue+0x610/0x790 block/blk-core.c:461
       blk_mq_alloc_queue+0x174/0x290 block/blk-mq.c:4429
       __blk_mq_alloc_disk+0x29/0x120 block/blk-mq.c:4476
       nbd_dev_add+0x492/0xb10 drivers/block/nbd.c:1954
       nbd_init+0x291/0x2b0 drivers/block/nbd.c:2692
       do_one_initcall+0x11d/0x760 init/main.c:1382
       do_initcall_level init/main.c:1444 [inline]
       do_initcalls init/main.c:1460 [inline]
       do_basic_setup init/main.c:1479 [inline]
       kernel_init_freeable+0x6e5/0x7a0 init/main.c:1692
       kernel_init+0x1f/0x1e0 init/main.c:1582
       ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #0 (fs_reclaim){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3165 [inline]
       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
       validate_chain kernel/locking/lockdep.c:3908 [inline]
       __lock_acquire+0x14b8/0x2630 kernel/locking/lockdep.c:5237
       lock_acquire kernel/locking/lockdep.c:5868 [inline]
       lock_acquire+0x1cf/0x380 kernel/locking/lockdep.c:5825
       __fs_reclaim_acquire mm/page_alloc.c:4348 [inline]
       fs_reclaim_acquire+0xc4/0x100 mm/page_alloc.c:4362
       might_alloc include/linux/sched/mm.h:317 [inline]
       slab_pre_alloc_hook mm/slub.c:4489 [inline]
       slab_alloc_node mm/slub.c:4843 [inline]
       kmem_cache_alloc_node_noprof+0x53/0x6f0 mm/slub.c:4918
       __alloc_skb+0x140/0x710 net/core/skbuff.c:702
       alloc_skb include/linux/skbuff.h:1383 [inline]
       tcp_send_active_reset+0x8b/0xa60 net/ipv4/tcp_output.c:3862
       __tcp_close+0x41e/0x1110 net/ipv4/tcp.c:3223
       tcp_close+0x28/0x110 net/ipv4/tcp.c:3350
       inet_release+0xed/0x200 net/ipv4/af_inet.c:443
       inet6_release+0x4f/0x70 net/ipv6/af_inet6.c:479
       __sock_release+0xb3/0x260 net/socket.c:662
       sock_close+0x1c/0x30 net/socket.c:1455
       __fput+0x3ff/0xb40 fs/file_table.c:469
       task_work_run+0x150/0x240 kernel/task_work.c:233
       resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
       __exit_to_user_mode_loop kernel/entry/common.c:67 [inline]
       exit_to_user_mode_loop+0x100/0x4a0 kernel/entry/common.c:98
       __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
       syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
       syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
       do_syscall_64+0x67c/0xf80 arch/x86/entry/syscall_64.c:100
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  fs_reclaim --> &nsock->tx_lock --> sk_lock-AF_INET6

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_INET6);
                               lock(&nsock->tx_lock);
                               lock(sk_lock-AF_INET6);
  lock(fs_reclaim);

 *** DEADLOCK ***
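
The reasoning behind the report above can be approximated with a small
lock-order graph. This is a simplified sketch, not lockdep's actual
implementation: the toy_* names and the enum are hypothetical, and the
edges are taken from the chain #0 to #6 in the report. An edge A -> B means
"B was acquired while A was held"; a new edge is rejected if the reverse
path already exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Lock classes from the report, listed in chain order (#0 .. #6). */
enum toy_lock {
    FS_RECLAIM, Q_USAGE_COUNTER, ELEVATOR_LOCK, SET_SRCU,
    CMD_LOCK, TX_LOCK, SK_LOCK, NR_LOCKS
};

/* edge[a][b] is true when b has been acquired while a was held. */
struct toy_graph { bool edge[NR_LOCKS][NR_LOCKS]; };

/* Depth-first search: is there a path of recorded edges from 'from' to 'to'? */
static bool toy_reachable(const struct toy_graph *g, int from, int to,
                          bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_LOCKS; next++)
        if (g->edge[from][next] && !seen[next] &&
            toy_reachable(g, next, to, seen))
            return true;
    return false;
}

/* Record "next acquired while holding held".
 * Returns false, without adding the edge, if it would close a cycle. */
static bool toy_add_dependency(struct toy_graph *g, int held, int next)
{
    bool seen[NR_LOCKS] = { false };

    if (toy_reachable(g, next, held, seen))
        return false;
    g->edge[held][next] = true;
    return true;
}

/* Load the existing chain: each lock class depends on the one before it,
 * which matches the enum order above. */
static void toy_load_reported_chain(struct toy_graph *g)
{
    memset(g, 0, sizeof(*g));
    for (int l = FS_RECLAIM; l < SK_LOCK; l++)
        g->edge[l][l + 1] = true;
}
```

With the chain loaded, toy_add_dependency(&g, SK_LOCK, FS_RECLAIM) is
rejected, which corresponds to tcp_close() allocating memory while holding
the socket lock in the trace.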

Fixes: fd8383f ("nbd: convert to blkmq")
Reported-by: [email protected]
Closes: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Kuniyuki Iwashima <[email protected]>