Commit 42081e3
nvme: downgrade WARN in nvme_setup_rw to pr_debug
When an NVMe namespace is configured with embedded metadata (flbas bit 4
set, NVME_NS_FLBAS_META_EXT) but no Protection Information (dps=0) and
no NVME_NS_METADATA_SUPPORTED, nvme_setup_rw() fires WARN_ON_ONCE on
any request that reaches it with REQ_INTEGRITY unset. The WARN was
observed repeatedly during NVMe fuzz testing with a FEMU-based fuzzer
that performs semantic mutation of Identify Namespace responses.
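For reference, the flag derivation for such a namespace in
nvme_configure_metadata() looks roughly as follows (paraphrased sketch,
not the literal upstream code; details vary across kernel versions):

    /* flbas bit 4 selects extended (inline) LBAs; dps == 0 means no
     * protection information, so NVME_NS_METADATA_SUPPORTED stays unset. */
    head->ms = le16_to_cpu(id->lbaf[nvme_lbaf_index(id->flbas)].ms);
    if (id->flbas & NVME_NS_FLBAS_META_EXT)
            head->features |= NVME_NS_EXT_LBAS;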
The trigger requires three conditions to align: (a) a namespace
transitions through the EXT_LBAS non-PI state (head->ms != 0,
features & NVME_NS_EXT_LBAS, !(features & NVME_NS_METADATA_SUPPORTED)),
(b) nvme_init_integrity() returns false through the early-exit branch
at core.c:1834 without populating bi->metadata_size, leaving the disk
without an integrity profile (blk_get_integrity() returns NULL), and
(c) a request that was admitted to the block layer before the namespace
update reaches nvme_setup_rw() after it.
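Condition (b) corresponds to the early exit in nvme_init_integrity(); a
simplified sketch of that path, assuming the recent mainline shape where
the integrity profile lives in the queue_limits:

    /* Simplified sketch; not the literal code at core.c:1834. */
    struct blk_integrity *bi = &lim->integrity;

    memset(bi, 0, sizeof(*bi));
    if (!head->ms)
            return true;
    if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) ||
        !(head->features & NVME_NS_METADATA_SUPPORTED))
            /* nvme_ns_has_pi() is false for dps == 0, so we return with
             * bi->metadata_size still 0 and the disk ends up with no
             * integrity profile */
            return nvme_ns_has_pi(head);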
The admission gap arises in two places. First, the plug-list flush
path: a process with dirty pages queued in a plug before the namespace
update flushes them on file close (blk_finish_plug ->
blk_mq_dispatch_rq_list -> nvme_setup_rw), bypassing any capacity-zero
gate. Second, the
cached-rq path: blk_mq_submit_bio() at blk-mq.c:3155 may find a cached
request; if so, the bio_queue_enter() freeze-serialization guard at
blk-mq.c:3174-3176 is skipped and the bio is dispatched immediately.
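A condensed sketch of that submission path (paraphrased; the blk-mq.c
line numbers above refer to the tested tree):

    /* A cached plug request already holds a q_usage_counter reference,
     * so the freeze-serializing bio_queue_enter() is only taken when no
     * cached request is found. */
    rq = blk_mq_peek_cached_request(plug, q, bio->bi_opf);
    if (!rq) {
            if (unlikely(bio_queue_enter(bio)))
                    return;
    }
    /* ... integrity prep and dispatch continue with rq ... */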
In both cases the bio was submitted without REQ_INTEGRITY (because
blk_get_integrity() returned NULL at dispatch time, so
bio_integrity_action() returned 0 and bio_integrity_prep() was not
called), and it reaches nvme_setup_rw() for a namespace where
head->ms != 0. The existing BLK_STS_NOTSUPP return correctly handles
this dispatch; the WARN_ON_ONCE is a false positive.
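For context, the branch in question in nvme_setup_rw() has roughly this
shape (paraphrased):

    if (ns->head->ms) {
            if (!blk_integrity_rq(req)) {   /* REQ_INTEGRITY unset */
                    if (WARN_ON_ONCE(!nvme_ns_has_pi(ns->head)))
                            return BLK_STS_NOTSUPP;
                    control |= NVME_RW_PRINFO_PRACT;
            }
            ...
    }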
The WARN was reproduced six times over four days of fuzzing (April
2026). A representative crash shows the plug-flush path:
  nvme0n1: detected capacity change from 2097152 to 0
  WARNING: drivers/nvme/host/core.c:1042 at nvme_setup_rw+0x768/0xfd0
  PID: 785 (systemd-udevd)
  Call Trace:
   nvme_setup_cmd / nvme_queue_rq / blk_mq_dispatch_rq_list
   blk_mq_flush_plug_list / blk_finish_plug / blkdev_writepages
   sync_blockdev / bdev_release / __fput / sys_close
Replace the WARN_ON_ONCE with pr_debug_ratelimited so the condition is
logged at debug level without a splat. The BLK_STS_NOTSUPP return is
preserved; I/O to the transitioning namespace is still rejected.
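In sketch form the change amounts to the following (illustrative; the
message text is an assumption, not the literal hunk):

    if (!blk_integrity_rq(req)) {
            if (!nvme_ns_has_pi(ns->head)) {
                    pr_debug_ratelimited("%s: EXT_LBAS ns without integrity profile, rejecting I/O\n",
                                         __func__);
                    return BLK_STS_NOTSUPP;
            }
            control |= NVME_RW_PRINFO_PRACT;
    }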
An alternative approach that addresses the root cause at the
integrity-profile level is proposed in patch 2/2: populate
bi->metadata_size for EXT_LBAS non-PI namespaces in nvme_init_integrity()
so that bio_integrity_action() returns non-zero, bio_integrity_prep()
sets REQ_INTEGRITY, and nvme_setup_rw() never reaches this branch.
Both patches are sent as RFC for maintainer guidance on the preferred
direction.
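A hypothetical sketch of the patch 2/2 direction, extending the
nvme_init_integrity() early exit shown earlier (the field assignments
are assumptions for illustration, not the actual patch):

    if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) ||
        !(head->features & NVME_NS_METADATA_SUPPORTED)) {
            if ((head->features & NVME_NS_EXT_LBAS) && !nvme_ns_has_pi(head)) {
                    /* minimal profile so bio_integrity_prep() runs and
                     * sets REQ_INTEGRITY for the inline metadata */
                    bi->metadata_size = head->ms;
                    bi->interval_exp = head->lba_shift;
                    return true;
            }
            return nvme_ns_has_pi(head);
    }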
Tested: Compiled on linux-kcov-debug (6.19.0+, KASAN/DEBUG_LIST).
Boot-tested under FEMU with NVME_MALICIOUS_RESPONDER=1
NVME_SEMANTIC_DATA_MUTATOR=1; ran 4 concurrent dd processes plus 500
rescan_controller cycles. No WARN, BUG, or Oops observed.
Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).
Acked-by: Sungwoo Kim <[email protected]>
Acked-by: Dave Tian <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Signed-off-by: Chao Shi <[email protected]>
1 file changed, 5 additions & 1 deletion (drivers/nvme/host/core.c, hunk around lines 1039-1050)