Commit 1f3fcb5

ChaoShi001 authored and kawasaki committed
nvme: set integrity metadata size for EXT_LBAS non-PI namespace

This patch is an alternative to patch 1/2: instead of downgrading the assertion in nvme_setup_rw(), it addresses the root cause at the integrity-profile level so that the assertion is never reached.

For PCIe namespaces with extended LBAs (NVME_NS_EXT_LBAS set, flbas bit 4) but without PI and without NVME_NS_METADATA_SUPPORTED, the early-exit branch of nvme_init_integrity() at core.c:1834 returns false without populating bi->metadata_size. As a result blk_get_integrity() returns NULL (it checks q->limits.integrity.metadata_size via blk_integrity_queue_supports_integrity()), bio_integrity_action() returns 0, bio_integrity_prep() is never called, and REQ_INTEGRITY is never set on bios dispatched to the namespace. Any such bio that reaches nvme_setup_rw() triggers WARN_ON_ONCE because head->ms != 0 but blk_integrity_rq() returns false.

Populate bi->metadata_size = head->ms in the early-exit path for the EXT_LBAS non-PI case. This is sufficient to make blk_get_integrity() return non-NULL, which causes bio_integrity_action() to return non-zero, which causes bio_integrity_prep() to run and set REQ_INTEGRITY on any bio submitted to the namespace. Requests that reach nvme_setup_rw() then satisfy blk_integrity_rq() and the assertion is not reached.

blk_validate_integrity_limits() accepts this configuration: with csum_type=BLK_INTEGRITY_CSUM_NONE, pi_tuple_size=0, and pi_offset=0, all checks pass (pi_offset + pi_tuple_size <= metadata_size, pi_tuple_size must be 0 for CSUM_NONE), and interval_exp is auto-filled to ilog2(logical_block_size). No generate/verify callbacks are configured, so no actual integrity computation occurs; only the blk_integrity_rq() predicate is satisfied.

Capacity is still forced to 0 by set_capacity_and_notify(), so new bios are rejected by bio_check_eod() before queue entry.

Tested: Compiled on linux-kcov-debug (6.19.0+, KASAN/DEBUG_LIST). Boot-tested under FEMU with NVME_SEMANTIC_DATA_MUTATOR=1; ran 4 concurrent dd processes plus 500 rescan_controller cycles with no WARN, BUG, or Oops. The EXT_LBAS + ms!=0 + !PI combination was not triggered during testing (FEMU's mutator varies flbas and lbaf[0].ms independently; flbas=0x10 with lbaf_idx=0 was not produced in this run). The bi->metadata_size assignment path was not exercised in testing; correctness of blk_validate_integrity_limits() for this configuration was verified by code inspection.

Provided as RFC.

Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).

Acked-by: Sungwoo Kim <[email protected]>
Acked-by: Dave Tian <[email protected]>
Acked-by: Weidong Zhu <[email protected]>
Signed-off-by: Chao Shi <[email protected]>
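The validation argument in the commit message can be checked in isolation with a small userspace model. The sketch below is not the kernel's blk_validate_integrity_limits(); it reproduces only the three rules the message cites (the pi_offset + pi_tuple_size bound, the CSUM_NONE rule, and the interval_exp auto-fill). Every model_* name and the MODEL_CSUM_* enum are hypothetical stand-ins invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's definitions. */
enum model_csum_type { MODEL_CSUM_NONE = 0, MODEL_CSUM_CRC };

struct model_integrity {
	enum model_csum_type csum_type;
	uint8_t metadata_size;
	uint8_t pi_offset;
	uint8_t pi_tuple_size;
	uint8_t interval_exp;
};

/* Integer log2 of a non-zero value (floor). */
unsigned int model_ilog2(uint32_t v)
{
	unsigned int r = 0;

	assert(v != 0);
	while (v >>= 1)
		r++;
	return r;
}

/*
 * Models only the checks named in the commit message:
 *   1. pi_offset + pi_tuple_size must fit inside metadata_size
 *   2. pi_tuple_size must be 0 when there is no checksum
 *   3. interval_exp defaults to ilog2(logical_block_size) when unset
 */
bool model_validate_integrity(struct model_integrity *bi,
			      uint32_t logical_block_size)
{
	if (bi->pi_offset + bi->pi_tuple_size > bi->metadata_size)
		return false;
	if (bi->csum_type == MODEL_CSUM_NONE && bi->pi_tuple_size != 0)
		return false;
	if (!bi->interval_exp)
		bi->interval_exp = model_ilog2(logical_block_size);
	return true;
}
```

Under these assumptions, a profile with only metadata_size set (the configuration this patch produces) passes all three rules, while a non-zero pi_tuple_size combined with MODEL_CSUM_NONE is rejected.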
1 parent 42081e3 commit 1f3fcb5

1 file changed

Lines changed: 23 additions & 2 deletions

drivers/nvme/host/core.c

@@ -1836,8 +1836,29 @@ static bool nvme_init_integrity(struct nvme_ns_head *head,
 	 * insert/strip it, which is not possible for other kinds of metadata.
 	 */
 	if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) ||
-	    !(head->features & NVME_NS_METADATA_SUPPORTED))
-		return nvme_ns_has_pi(head);
+	    !(head->features & NVME_NS_METADATA_SUPPORTED)) {
+		bool has_pi = nvme_ns_has_pi(head);
+
+		/*
+		 * For PCIe EXT_LBAS non-PI namespaces the block layer sets
+		 * capacity to 0 (we return false) to prevent block I/O, but a
+		 * cached-rq bio may bypass bio_queue_enter freeze serialisation
+		 * and reach nvme_setup_rw() with head->ms != 0 and no
+		 * REQ_INTEGRITY set. Populate bi->metadata_size so that
+		 * bio_integrity_action() returns non-zero and bio_integrity_prep()
+		 * sets REQ_INTEGRITY on any such bio, preventing the WARN_ON_ONCE
+		 * at nvme_setup_rw() (addressed by patch 1/2).
+		 *
+		 * NOTE: only metadata_size is populated; no csum or PI profile is
+		 * configured. Actual data integrity for EXT_LBAS non-PI workloads
+		 * is untested; this patch is RFC for direction discussion.
+		 */
+		if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
+		    (head->features & NVME_NS_EXT_LBAS) &&
+		    head->ms && !has_pi)
+			bi->metadata_size = head->ms;
+		return has_pi;
+	}
 
 	switch (head->pi_type) {
 	case NVME_NS_DPS_PI_TYPE3:
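
To make the branch conditions in the hunk above easier to reason about, here is a minimal userspace sketch of the new early-exit decision. It is an illustration under stated assumptions, not kernel code: the MODEL_* flag values, the model_* structs, and the integrity_enabled parameter (standing in for IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY)) are hypothetical, and has_pi is taken as an input rather than derived the way nvme_ns_has_pi() derives it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the kernel's values. */
#define MODEL_NS_METADATA_SUPPORTED	(1u << 0)
#define MODEL_NS_EXT_LBAS		(1u << 1)

struct model_head {
	unsigned int features;
	uint16_t ms;	/* metadata bytes per block */
	bool has_pi;	/* stand-in for the result of nvme_ns_has_pi() */
};

struct model_result {
	bool took_early_exit;	/* false: falls through to the pi_type switch */
	bool ret;		/* value returned on the early-exit path */
	uint16_t metadata_size;	/* value left in bi->metadata_size */
};

/* Mirrors the patched early-exit branch; the rest of the function is not modelled. */
struct model_result model_early_exit(const struct model_head *h,
				     bool integrity_enabled)
{
	struct model_result r = { false, false, 0 };

	assert(h);
	if (!integrity_enabled || !(h->features & MODEL_NS_METADATA_SUPPORTED)) {
		r.took_early_exit = true;
		if (integrity_enabled && (h->features & MODEL_NS_EXT_LBAS) &&
		    h->ms && !h->has_pi)
			r.metadata_size = h->ms;
		r.ret = h->has_pi;
	}
	return r;
}

/*
 * Per the commit message, a non-zero metadata_size is what lets the block
 * layer attach integrity payloads and set REQ_INTEGRITY.
 */
bool model_queue_supports_integrity(uint16_t metadata_size)
{
	return metadata_size != 0;
}
```

In this model, only the combination of integrity support enabled, EXT_LBAS set, non-zero ms, and no PI yields a non-zero metadata_size; the return value still equals has_pi in every early-exit case, which matches the commit message's statement that capacity remains forced to 0.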
