49 changes: 49 additions & 0 deletions tests/nvme/068
@@ -0,0 +1,49 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-3.0+
# Copyright (C) 2025 Yi Zhang <[email protected]>
#
# Test NVMe subsystem-reset command
#
# Regression test for below two commits:
# 210b1f6576e8 nvme-pci: do not directly handle subsys reset fallout
# 0edb475ac0a7 nvme: fix PCIe subsystem reset controller state transition

. tests/nvme/rc

DESCRIPTION="Test NVMe subsystem-reset command"

requires() {
	_nvme_requires
	_have_program nvme
}

device_requires() {
	_require_test_dev_is_nvme
	_require_test_dev_support_subsystem_reset
}

test_device() {
	echo "Running ${TEST_NAME}"

	local ctrl_dev

	ctrl_dev=${TEST_DEV%n*}

	# Start the nvme subsystem-reset operation
	if ! nvme subsystem-reset "$ctrl_dev" >> "$FULL" 2>&1; then
		echo "ERROR: subsystem-reset failed"
	fi

	# Wait for the NVMe disk to be reinitialized
	sleep 10
Contributor
At least on the PPC platform, a delay of sleep 10 is not enough. Running the subsystem-reset command on PPC causes a loss of communication with the NVMe adapter, so any I/Os that were in flight when subsystem-reset was executed, or any new I/O submitted afterwards, will eventually time out after 30 seconds. The nvme timeout handler then attempts to read a PCIe/MMIO config space register, which triggers EEH, and EEH recovers the communication link to the NVMe adapter. So in theory, in the worst case, it can take more than 30 seconds after subsystem-reset for the link to be restored and the device to come back online.

I'd suggest the following steps:

  1. Start I/O (maybe using fio)
  2. Execute nvme subsystem-reset
  3. Sleep 35 seconds (30 seconds for the I/O request timeout plus an additional 5 seconds as cushion)
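The suggested sequence could be sketched roughly like this (illustrative only: the function name and fio job parameters are assumptions, not part of the patch, and `$TEST_DEV`, `$FULL`, and `$ctrl_dev` come from the blktests harness):

```shell
#!/bin/bash
# Sketch of the suggested reset-under-I/O sequence. The fio job options
# and the helper name run_reset_with_io are illustrative assumptions.
run_reset_with_io() {
	local ctrl_dev=$1

	# 1. Start background I/O on the namespace
	fio --name=background --filename="$TEST_DEV" --rw=randwrite \
	    --bs=4k --direct=1 --time_based --runtime=60 \
	    >> "$FULL" 2>&1 &
	local fio_pid=$!

	# 2. Trigger the subsystem reset while I/O is in flight
	nvme subsystem-reset "$ctrl_dev" >> "$FULL" 2>&1

	# 3. Wait out the 30s I/O request timeout plus a 5s cushion
	sleep 35

	wait "$fio_pid"
}
```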

Contributor

Or even better, you could first retrieve the timeout value from /sys/block/&lt;blk-dev&gt;/queue/io_timeout (which is in milliseconds) and then adjust the sleep accordingly.
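That adjustment could look something like the following (the helper name and the 5-second cushion are illustrative assumptions; io_timeout is the standard block-layer sysfs attribute, reported in milliseconds):

```shell
#!/bin/bash
# Sketch: derive the post-reset wait from the device's io_timeout sysfs
# attribute instead of hardcoding it. The helper name and the 5s cushion
# are illustrative assumptions.
reset_wait_secs() {
	local timeout_ms=$1

	# Round the millisecond value up to whole seconds, then add 5s cushion
	echo $(( (timeout_ms + 999) / 1000 + 5 ))
}

# Possible usage in the test (TEST_DEV is e.g. /dev/nvme0n1):
# sleep "$(reset_wait_secs "$(cat /sys/block/${TEST_DEV##*/}/queue/io_timeout)")"
```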

Contributor

I suggest setting the default timeout to something short, like 5 s, to avoid the test just waiting for timeouts.
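One way to sketch that (the wrapper name is hypothetical, writing io_timeout needs root, and the original value is restored afterwards):

```shell
#!/bin/bash
# Sketch: temporarily shorten a queue's io_timeout (in milliseconds) around
# a command, restoring the saved value afterwards. The wrapper name and the
# 5000 ms figure are illustrative assumptions.
with_short_io_timeout() {
	local timeout_path=$1 new_ms=$2
	shift 2
	local saved

	saved=$(cat "$timeout_path")
	echo "$new_ms" > "$timeout_path"
	"$@"                       # run the wrapped command under the short timeout
	echo "$saved" > "$timeout_path"
}

# Possible usage (root required):
# with_short_io_timeout "/sys/block/${TEST_DEV##*/}/queue/io_timeout" 5000 \
#	nvme subsystem-reset "$ctrl_dev"
```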


	if [ ! -b "$TEST_DEV" ]; then
		echo "ERROR: $TEST_DEV still not reinitialized after 10 seconds!"
	fi

	# Run dd write/read operations to check that the NVMe disk works as expected
	dd if=/dev/urandom of="$TEST_DEV" count=1024 bs=1M status=none
	dd if="$TEST_DEV" of=/dev/null count=1024 bs=1M status=none

	echo "Test complete"
}
2 changes: 2 additions & 0 deletions tests/nvme/068.out
@@ -0,0 +1,2 @@
Running nvme/068
Test complete
8 changes: 8 additions & 0 deletions tests/nvme/rc
@@ -99,6 +99,14 @@ _require_test_dev_support_sed() {
	return 1
}

_require_test_dev_support_subsystem_reset() {
	if ! nvme show-regs "$TEST_DEV" -H | grep -q "NSSRS.*Yes"; then
Contributor
Same comment as on the already-closed PR. :)

nvme show-regs is likely not to work: https://github.com/linux-nvme/nvme-cli/wiki/FAQ#nvme-show-regs-devnvme0-returns-nvme0-failed-to-map

I think this is something the kernel needs to expose via the sysfs interface.

Contributor Author

Yes, I just tried a kernel with CONFIG_IO_STRICT_DEVMEM enabled, and the command failed:

# nvme show-regs /dev/nvme0
NVMe status: Invalid Command Opcode: A reserved coded value or an unsupported value in the command opcode field(0x4001)
# cat /boot/config-7.0.0-0.rc2.21.fc45.x86_64 | grep IO_STRI
CONFIG_IO_STRICT_DEVMEM=y

Contributor

What we could do is add a sysfs entry that exposes all the registers; nvme show-regs would use it when available and otherwise fall back to raw memory access.

		SKIP_REASONS+=("$TEST_DEV doesn't support subsystem-reset operation")
		return 1
	fi
	return 0
}

_require_nvme_cli_auth() {
	if ! nvme gen-dhchap-key --nqn nvmf-test-subsys > /dev/null 2>&1 ; then
		SKIP_REASONS+=("nvme gen-dhchap-key command missing")