nvme/068: add new test for nvme subsystem-reset test #229
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| #!/bin/bash | ||
| # SPDX-License-Identifier: GPL-3.0+ | ||
| # Copyright (C) 2025 Yi Zhang <[email protected]> | ||
| # | ||
| # Test NVMe subsystem-reset command | ||
| # | ||
| # Regression test for below two commits: | ||
| # 210b1f6576e8 nvme-pci: do not directly handle subsys reset fallout | ||
| # 0edb475ac0a7 nvme: fix PCIe subsystem reset controller state transition | ||
|
|
||
| . tests/nvme/rc | ||
|
|
||
| DESCRIPTION="Test NVMe subsystem-reset command" | ||
|
|
||
| requires() { | ||
| _nvme_requires | ||
| _have_program nvme | ||
| } | ||
|
|
||
| device_requires() { | ||
| _require_test_dev_is_nvme | ||
| _require_test_dev_support_subsystem_reset | ||
| } | ||
|
|
||
| test_device() { | ||
| echo "Running ${TEST_NAME}" | ||
|
|
||
| local ctrl_dev | ||
|
|
||
| ctrl_dev=${TEST_DEV%n*} | ||
|
|
||
| # Start nvme subsystem-reset operation | ||
| if ! nvme subsystem-reset "$ctrl_dev" >> "$FULL" 2>&1; then | ||
| echo "ERROR: subsystem-reset failed" | ||
| fi | ||
|
|
||
| # Wait for the NVMe disk to be reinitialized | ||
| sleep 10 | ||
|
|
||
| if [ ! -b "$TEST_DEV" ]; then | ||
| echo "ERROR: $TEST_DEV still not reinitialized after 10 seconds!" | ||
| fi | ||
|
|
||
| # Start dd write/read operation to check the NVMe disk works as expected | ||
| dd if=/dev/urandom of="$TEST_DEV" count=1024 bs=1M status=none | ||
| dd if="$TEST_DEV" of=/dev/null count=1024 bs=1M status=none | ||
|
|
||
| echo "Test complete" | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| Running nvme/068 | ||
| Test complete |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -99,6 +99,14 @@ _require_test_dev_support_sed() { | ||
| return 1 | ||
| } | ||
|
|
||
| _require_test_dev_support_subsystem_reset() { | ||
| if ! nvme show-regs "$TEST_DEV" -H | grep -q "NSSRS.*Yes"; then | ||
|
Contributor
Same comment on the already closed PR. :)
I think this is something the kernel needs to expose via the sysfs interface.
Contributor
Author
Yes, I just tried a kernel with CONFIG_IO_STRICT_DEVMEM enabled, and the command failed.
Contributor
What we could do is add a sysfs entry which exposes all the registers and |
||
| SKIP_REASONS+=("$TEST_DEV doesn't support subsystem-reset operation") | ||
| return 1 | ||
| fi | ||
| return 0 | ||
| } | ||
|
|
||
| _require_nvme_cli_auth() { | ||
| if ! nvme gen-dhchap-key --nqn nvmf-test-subsys > /dev/null 2>&1 ; then | ||
| SKIP_REASONS+=("nvme gen-dhchap-key command missing") | ||
|
|
||
At least on the PPC platform, a fixed delay of sleep 10 would not work. As we know, running the subsystem-reset command on PPC causes loss of communication with the NVMe adapter. So on a PPC system, any I/O that was in flight when subsystem-reset was executed, or any new I/O submitted afterwards, eventually times out after 30 seconds. The nvme timeout handler then attempts to read a PCIe/MMIO config space register, which triggers EEH, and EEH then recovers the communication link to the NVMe adapter. So in theory, in the worst case, it takes more than 30 seconds for the link to be restored and the device to come back online after subsystem-reset.
I'd suggest the following steps:
Or, even better, you could first read the timeout value from /sys/block/<blk-dev>/queue/io_timeout (which is in ms) and then adjust the sleep accordingly.
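To make the io_timeout idea concrete, here is a rough sketch of how the fixed sleep could be replaced. The helper names `wait_budget_secs` and `wait_for_block_dev` are hypothetical (they are not existing blktests helpers), the 10-second margin is an arbitrary assumption, and the 30000 ms fallback simply mirrors the 30 seconds mentioned above.

```shell
#!/bin/bash
# Hypothetical helpers, not part of blktests: derive a wait budget from the
# block device's io_timeout and poll until the device node reappears.

# Print the wait budget in seconds: io_timeout (ms) rounded up, plus a margin.
# $1: path to the io_timeout attribute
# $2: extra margin in seconds (assumed default: 10)
wait_budget_secs() {
	local timeout_file=$1 margin=${2:-10} timeout_ms=30000

	# Fall back to 30000 ms if the attribute cannot be read.
	if [[ -r $timeout_file ]]; then
		timeout_ms=$(<"$timeout_file")
	fi
	echo $(( (timeout_ms + 999) / 1000 + margin ))
}

# Poll once per second until $1 is a block device or $2 seconds have elapsed.
# Returns 0 if the device showed up in time, 1 otherwise.
wait_for_block_dev() {
	local dev=$1 budget=$2 waited=0

	while [[ ! -b $dev ]]; do
		(( waited >= budget )) && return 1
		sleep 1
		waited=$(( waited + 1 ))
	done
	return 0
}

# Possible use inside test_device(), reading the attribute BEFORE the reset:
#   budget=$(wait_budget_secs "/sys/block/$(basename "$TEST_DEV")/queue/io_timeout")
#   nvme subsystem-reset "$ctrl_dev"
#   wait_for_block_dev "$TEST_DEV" "$budget" || echo "ERROR: $TEST_DEV not back"
```

Polling returns as soon as the device node is back, so platforms that recover quickly would not pay the full worst-case delay.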
I suggest setting the default timeout to something short, like 5s, so the tests don't just sit waiting for timeouts.
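One way this could look, as a minimal sketch: shorten io_timeout before the reset and put the old value back afterwards. The function name `swap_io_timeout_ms` is hypothetical, and the sketch assumes the attribute is writable by the user running the test. Whether the shortened value survives if the block device is torn down and recreated by the reset has not been checked here.

```shell
#!/bin/bash
# Hypothetical helper, not part of blktests: write a new io_timeout value and
# print the previous one so the caller can restore it later.
# $1: path to the io_timeout attribute
# $2: new timeout in milliseconds
swap_io_timeout_ms() {
	local attr=$1 new_ms=$2 old_ms

	old_ms=$(<"$attr") || return 1
	echo "$new_ms" > "$attr" || return 1
	echo "$old_ms"
}

# Possible use around the reset (5000 ms is the 5s suggested above):
#   attr="/sys/block/$(basename "$TEST_DEV")/queue/io_timeout"
#   old_ms=$(swap_io_timeout_ms "$attr" 5000)
#   ... subsystem-reset and device checks ...
#   echo "$old_ms" > "$attr"
```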