Skip to content

nvme: update device error handling for list-subsys#2827

Closed
martin-gpy wants to merge 1 commit intolinux-nvme:masterfrom
martin-gpy:subsys_err_handling
Closed

nvme: update device error handling for list-subsys#2827
martin-gpy wants to merge 1 commit intolinux-nvme:masterfrom
martin-gpy:subsys_err_handling

Conversation

@martin-gpy
Copy link
Copy Markdown
Contributor

The list-subsys command does not properly check if the nvme device passed to the option is a valid device or not. Fix the same.

Comment thread nvme.c
if (sscanf(devname, "nvme%dn%d", &subsys_num, &nsid) != 2) {
nvme_show_error("Invalid device name %s", devname);
return -EINVAL;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, devname a filter name, why should we verify if it is valid?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem comes when one passes an invalid device here and list-subsys seems to mess that up. For e.g. suppose one has a single subsystem with two namespaces in it:

# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme1n1          /dev/ng1n1            81CYrBSoKGRJAAAAAAAN NetApp ONTAP Controller                  0x2        219.47  MB /  10.74  GB    512   B +  0 B   9.18.1
/dev/nvme1n2          /dev/ng1n2            81CYrBSoKGRJAAAAAAAN NetApp ONTAP Controller                  0x1        330.56  MB /  10.74  GB      4 KiB +  0 B   9.18.1
# nvme list-subsys /dev/nvme1n1
nvme-subsys1 - NQN=nqn.1992-08.com.netapp:sn.48391d66c0a611ecaaa5d039ea165514:subsystem.subsys_CLIENT116_0
               hostnqn=nqn.2014-08.org.nvmexpress:uuid:e6550026-173e-4959-ba74-be367844bd8a
\
 +- nvme1 tcp traddr=192.168.2.116,trsvcid=4420,host_traddr=192.168.1.6,src_addr=192.168.1.6 live optimized
 +- nvme2 tcp traddr=192.168.1.116,trsvcid=4420,host_traddr=192.168.1.6,src_addr=192.168.1.6 live optimized
 +- nvme3 tcp traddr=192.168.2.117,trsvcid=4420,host_traddr=192.168.1.6,src_addr=192.168.1.6 live non-optimized
 +- nvme4 tcp traddr=192.168.1.117,trsvcid=4420,host_traddr=192.168.1.6,src_addr=192.168.1.6 live non-optimized

But I also see a similar list-subsys output even for an invalid device like say nvme1n3, when ideally it should have errored out with an invalid device output:

# nvme list-subsys /dev/nvme1n3
nvme-subsys1 - NQN=nqn.1992-08.com.netapp:sn.48391d66c0a611ecaaa5d039ea165514:subsystem.subsys_CLIENT116_0
               hostnqn=nqn.2014-08.org.nvmexpress:uuid:e6550026-173e-4959-ba74-be367844bd8a
\
 +- nvme1 tcp traddr=192.168.2.116,trsvcid=4420,host_traddr=192.168.1.6,src_addr=192.168.1.6 live
 +- nvme2 tcp traddr=192.168.1.116,trsvcid=4420,host_traddr=192.168.1.6,src_addr=192.168.1.6 live
 +- nvme3 tcp traddr=192.168.2.117,trsvcid=4420,host_traddr=192.168.1.6,src_addr=192.168.1.6 live
 +- nvme4 tcp traddr=192.168.1.117,trsvcid=4420,host_traddr=192.168.1.6,src_addr=192.168.1.6 live

And I see a blank output when I pass a different invalid device with a new subsys number. For e.g.

# nvme list-subsys /dev/nvme2n1
#
# echo $?
0

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But one problem I see with the patch above is that nsid does not get populated for a valid device. Need to fix that up.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then the filter is broken and this is just a workaround.

The list-subsys command fails to error out if one passes an invalid
or non-existent ns device as part of the <device> option. One would
have ideally expected the nvme_match_device_filter() to catch this,
but such cases would be better handled with a simple stat check for
ns device validity before proceeding to the filter in the first place.
That way, the stat check can take care of ns device validity whereas
the filter can focus on filtering the subsystems and controllers,
ignoring checking for invalid ns devices, as logically expected in a
list subsystems command.

Signed-off-by: Martin George <[email protected]>
@martin-gpy martin-gpy force-pushed the subsys_err_handling branch from 23e512a to 31b3f02 Compare May 28, 2025 18:49
@igaw
Copy link
Copy Markdown
Collaborator

igaw commented Jun 2, 2025

Hmm, still not explaining why the filter doesn't work. I'll look into it when I find some cycles to work on.

@igaw
Copy link
Copy Markdown
Collaborator

igaw commented Jul 1, 2025

Merged PR #2836 which addresses this issue.

@igaw igaw closed this Jul 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants