fabrics: retry connect command on EINTR#987
Conversation
| long int __result = 0; \ | ||
| do { \ | ||
| __result = (long int)(exp); \ | ||
| } while ((__result == -1) && (errno == EINTR)); \ |
There was a problem hiding this comment.
Could we be stuck here for a very long time?
EINTR happens when a signal is received. For example, if someone types CTRL-C (SIGINT) to stop a process that seems stuck. With this, CTRL-C will have no effect. Just wondering if that's what we want.
There was a problem hiding this comment.
Yes, this is a problem. Though I am not sure what /dev/nvme-fabrics does on SIGINT.
There was a problem hiding this comment.
Yes, this is a problem. Though I am not sure what /dev/nvme-fabrics does on SIGINT.
Right! I remember being stuck on /dev/nvme-fabrics before and there is no way to get out. The kernel just doesn't allow interrupting operations on that device. So, maybe my comment does not apply here.
There was a problem hiding this comment.
And when nvme request are cancelled the syscall will also return EINTR.
So there are at least three sources/reasons for an EINTR. Don't know if we can handle this correctly in userspace.
There was a problem hiding this comment.
Yes, this is a problem. Though I am not sure what /dev/nvme-fabrics does on SIGINT.
Right! I remember being stuck on
/dev/nvme-fabricsbefore and there is no way to get out. The kernel just doesn't allow interrupting operations on that device. So, maybe my comment does not apply here.
Indeed, though I think it should be possible to handle SIGINTs, it's just not there yet.
There was a problem hiding this comment.
After a bit of refreshing my memory on signals, I don't think there is a real problem. The ctrl-c is translated to SIGINT which is then delivered to the process. The default handler for SIGINT is to terminate the process. That means when the syscall returns the default handler gets executed (terminate process) and thus the loop will not be run anymore.
There was a problem hiding this comment.
Removed the TFR macro from the open call, as this call shouldn't be a blocking call.
Recent changes in the nvme-tcp transport started to return EINTR due to a signal while reading from a socket. The transport could retry the operation but so far this hasn't been implemented. Thus we retry in userpace the connect call when EINTR is returned. Signed-off-by: Daniel Wagner <[email protected]>
|
FTR, I've tested this by adding a EINTR return into to the admin queue connect code path which fired every other time. Userspace saw the EINTR and retried. So that part seems to work as expected. |
When the system call fails with EINTR, retry the operation.
Fixes: #linux-nvme/nvme-cli#2760