
Commit 2fa9c93

ps-ushankar authored and axboe committed
ublk: speed up ublk server exit handling
Recently, we've observed a few cases where a ublk server is able to
complete restart more quickly than the driver can process the exit of
the previous ublk server. The new ublk server comes up, attempts
recovery of the preexisting ublk devices, and observes them still in
state UBLK_S_DEV_LIVE. While this is possible due to the asynchronous
nature of io_uring cleanup and should therefore be handled properly in
the ublk server, it is still preferable to make ublk server exit
handling faster if possible, as we should strive for it to not be a
limiting factor in how fast a ublk server can restart and provide
service again.

Analysis of the issue showed that the vast majority of the time spent
in handling the ublk server exit was in calls to blk_mq_quiesce_queue,
which is essentially just a (relatively expensive) call to
synchronize_rcu. The ublk server exit path currently issues an
unnecessarily large number of calls to blk_mq_quiesce_queue, for two
reasons:

1. It tries to call blk_mq_quiesce_queue once per ublk_queue. However,
   blk_mq_quiesce_queue targets the request_queue of the underlying
   ublk device, of which there is only one. So the number of calls is
   larger than necessary by a factor of nr_hw_queues.

2. In practice, it calls blk_mq_quiesce_queue _more_ than once per
   ublk_queue. This is because of a data race where we read
   ubq->canceling without any locking when deciding if we should call
   ublk_start_cancel. It is thus possible for two calls to
   ublk_uring_cmd_cancel_fn against the same ublk_queue to both call
   ublk_start_cancel against the same ublk_queue.

Fix this by making the "canceling" flag a per-device state. This
actually matches the existing code better, as there are several places
where the flag is set or cleared for all queues simultaneously, and
there is the general expectation that cancellation corresponds with
ublk server exit. This per-device canceling flag is then checked under
a (new) lock (addressing the data race (2) above), and the queue is
only quiesced if it is cleared (addressing (1) above). The result is
just one call to blk_mq_quiesce_queue per ublk device. To minimize the
number of cache lines that are accessed in the hot path, the per-queue
canceling flag is kept. The values of the per-device canceling flag
and all per-queue canceling flags should always match.

In our setup, where one ublk server handles I/O for 128 ublk devices,
each having 24 hardware queues of depth 4096, here are the results
before and after this patch, where teardown time is measured from the
first call to io_ring_ctx_wait_and_kill to the return from the last
ublk_ch_release:

                                            before    after
  number of calls to blk_mq_quiesce_queue:    6469      256
  teardown time:                             11.14s    2.44s

There are still some potential optimizations here, but this takes care
of a big chunk of the ublk server exit handling delay.

Signed-off-by: Uday Shankar <[email protected]>
Reviewed-by: Caleb Sander Mateos <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
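For scale: even one quiesce per ublk_queue in the setup above would be
128 * 24 = 3072 calls; the measured 6469 is consistent with the race in
(2) roughly doubling that. The following is a minimal, hypothetical
userspace model of that race and the fix, not driver code: the pthread
mutex, the flag, and the counter stand in for ub->cancel_mutex,
ub->canceling, and the count of blk_mq_quiesce_queue calls.

/*
 * Model: NR_CALLBACKS "cancel callbacks" race to perform one expensive
 * operation. Testing the flag under the mutex (the new scheme)
 * guarantees exactly one of them does the work; the old scheme tested
 * the flag with no locking, so several callbacks could observe "false"
 * and all do the work.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CALLBACKS 24

static pthread_mutex_t cancel_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool canceling;
static atomic_int quiesce_calls;

static void *cancel_fn(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&cancel_mutex);
        if (!canceling) {                               /* checked under the lock */
                atomic_fetch_add(&quiesce_calls, 1);    /* the "quiesce" */
                canceling = true;
        }
        pthread_mutex_unlock(&cancel_mutex);
        return NULL;
}

int main(void)
{
        pthread_t threads[NR_CALLBACKS];
        int i;

        for (i = 0; i < NR_CALLBACKS; i++)
                pthread_create(&threads[i], NULL, cancel_fn, NULL);
        for (i = 0; i < NR_CALLBACKS; i++)
                pthread_join(threads[i], NULL);
        /* Always prints 1; moving the flag test outside the lock lets
         * it print anything up to NR_CALLBACKS. */
        printf("quiesce calls: %d\n", atomic_load(&quiesce_calls));
        return 0;
}

The same check-under-lock shape appears in the patched
ublk_start_cancel in the diff below: test ub->canceling under
ub->cancel_mutex, so only the first caller pays for the quiesce.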
1 parent e74a1c6 commit 2fa9c93

1 file changed

drivers/block/ublk_drv.c: 21 additions & 15 deletions
@@ -235,6 +235,8 @@ struct ublk_device {
 	struct completion	completion;
 	unsigned int		nr_queues_ready;
 	unsigned int		nr_privileged_daemon;
+	struct mutex		cancel_mutex;
+	bool			canceling;
 };
 
 /* header of ublk_params */
@@ -1589,6 +1591,7 @@ static int ublk_ch_release(struct inode *inode, struct file *filp)
 	 * All requests may be inflight, so ->canceling may not be set, set
 	 * it now.
 	 */
+	ub->canceling = true;
 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
 		struct ublk_queue *ubq = ublk_get_queue(ub, i);
 
@@ -1717,23 +1720,18 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 	}
 }
 
-/* Must be called when queue is frozen */
-static void ublk_mark_queue_canceling(struct ublk_queue *ubq)
-{
-	spin_lock(&ubq->cancel_lock);
-	if (!ubq->canceling)
-		ubq->canceling = true;
-	spin_unlock(&ubq->cancel_lock);
-}
-
-static void ublk_start_cancel(struct ublk_queue *ubq)
+static void ublk_start_cancel(struct ublk_device *ub)
 {
-	struct ublk_device *ub = ubq->dev;
 	struct gendisk *disk = ublk_get_disk(ub);
+	int i;
 
 	/* Our disk has been dead */
 	if (!disk)
 		return;
+
+	mutex_lock(&ub->cancel_mutex);
+	if (ub->canceling)
+		goto out;
 	/*
 	 * Now we are serialized with ublk_queue_rq()
 	 *
@@ -1742,8 +1740,12 @@ static void ublk_start_cancel(struct ublk_queue *ubq)
 	 * touch completed uring_cmd
 	 */
 	blk_mq_quiesce_queue(disk->queue);
-	ublk_mark_queue_canceling(ubq);
+	ub->canceling = true;
+	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+		ublk_get_queue(ub, i)->canceling = true;
 	blk_mq_unquiesce_queue(disk->queue);
+out:
+	mutex_unlock(&ub->cancel_mutex);
 	ublk_put_disk(disk);
 }

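An aside on the hunk above, motivated by the cache-line note in the
commit message: the loop keeps writing per-queue copies of the flag
solely so the request hot path can test state on the ublk_queue it is
already touching. A reduced sketch of that trade-off, using hypothetical
stand-in structs rather than the driver's own (the real test lives in
the driver's queue_rq path):

#include <stdbool.h>

/* Stand-ins for illustration only; not the driver's struct layouts. */
struct dev_state {
        bool canceling;         /* authoritative copy, guarded by cancel_mutex */
        /* ... device-wide fields the hot path never needs ... */
};

struct queue_state {
        struct dev_state *dev;
        bool canceling;         /* redundant copy on per-queue cache lines */
        /* ... per-queue fields dispatch already touches ... */
};

/*
 * Per-request check: reading q->canceling costs no extra cache line,
 * while q->dev->canceling would drag the device struct into the hot
 * path. The copies stay coherent because ublk_start_cancel updates
 * them all under cancel_mutex while the queue is quiesced.
 */
static inline bool should_fail_io(const struct queue_state *q)
{
        return q->canceling;
}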
@@ -1816,8 +1818,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
 	if (WARN_ON_ONCE(task && task != io->task))
 		return;
 
-	if (!ubq->canceling)
-		ublk_start_cancel(ubq);
+	ublk_start_cancel(ubq->dev);
 
 	WARN_ON_ONCE(io->cmd != cmd);
 	ublk_cancel_cmd(ubq, pdu->tag, issue_flags);
@@ -1944,6 +1945,7 @@ static void ublk_reset_io_flags(struct ublk_device *ub)
 		ubq->canceling = false;
 		ubq->fail_io = false;
 	}
+	ub->canceling = false;
 }
 
 /* device can only be started after all IOs are ready */
@@ -2652,6 +2654,7 @@ static void ublk_cdev_rel(struct device *dev)
 	ublk_deinit_queues(ub);
 	ublk_free_dev_number(ub);
 	mutex_destroy(&ub->mutex);
+	mutex_destroy(&ub->cancel_mutex);
 	kfree(ub);
 }
 
@@ -3004,6 +3007,7 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
 		goto out_unlock;
 	mutex_init(&ub->mutex);
 	spin_lock_init(&ub->lock);
+	mutex_init(&ub->cancel_mutex);
 
 	ret = ublk_alloc_dev_number(ub, header->dev_id);
 	if (ret < 0)
@@ -3075,6 +3079,7 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
 	ublk_free_dev_number(ub);
 out_free_ub:
 	mutex_destroy(&ub->mutex);
+	mutex_destroy(&ub->cancel_mutex);
 	kfree(ub);
 out_unlock:
 	mutex_unlock(&ublk_ctl_mutex);
@@ -3429,8 +3434,9 @@ static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
 	if (ub->dev_info.state != UBLK_S_DEV_LIVE)
 		goto put_disk;
 
-	/* Mark all queues as canceling */
+	/* Mark the device as canceling */
 	blk_mq_quiesce_queue(disk->queue);
+	ub->canceling = true;
 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
 		struct ublk_queue *ubq = ublk_get_queue(ub, i);