Commit b801ed0

Merge branch 'block-6.16' into for-next
* block-6.16:
  selftests: ublk: cover PER_IO_DAEMON in more stress tests
  Documentation: ublk: document UBLK_F_PER_IO_DAEMON
  selftests: ublk: add stress test for per io daemons
  selftests: ublk: add functional test for per io daemons
  selftests: ublk: kublk: decouple ublk_queues from ublk server threads
  selftests: ublk: kublk: move per-thread data out of ublk_queue
  selftests: ublk: kublk: lift queue initialization out of thread
  selftests: ublk: kublk: tie sqe allocation to io instead of queue
  selftests: ublk: kublk: plumb q_id in io_uring user_data
  ublk: have a per-io daemon instead of a per-queue daemon
  md/md-bitmap: remove parameter slot from bitmap_create()
  md/md-bitmap: cleanup bitmap_ops->startwrite()
  md/dm-raid: remove max_write_behind setting limit
  md/md-bitmap: fix dm-raid max_write_behind setting
  md/raid1,raid10: don't handle IO error for REQ_RAHEAD and REQ_NOWAIT
  loop: add file_start_write() and file_end_write()
  bcache: reserve more RESERVE_BTREE buckets to prevent allocator hang
  bcache: remove unused constants
  bcache: fix NULL pointer in cache_set_flush()
2 parents: 1672aaf + da12597

26 files changed

Lines changed: 609 additions & 297 deletions

Documentation/block/ublk.rst

Lines changed: 24 additions & 11 deletions
```diff
@@ -115,15 +115,15 @@ managing and controlling ublk devices with help of several control commands:
 
 - ``UBLK_CMD_START_DEV``
 
-  After the server prepares userspace resources (such as creating per-queue
-  pthread & io_uring for handling ublk IO), this command is sent to the
+  After the server prepares userspace resources (such as creating I/O handler
+  threads & io_uring for handling ublk IO), this command is sent to the
   driver for allocating & exposing ``/dev/ublkb*``. Parameters set via
   ``UBLK_CMD_SET_PARAMS`` are applied for creating the device.
 
 - ``UBLK_CMD_STOP_DEV``
 
   Halt IO on ``/dev/ublkb*`` and remove the device. When this command returns,
-  ublk server will release resources (such as destroying per-queue pthread &
+  ublk server will release resources (such as destroying I/O handler threads &
   io_uring).
 
 - ``UBLK_CMD_DEL_DEV``
@@ -208,15 +208,15 @@ managing and controlling ublk devices with help of several control commands:
   modify how I/O is handled while the ublk server is dying/dead (this is called
   the ``nosrv`` case in the driver code).
 
-  With just ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
-  handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
+  With just ``UBLK_F_USER_RECOVERY`` set, after the ublk server exits,
+  ublk does not delete ``/dev/ublkb*`` during the whole
   recovery stage and ublk device ID is kept. It is ublk server's
   responsibility to recover the device context by its own knowledge.
   Requests which have not been issued to userspace are requeued. Requests
   which have been issued to userspace are aborted.
 
-  With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after one ubq_daemon
-  (ublk server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
+  With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after the ublk server
+  exits, contrary to ``UBLK_F_USER_RECOVERY``,
   requests which have been issued to userspace are requeued and will be
   re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
   ``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
@@ -241,10 +241,11 @@ can be controlled/accessed just inside this container.
 Data plane
 ----------
 
-ublk server needs to create per-queue IO pthread & io_uring for handling IO
-commands via io_uring passthrough. The per-queue IO pthread
-focuses on IO handling and shouldn't handle any control & management
-tasks.
+The ublk server should create dedicated threads for handling I/O. Each
+thread should have its own io_uring through which it is notified of new
+I/O, and through which it can complete I/O. These dedicated threads
+should focus on IO handling and shouldn't handle any control &
+management tasks.
 
 The's IO is assigned by a unique tag, which is 1:1 mapping with IO
 request of ``/dev/ublkb*``.
@@ -265,6 +266,18 @@ with specified IO tag in the command data:
   destined to ``/dev/ublkb*``. This command is sent only once from the server
   IO pthread for ublk driver to setup IO forward environment.
 
+  Once a thread issues this command against a given (qid,tag) pair, the thread
+  registers itself as that I/O's daemon. In the future, only that I/O's daemon
+  is allowed to issue commands against the I/O. If any other thread attempts
+  to issue a command against a (qid,tag) pair for which the thread is not the
+  daemon, the command will fail. Daemons can be reset only by going through
+  recovery.
+
+  The ability for every (qid,tag) pair to have its own independent daemon task
+  is indicated by the ``UBLK_F_PER_IO_DAEMON`` feature. If this feature is not
+  supported by the driver, daemons must be per-queue instead - i.e. all I/Os
+  associated to a single qid must be handled by the same task.
+
 - ``UBLK_IO_COMMIT_AND_FETCH_REQ``
 
   When an IO request is destined to ``/dev/ublkb*``, the driver stores
```

drivers/block/loop.c

Lines changed: 6 additions & 2 deletions
```diff
@@ -308,11 +308,14 @@ static void lo_complete_rq(struct request *rq)
 static void lo_rw_aio_do_completion(struct loop_cmd *cmd)
 {
 	struct request *rq = blk_mq_rq_from_pdu(cmd);
+	struct loop_device *lo = rq->q->queuedata;
 
 	if (!atomic_dec_and_test(&cmd->ref))
 		return;
 	kfree(cmd->bvec);
 	cmd->bvec = NULL;
+	if (req_op(rq) == REQ_OP_WRITE)
+		file_end_write(lo->lo_backing_file);
 	if (likely(!blk_should_fake_timeout(rq->q)))
 		blk_mq_complete_request(rq);
 }
@@ -387,9 +390,10 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 		cmd->iocb.ki_flags = 0;
 	}
 
-	if (rw == ITER_SOURCE)
+	if (rw == ITER_SOURCE) {
+		file_start_write(lo->lo_backing_file);
 		ret = file->f_op->write_iter(&cmd->iocb, &iter);
-	else
+	} else
 		ret = file->f_op->read_iter(&cmd->iocb, &iter);
 
 	lo_rw_aio_do_completion(cmd);
```

drivers/block/ublk_drv.c

Lines changed: 56 additions & 55 deletions
```diff
@@ -69,7 +69,8 @@
 	 | UBLK_F_USER_RECOVERY_FAIL_IO \
 	 | UBLK_F_UPDATE_SIZE \
 	 | UBLK_F_AUTO_BUF_REG \
-	 | UBLK_F_QUIESCE)
+	 | UBLK_F_QUIESCE \
+	 | UBLK_F_PER_IO_DAEMON)
 
 #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
 	 | UBLK_F_USER_RECOVERY_REISSUE \
@@ -166,18 +167,18 @@ struct ublk_io {
 	/* valid if UBLK_IO_FLAG_OWNED_BY_SRV is set */
 	struct request *req;
 	};
+
+	struct task_struct *task;
 };
 
 struct ublk_queue {
 	int q_id;
 	int q_depth;
 
 	unsigned long flags;
-	struct task_struct *ubq_daemon;
 	struct ublksrv_io_desc *io_cmd_buf;
 
 	bool force_abort;
-	bool timeout;
 	bool canceling;
 	bool fail_io; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
 	unsigned short nr_io_ready; /* how many ios setup */
@@ -1099,11 +1100,6 @@ static inline struct ublk_uring_cmd_pdu *ublk_get_uring_cmd_pdu(
 	return io_uring_cmd_to_pdu(ioucmd, struct ublk_uring_cmd_pdu);
 }
 
-static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq)
-{
-	return !ubq->ubq_daemon || ubq->ubq_daemon->flags & PF_EXITING;
-}
-
 /* todo: handle partial completion */
 static inline void __ublk_complete_rq(struct request *req)
 {
@@ -1275,13 +1271,13 @@ static void ublk_dispatch_req(struct ublk_queue *ubq,
 	/*
 	 * Task is exiting if either:
 	 *
-	 * (1) current != ubq_daemon.
+	 * (1) current != io->task.
 	 * io_uring_cmd_complete_in_task() tries to run task_work
-	 * in a workqueue if ubq_daemon(cmd's task) is PF_EXITING.
+	 * in a workqueue if cmd's task is PF_EXITING.
 	 *
 	 * (2) current->flags & PF_EXITING.
 	 */
-	if (unlikely(current != ubq->ubq_daemon || current->flags & PF_EXITING)) {
+	if (unlikely(current != io->task || current->flags & PF_EXITING)) {
 		__ublk_abort_rq(ubq, req);
 		return;
 	}
@@ -1330,38 +1326,33 @@ static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd,
 {
 	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
 	struct request *rq = pdu->req_list;
-	struct ublk_queue *ubq = pdu->ubq;
 	struct request *next;
 
 	do {
 		next = rq->rq_next;
 		rq->rq_next = NULL;
-		ublk_dispatch_req(ubq, rq, issue_flags);
+		ublk_dispatch_req(rq->mq_hctx->driver_data, rq, issue_flags);
 		rq = next;
 	} while (rq);
 }
 
-static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
+static void ublk_queue_cmd_list(struct ublk_io *io, struct rq_list *l)
 {
-	struct request *rq = rq_list_peek(l);
-	struct io_uring_cmd *cmd = ubq->ios[rq->tag].cmd;
+	struct io_uring_cmd *cmd = io->cmd;
 	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
 
-	pdu->req_list = rq;
+	pdu->req_list = rq_list_peek(l);
 	rq_list_init(l);
 	io_uring_cmd_complete_in_task(cmd, ublk_cmd_list_tw_cb);
 }
 
 static enum blk_eh_timer_return ublk_timeout(struct request *rq)
 {
 	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
+	struct ublk_io *io = &ubq->ios[rq->tag];
 
 	if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) {
-		if (!ubq->timeout) {
-			send_sig(SIGKILL, ubq->ubq_daemon, 0);
-			ubq->timeout = true;
-		}
-
+		send_sig(SIGKILL, io->task, 0);
 		return BLK_EH_DONE;
 	}
 
@@ -1429,24 +1420,25 @@ static void ublk_queue_rqs(struct rq_list *rqlist)
 {
 	struct rq_list requeue_list = { };
 	struct rq_list submit_list = { };
-	struct ublk_queue *ubq = NULL;
+	struct ublk_io *io = NULL;
 	struct request *req;
 
 	while ((req = rq_list_pop(rqlist))) {
 		struct ublk_queue *this_q = req->mq_hctx->driver_data;
+		struct ublk_io *this_io = &this_q->ios[req->tag];
 
-		if (ubq && ubq != this_q && !rq_list_empty(&submit_list))
-			ublk_queue_cmd_list(ubq, &submit_list);
-		ubq = this_q;
+		if (io && io->task != this_io->task && !rq_list_empty(&submit_list))
+			ublk_queue_cmd_list(io, &submit_list);
+		io = this_io;
 
-		if (ublk_prep_req(ubq, req, true) == BLK_STS_OK)
+		if (ublk_prep_req(this_q, req, true) == BLK_STS_OK)
 			rq_list_add_tail(&submit_list, req);
 		else
 			rq_list_add_tail(&requeue_list, req);
 	}
 
-	if (ubq && !rq_list_empty(&submit_list))
-		ublk_queue_cmd_list(ubq, &submit_list);
+	if (!rq_list_empty(&submit_list))
+		ublk_queue_cmd_list(io, &submit_list);
 	*rqlist = requeue_list;
 }
 
@@ -1474,17 +1466,6 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
 	/* All old ioucmds have to be completed */
 	ubq->nr_io_ready = 0;
 
-	/*
-	 * old daemon is PF_EXITING, put it now
-	 *
-	 * It could be NULL in case of closing one quisced device.
-	 */
-	if (ubq->ubq_daemon)
-		put_task_struct(ubq->ubq_daemon);
-	/* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */
-	ubq->ubq_daemon = NULL;
-	ubq->timeout = false;
-
 	for (i = 0; i < ubq->q_depth; i++) {
 		struct ublk_io *io = &ubq->ios[i];
 
@@ -1495,6 +1476,17 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
 		io->flags &= UBLK_IO_FLAG_CANCELED;
 		io->cmd = NULL;
 		io->addr = 0;
+
+		/*
+		 * old task is PF_EXITING, put it now
+		 *
+		 * It could be NULL in case of closing one quiesced
+		 * device.
+		 */
+		if (io->task) {
+			put_task_struct(io->task);
+			io->task = NULL;
+		}
 	}
 }
 
@@ -1516,7 +1508,7 @@ static void ublk_reset_ch_dev(struct ublk_device *ub)
 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
 		ublk_queue_reinit(ub, ublk_get_queue(ub, i));
 
-	/* set to NULL, otherwise new ubq_daemon cannot mmap the io_cmd_buf */
+	/* set to NULL, otherwise new tasks cannot mmap io_cmd_buf */
 	ub->mm = NULL;
 	ub->nr_queues_ready = 0;
 	ub->nr_privileged_daemon = 0;
@@ -1783,6 +1775,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
 	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
 	struct ublk_queue *ubq = pdu->ubq;
 	struct task_struct *task;
+	struct ublk_io *io;
 
 	if (WARN_ON_ONCE(!ubq))
 		return;
@@ -1791,13 +1784,14 @@
 		return;
 
 	task = io_uring_cmd_get_task(cmd);
-	if (WARN_ON_ONCE(task && task != ubq->ubq_daemon))
+	io = &ubq->ios[pdu->tag];
+	if (WARN_ON_ONCE(task && task != io->task))
 		return;
 
 	if (!ubq->canceling)
 		ublk_start_cancel(ubq);
 
-	WARN_ON_ONCE(ubq->ios[pdu->tag].cmd != cmd);
+	WARN_ON_ONCE(io->cmd != cmd);
 	ublk_cancel_cmd(ubq, pdu->tag, issue_flags);
 }
 
@@ -1930,8 +1924,6 @@ static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
 {
 	ubq->nr_io_ready++;
 	if (ublk_queue_ready(ubq)) {
-		ubq->ubq_daemon = current;
-		get_task_struct(ubq->ubq_daemon);
 		ub->nr_queues_ready++;
 
 		if (capable(CAP_SYS_ADMIN))
@@ -2084,6 +2076,7 @@ static int ublk_fetch(struct io_uring_cmd *cmd, struct ublk_queue *ubq,
 	}
 
 	ublk_fill_io_cmd(io, cmd, buf_addr);
+	WRITE_ONCE(io->task, get_task_struct(current));
 	ublk_mark_io_ready(ub, ubq);
 out:
 	mutex_unlock(&ub->mutex);
@@ -2179,6 +2172,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 		const struct ublksrv_io_cmd *ub_cmd)
 {
 	struct ublk_device *ub = cmd->file->private_data;
+	struct task_struct *task;
 	struct ublk_queue *ubq;
 	struct ublk_io *io;
 	u32 cmd_op = cmd->cmd_op;
@@ -2193,13 +2187,14 @@
 		goto out;
 
 	ubq = ublk_get_queue(ub, ub_cmd->q_id);
-	if (ubq->ubq_daemon && ubq->ubq_daemon != current)
-		goto out;
 
 	if (tag >= ubq->q_depth)
 		goto out;
 
 	io = &ubq->ios[tag];
+	task = READ_ONCE(io->task);
+	if (task && task != current)
+		goto out;
 
 	/* there is pending io cmd, something must be wrong */
 	if (io->flags & UBLK_IO_FLAG_ACTIVE) {
@@ -2449,9 +2444,14 @@ static void ublk_deinit_queue(struct ublk_device *ub, int q_id)
 {
 	int size = ublk_queue_cmd_buf_size(ub, q_id);
 	struct ublk_queue *ubq = ublk_get_queue(ub, q_id);
+	int i;
+
+	for (i = 0; i < ubq->q_depth; i++) {
+		struct ublk_io *io = &ubq->ios[i];
+		if (io->task)
+			put_task_struct(io->task);
+	}
 
-	if (ubq->ubq_daemon)
-		put_task_struct(ubq->ubq_daemon);
 	if (ubq->io_cmd_buf)
 		free_pages((unsigned long)ubq->io_cmd_buf, get_order(size));
 }
@@ -2923,7 +2923,8 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
 	ub->dev_info.flags &= UBLK_F_ALL;
 
 	ub->dev_info.flags |= UBLK_F_CMD_IOCTL_ENCODE |
-		UBLK_F_URING_CMD_COMP_IN_TASK;
+		UBLK_F_URING_CMD_COMP_IN_TASK |
+		UBLK_F_PER_IO_DAEMON;
 
 	/* GET_DATA isn't needed any more with USER_COPY or ZERO COPY */
 	if (ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY |
@@ -3188,14 +3189,14 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
 	int ublksrv_pid = (int)header->data[0];
 	int ret = -EINVAL;
 
-	pr_devel("%s: Waiting for new ubq_daemons(nr: %d) are ready, dev id %d...\n",
-			__func__, ub->dev_info.nr_hw_queues, header->dev_id);
-	/* wait until new ubq_daemon sending all FETCH_REQ */
+	pr_devel("%s: Waiting for all FETCH_REQs, dev id %d...\n", __func__,
+		 header->dev_id);
+
 	if (wait_for_completion_interruptible(&ub->completion))
 		return -EINTR;
 
-	pr_devel("%s: All new ubq_daemons(nr: %d) are ready, dev id %d\n",
-			__func__, ub->dev_info.nr_hw_queues, header->dev_id);
+	pr_devel("%s: All FETCH_REQs received, dev id %d\n", __func__,
+		 header->dev_id);
 
 	mutex_lock(&ub->mutex);
 	if (ublk_nosrv_should_stop_dev(ub))
```
