Commits
24 commits
83b7880
Merge branch 'io_uring-7.0' into for-next
axboe Apr 3, 2026
b595ad7
Merge branch 'for-7.1/io_uring' into for-next
axboe Apr 3, 2026
6568edb
Merge branch 'for-7.1/block' into for-next
axboe Apr 3, 2026
5d77540
Merge branch 'for-7.1/block' into for-next
axboe Apr 3, 2026
381b604
Merge branch 'for-7.1/io_uring' into for-7.1/io_uring-fuse
axboe Apr 3, 2026
29ebfdd
io_uring/rsrc: rename io_buffer_register_bvec()/io_buffer_unregister_…
joannekoong Apr 3, 2026
3679784
io_uring/rsrc: split io_buffer_register_request() logic
joannekoong Apr 3, 2026
33ee911
io_uring/rsrc: add io_buffer_register_bvec()
joannekoong Apr 3, 2026
b09efad
io_uring/rsrc: rename and export IO_IMU_DEST / IO_IMU_SOURCE
joannekoong Apr 3, 2026
6030f93
Merge branch 'for-7.1/io_uring-fuse' into for-next
axboe Apr 3, 2026
15c4162
Merge branch 'for-7.1/block' into for-next
axboe Apr 4, 2026
09ebc43
Merge branch 'for-7.1/block' into for-next
axboe Apr 6, 2026
d436cfb
Merge branch 'for-7.1/block' into for-next
axboe Apr 6, 2026
0b581d2
Merge branch 'for-7.1/block' into for-next
axboe Apr 7, 2026
dec615f
Merge branch 'for-7.1/block' into for-next
axboe Apr 7, 2026
cc91702
Merge branch 'for-7.1/block' into for-next
axboe Apr 7, 2026
cb793ff
Merge branch 'for-7.1/block' into for-next
axboe Apr 7, 2026
485f07e
Merge branch 'for-7.1/block' into for-next
axboe Apr 8, 2026
7eb7e8a
Merge branch 'for-7.1/io_uring' into for-next
axboe Apr 8, 2026
ddc1dff
Merge branch 'for-7.1/block' into for-next
axboe Apr 10, 2026
e0b1570
Merge branch 'for-7.1/block' into for-next
axboe Apr 10, 2026
81a0a2e
Merge branch 'for-7.1/block' into for-next
axboe Apr 10, 2026
88a57e1
Merge branch 'for-7.1/block' into for-next
axboe Apr 10, 2026
829c223
Dummy commit
kawasaki Apr 20, 2026
15 changes: 15 additions & 0 deletions Documentation/ABI/stable/sysfs-block
@@ -886,6 +886,21 @@ Description:
zone commands, they will be treated as regular block devices and
zoned will report "none".

What: /sys/block/<disk>/queue/zoned_qd1_writes
Date: January 2026
Contact: Damien Le Moal <[email protected]>
Description:
[RW] zoned_qd1_writes indicates if write operations to a zoned
block device are being handled using a single issuer context (a
kernel thread) operating at a maximum queue depth of 1. This
attribute is visible only for zoned block devices. The default
value for zoned block devices that are not rotational devices
(e.g. ZNS SSDs or zoned UFS devices) is 0. For rotational zoned
block devices (e.g. SMR HDDs) the default value is 1. Since
this default may not be appropriate for some devices, e.g.
remotely connected devices over high latency networks, the user
can disable this feature by setting this attribute to 0.
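
As an illustrative sketch of tuning this attribute (the device name ``sdb`` and the scenario are assumptions, not taken from this file):

```shell
# Check whether QD=1 single-issuer write handling is active
# (SMR HDDs default to 1, ZNS SSDs and zoned UFS devices to 0).
cat /sys/block/sdb/queue/zoned_qd1_writes

# Disable it, e.g. for a remotely attached device on a high-latency link.
echo 0 > /sys/block/sdb/queue/zoned_qd1_writes
```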


What: /sys/block/<disk>/hidden
Date: March 2023
13 changes: 13 additions & 0 deletions Documentation/ABI/testing/sysfs-nvme
@@ -0,0 +1,13 @@
What: /sys/devices/virtual/nvme-fabrics/ctl/.../tls_configured_key
Date: November 2025
KernelVersion: 6.19
Contact: Linux NVMe mailing list <[email protected]>
Description:
The file is available when using a secure concatenation
connection to an NVMe target. Reading the file will return
the serial of the currently negotiated key.

Writing 0 to the file will trigger a PSK reauthentication
(REPLACETLSPSK) with the target. After a reauthentication
the value returned by tls_configured_key will be the new
serial.
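
An illustrative usage sketch (the controller instance ``nvme0`` stands in for the elided path component and is an assumption):

```shell
# Read the serial of the currently negotiated TLS PSK.
cat /sys/devices/virtual/nvme-fabrics/ctl/nvme0/tls_configured_key

# Trigger a PSK reauthentication (REPLACETLSPSK) with the target;
# afterwards the file reports the serial of the newly negotiated key.
echo 0 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/tls_configured_key
```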
10 changes: 9 additions & 1 deletion Documentation/admin-guide/blockdev/zoned_loop.rst
@@ -62,7 +62,7 @@ The options available for the add command can be listed by reading the
/dev/zloop-control device::

$ cat /dev/zloop-control
add id=%d,capacity_mb=%u,zone_size_mb=%u,zone_capacity_mb=%u,conv_zones=%u,base_dir=%s,nr_queues=%u,queue_depth=%u,buffered_io
add id=%d,capacity_mb=%u,zone_size_mb=%u,zone_capacity_mb=%u,conv_zones=%u,max_open_zones=%u,base_dir=%s,nr_queues=%u,queue_depth=%u,buffered_io,zone_append=%u,ordered_zone_append,discard_write_cache
remove id=%d

In more details, the options that can be used with the "add" command are as
@@ -80,6 +80,9 @@ zone_capacity_mb Device zone capacity (must always be equal to or lower
conv_zones Total number of conventional zones starting from
sector 0
Default: 8
max_open_zones Maximum number of open sequential write required zones
(0 for no limit).
Default: 0
base_dir Path to the base directory where to create the directory
containing the zone files of the device.
Default=/var/local/zloop.
@@ -104,6 +107,11 @@ ordered_zone_append Enable zloop mitigation of zone append reordering.
(extents), as when enabled, this can significantly reduce
the number of data extents needed for a file data
mapping.
discard_write_cache Discard all data that was not explicitly persisted using a
flush operation when the device is removed by truncating
each zone file to the size recorded during the last flush
operation. This simulates power fail events where
uncommitted data is lost.
=================== =========================================================
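
Putting the new options together, a sketch of creating a device that uses them (sizes and id are arbitrary; assumes the zloop driver is loaded):

```shell
# Create zoned device 0 with at most 8 open sequential zones and
# power-fail simulation on removal (data not flushed before the last
# flush operation is discarded when the device is removed).
echo "add id=0,capacity_mb=4096,zone_size_mb=64,max_open_zones=8,discard_write_cache" \
    > /dev/zloop-control
```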

3) Deleting a Zoned Device
2 changes: 1 addition & 1 deletion Documentation/block/inline-encryption.rst
@@ -153,7 +153,7 @@ blk-crypto-fallback completes the original bio. If the original bio is too
large, multiple bounce bios may be required; see the code for details.

For decryption, blk-crypto-fallback "wraps" the bio's completion callback
(``bi_complete``) and private data (``bi_private``) with its own, unsets the
(``bi_end_io``) and private data (``bi_private``) with its own, unsets the
bio's encryption context, then submits the bio. If the read completes
successfully, blk-crypto-fallback restores the bio's original completion
callback and private data, then decrypts the bio's data in-place using the
133 changes: 126 additions & 7 deletions Documentation/block/ublk.rst
@@ -382,17 +382,17 @@ Zero copy
---------

ublk zero copy relies on io_uring's fixed kernel buffer, which provides
two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`.
two APIs: `io_buffer_register_request()` and `io_buffer_unregister`.

ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call
`io_buffer_register_bvec()` for ublk server to register client request
`io_buffer_register_request()` for ublk server to register client request
buffer into io_uring buffer table, then ublk server can submit io_uring
IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF`
calls `io_buffer_unregister_bvec()` to unregister the buffer, which is
guaranteed to be live between calling `io_buffer_register_bvec()` and
`io_buffer_unregister_bvec()`. Any io_uring operation which supports this
kind of kernel buffer will grab one reference of the buffer until the
operation is completed.
calls `io_buffer_unregister()` to unregister the buffer, which is guaranteed
to be live between calling `io_buffer_register_request()` and
`io_buffer_unregister()`. Any io_uring operation which supports this kind of
kernel buffer will grab one reference of the buffer until the operation is
completed.

ublk server implementing zero copy or user copy has to be CAP_SYS_ADMIN and
be trusted, because it is ublk server's responsibility to make sure IO buffer
@@ -485,6 +485,125 @@ Limitations
in case too many ublk devices are handled by this single io_ring_ctx
and each one has a very large queue depth

Shared Memory Zero Copy (UBLK_F_SHMEM_ZC)
------------------------------------------

The ``UBLK_F_SHMEM_ZC`` feature provides an alternative zero-copy path
that works by sharing physical memory pages between the client application
and the ublk server. Unlike the io_uring fixed buffer approach above,
shared memory zero copy does not require per-I/O io_uring buffer
registration; instead, it relies on the kernel matching physical pages
at I/O time. This also lets the ublk server access the shared buffer
directly through its own mapping, which the io_uring fixed buffer
approach does not allow.

Motivation
~~~~~~~~~~

Shared memory zero copy takes a different approach: if the client
application and the ublk server both map the same physical memory, there is
nothing to copy. The kernel detects the shared pages automatically and
tells the server where the data already lives.

``UBLK_F_SHMEM_ZC`` can be thought of as an opt-in optimization for
cooperating client applications: when the client is willing to allocate
I/O buffers from shared memory, the entire data path becomes zero-copy.

Use Cases
~~~~~~~~~

This feature is useful when the client application can be configured to
use a specific shared memory region for its I/O buffers:

- **Custom storage clients** that allocate I/O buffers from shared memory
(memfd, hugetlbfs) and issue direct I/O to the ublk device
- **Database engines** that use pre-allocated buffer pools with O_DIRECT

How It Works
~~~~~~~~~~~~

1. The ublk server and client both ``mmap()`` the same file (memfd or
hugetlbfs) with ``MAP_SHARED``. This gives both processes access to the
same physical pages.

2. The ublk server registers its mapping with the kernel::

struct ublk_shmem_buf_reg buf = { .addr = mmap_va, .len = size };
ublk_ctrl_cmd(UBLK_U_CMD_REG_BUF, .addr = &buf);

The kernel pins the pages and builds a PFN lookup tree.

3. When the client issues direct I/O (``O_DIRECT``) to ``/dev/ublkb*``,
the kernel checks whether the I/O buffer pages match any registered
pages by comparing PFNs.

4. On a match, the kernel sets ``UBLK_IO_F_SHMEM_ZC`` in the I/O
descriptor and encodes the buffer index and offset in ``addr``::

if (iod->op_flags & UBLK_IO_F_SHMEM_ZC) {
/* Data is already in our shared mapping — zero copy */
index = ublk_shmem_zc_index(iod->addr);
offset = ublk_shmem_zc_offset(iod->addr);
buf = shmem_table[index].mmap_base + offset;
}

5. If pages do not match (e.g., the client used a non-shared buffer),
the I/O falls back to the normal copy path silently.

The shared memory can be set up via two methods:

- **Socket-based**: the client sends a memfd to the ublk server via
``SCM_RIGHTS`` on a unix socket. The server mmaps and registers it.
- **Hugetlbfs-based**: both processes ``mmap(MAP_SHARED)`` the same
hugetlbfs file. No IPC is needed; the same file gives the same physical pages.

Advantages
~~~~~~~~~~

- **Simple**: no per-I/O buffer registration or unregistration commands.
Once the shared buffer is registered, all matching I/O is zero-copy
automatically.
- **Direct buffer access**: the ublk server can read and write the shared
buffer directly via its own mmap, without going through io_uring fixed
buffer operations. This is more friendly for server implementations.
- **Fast**: PFN matching is a single maple tree lookup per bvec. No
io_uring command round-trips for buffer management.
- **Compatible**: non-matching I/O silently falls back to the copy path.
The device works normally for any client, with zero-copy as an
optimization when shared memory is available.

Limitations
~~~~~~~~~~~

- **Requires client cooperation**: the client must allocate its I/O
buffers from the shared memory region. This requires a custom or
configured client — standard applications using their own buffers
will not benefit.
- **Direct I/O only**: buffered I/O (without ``O_DIRECT``) goes through
the page cache, which allocates its own pages. These kernel-allocated
pages will never match the registered shared buffer. Only ``O_DIRECT``
puts the client's buffer pages directly into the block I/O.
- **Contiguous data only**: each I/O request's data must be contiguous
within a single registered buffer. Scatter/gather I/O that spans
multiple non-adjacent registered buffers cannot use the zero-copy path.

Control Commands
~~~~~~~~~~~~~~~~

- ``UBLK_U_CMD_REG_BUF``

Register a shared memory buffer. ``ctrl_cmd.addr`` points to a
``struct ublk_shmem_buf_reg`` containing the buffer virtual address and size.
Returns the assigned buffer index (>= 0) on success. The kernel pins
pages and builds the PFN lookup tree. Queue freeze is handled
internally.

- ``UBLK_U_CMD_UNREG_BUF``

Unregister a previously registered buffer. ``ctrl_cmd.data[0]`` is the
buffer index. Unpins pages and removes PFN entries from the lookup
tree.

References
==========

2 changes: 1 addition & 1 deletion MAINTAINERS
@@ -27013,7 +27013,7 @@ F: Documentation/filesystems/ubifs.rst
F: fs/ubifs/

UBLK USERSPACE BLOCK DRIVER
M: Ming Lei <ming.lei@redhat.com>
M: Ming Lei <tom.leiming@gmail.com>
L: [email protected]
S: Maintained
F: Documentation/block/ublk.rst
2 changes: 1 addition & 1 deletion block/Makefile
@@ -26,7 +26,7 @@ bfq-y := bfq-iosched.o bfq-wf2q.o bfq-cgroup.o
obj-$(CONFIG_IOSCHED_BFQ) += bfq.o

obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o \
bio-integrity-auto.o
bio-integrity-auto.o bio-integrity-fs.o
obj-$(CONFIG_BLK_DEV_ZONED) += blk-zoned.o
obj-$(CONFIG_BLK_WBT) += blk-wbt.o
obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o
80 changes: 11 additions & 69 deletions block/bio-integrity-auto.c
@@ -39,7 +39,7 @@ static void bio_integrity_verify_fn(struct work_struct *work)
container_of(work, struct bio_integrity_data, work);
struct bio *bio = bid->bio;

blk_integrity_verify_iter(bio, &bid->saved_bio_iter);
bio->bi_status = bio_integrity_verify(bio, &bid->saved_bio_iter);
bio_integrity_finish(bid);
bio_endio(bio);
}
@@ -50,11 +50,6 @@ static bool bip_should_check(struct bio_integrity_payload *bip)
return bip->bip_flags & BIP_CHECK_FLAGS;
}

static bool bi_offload_capable(struct blk_integrity *bi)
{
return bi->metadata_size == bi->pi_tuple_size;
}

/**
* __bio_integrity_endio - Integrity I/O completion function
* @bio: Protected bio
@@ -84,83 +79,30 @@ bool __bio_integrity_endio(struct bio *bio)
/**
* bio_integrity_prep - Prepare bio for integrity I/O
* @bio: bio to prepare
* @action: preparation action needed (BI_ACT_*)
*
* Checks if the bio already has an integrity payload attached. If it does, the
* payload has been generated by another kernel subsystem, and we just pass it
* through.
* Otherwise allocates integrity payload and for writes the integrity metadata
* will be generated. For reads, the completion handler will verify the
* metadata.
* Allocate the integrity payload. For writes, generate the integrity metadata
* and for reads, setup the completion handler to verify the metadata.
*
* This is used for bios that do not have user integrity payloads attached.
*/
bool bio_integrity_prep(struct bio *bio)
void bio_integrity_prep(struct bio *bio, unsigned int action)
{
struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
struct bio_integrity_data *bid;
bool set_flags = true;
gfp_t gfp = GFP_NOIO;

if (!bi)
return true;

if (!bio_sectors(bio))
return true;

/* Already protected? */
if (bio_integrity(bio))
return true;

switch (bio_op(bio)) {
case REQ_OP_READ:
if (bi->flags & BLK_INTEGRITY_NOVERIFY) {
if (bi_offload_capable(bi))
return true;
set_flags = false;
}
break;
case REQ_OP_WRITE:
/*
* Zero the memory allocated to not leak uninitialized kernel
* memory to disk for non-integrity metadata where nothing else
* initializes the memory.
*/
if (bi->flags & BLK_INTEGRITY_NOGENERATE) {
if (bi_offload_capable(bi))
return true;
set_flags = false;
gfp |= __GFP_ZERO;
} else if (bi->metadata_size > bi->pi_tuple_size)
gfp |= __GFP_ZERO;
break;
default:
return true;
}

if (WARN_ON_ONCE(bio_has_crypt_ctx(bio)))
return true;

bid = mempool_alloc(&bid_pool, GFP_NOIO);
bio_integrity_init(bio, &bid->bip, &bid->bvec, 1);
bid->bio = bio;
bid->bip.bip_flags |= BIP_BLOCK_INTEGRITY;
bio_integrity_alloc_buf(bio, gfp & __GFP_ZERO);

bip_set_seed(&bid->bip, bio->bi_iter.bi_sector);

if (set_flags) {
if (bi->csum_type == BLK_INTEGRITY_CSUM_IP)
bid->bip.bip_flags |= BIP_IP_CHECKSUM;
if (bi->csum_type)
bid->bip.bip_flags |= BIP_CHECK_GUARD;
if (bi->flags & BLK_INTEGRITY_REF_TAG)
bid->bip.bip_flags |= BIP_CHECK_REFTAG;
}
bio_integrity_alloc_buf(bio, action & BI_ACT_ZERO);
if (action & BI_ACT_CHECK)
bio_integrity_setup_default(bio);

/* Auto-generate integrity metadata if this is a write */
if (bio_data_dir(bio) == WRITE && bip_should_check(&bid->bip))
blk_integrity_generate(bio);
bio_integrity_generate(bio);
else
bid->saved_bio_iter = bio->bi_iter;
return true;
}
EXPORT_SYMBOL(bio_integrity_prep);
