vfio/pci: Allow MMIO regions to be exported through dma-buf#57
Closed
blktests-ci[bot] wants to merge 18 commits into for-next_base from
* block-6.16: block: fix module reference leak in mq-deadline I/O scheduler
* for-6.17/io_uring: (39 commits) io_uring: fix breakage in EXPERT menu io_uring/cmd: remove struct io_uring_cmd_data btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag io_uring/zcrx: account area memory io_uring: export io_[un]account_mem io_uring/net: Support multishot receive len cap io_uring: deduplicate wakeup handling io_uring/net: cast min_not_zero() type io_uring/poll: cleanup apoll freeing io_uring/net: allow multishot receive per-invocation cap io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags io_uring/net: use passed in 'len' in io_recv_buf_select() io_uring/zcrx: prepare fallback for larger pages io_uring/zcrx: assert area type in io_zcrx_iov_page io_uring/zcrx: allocate sgtable for umem areas io_uring/zcrx: introduce io_populate_area_dma io_uring/zcrx: return error from io_zcrx_map_area_* io_uring/zcrx: always pass page to io_zcrx_copy_chunk io_uring/rw: cast rw->flags assignment to rwf_t ...
* for-6.17/block: (77 commits) dm: split write BIOs on zone boundaries when zone append is not emulated block: use chunk_sectors when evaluating stacked atomic write limits dm-stripe: limit chunk_sectors to the stripe size md/raid10: set chunk_sectors limit md/raid0: set chunk_sectors limit block: sanitize chunk_sectors for atomic write limits ilog2: add max_pow_of_two_factor() block: fix blk_zone_append_update_request_bio() kernel-doc ublk: remove unused req argument from ublk_sub_req_ref() selftests: ublk: add utils.h selftests: ublk: add helper ublk_handle_uring_cmd() for handle ublk command selftests: ublk: improve flags naming selftests: ublk: remove ublk queue self-defined flags selftests: ublk: pass 'ublk_thread *' to more common helpers selftests: ublk: pass 'ublk_thread *' to ->queue_io() and ->tgt_io_done() selftests: ublk: remove `tag` parameter of ->tgt_io_done() ublk: pass 'const struct ublk_io *' to ublk_[un]map_io() ublk: remove ublk_commit_and_fetch() ublk: add helper ublk_check_fetch_buf() ublk: store auto buffer register data into `struct ublk_io` ...
* for-6.17/io_uring: io_uring/zcrx: fix leaking pages on sg init fail io_uring/zcrx: don't leak pages on account failure io_uring/zcrx: fix null ifq on area destruction
* for-6.17/block: nvme-pci: try function level reset on init failure nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails nvme-tcp: log TLS handshake failures at error level docs: nvme: fix grammar in nvme-pci-endpoint-target.rst nvme: fix typo in status code constant for self-test in progress nvmet: remove redundant assignment of error code in nvmet_ns_enable() nvme: fix incorrect variable in io cqes error message nvme: fix multiple spelling and grammar issues in host drivers md/raid10: fix set but not used variable in sync_request_write() md: allow removing faulty rdev during resync md/raid5: unset WQ_CPU_INTENSIVE for raid5 unbound workqueue md: remove/add redundancy group only in level change md: Don't clear MD_CLOSING until mddev is freed md: call del_gendisk in control path
* for-6.17/block: sunvdc: Balance device refcount in vdc_port_mpgroup_check
* for-6.17/block: cdrom: Call cdrom_mrw_exit from cdrom_release function
Author
|
Upstream branch: a8fa173 |
Remove the bus_off field from pci_p2pdma_map_state since it duplicates information already available in the pgmap structure. The bus_offset is only used in one location (pci_p2pdma_bus_addr_map) and is always identical to pgmap->bus_offset. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
Extract the core P2PDMA provider information (device owner and bus offset) from the dev_pagemap into a dedicated p2pdma_provider structure. This creates a cleaner separation between the memory management layer and the P2PDMA functionality.

The new p2pdma_provider structure contains:
- owner: pointer to the providing device
- bus_offset: computed offset for non-host transactions

This refactoring simplifies the P2PDMA state management by removing the need to access pgmap internals directly. The pci_p2pdma_map_state now stores a pointer to the provider instead of the pgmap, making the API more explicit and easier to understand.

Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Update the pci_p2pdma_bus_addr_map() function to take a direct pointer to the p2pdma_provider structure instead of the pci_p2pdma_map_state. This simplifies the API by removing the need for callers to extract the provider from the state structure. The change updates all callers across the kernel (block layer, IOMMU, DMA direct, and HMM) to pass the provider pointer directly, making the code more explicit and reducing unnecessary indirection. This also removes the runtime warning check since callers now have direct control over which provider they use. Signed-off-by: Leon Romanovsky <[email protected]>
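The provider-based address translation described in the two commits above can be sketched in plain userspace C. This is only a model under stated assumptions: every `model_*` name is hypothetical, the real kernel structures carry more state, and treating bus_offset as "bus address minus CPU physical address" is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for a device; hypothetical, not a kernel type. */
struct model_device {
	const char *name;
};

/* Mirrors the two fields the commit message names: owner and bus_offset. */
struct model_p2pdma_provider {
	struct model_device *owner; /* device that provides the MMIO memory */
	uint64_t bus_offset;        /* assumed: bus address minus physical address */
};

/* Hypothetical map state that points at a provider rather than a pgmap. */
struct model_map_state {
	const struct model_p2pdma_provider *provider;
};

/*
 * Translate a CPU physical address into a PCI bus address. The helper
 * takes the provider directly, so callers do not dig it out of the state.
 */
uint64_t model_bus_addr_map(const struct model_p2pdma_provider *provider,
			    uint64_t paddr)
{
	assert(provider != NULL);
	return paddr + provider->bus_offset;
}
```

A caller that holds a map state would pass `state.provider` to `model_bus_addr_map()`, which is the shape of the API change the second commit describes.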
…llocation Refactor the PCI P2PDMA subsystem to separate the core peer-to-peer DMA functionality from the optional memory allocation layer. This creates a two-tier architecture: The core layer provides P2P mapping functionality for physical addresses based on PCI device MMIO BARs and integrates with the DMA API for mapping operations. This layer is required for all P2PDMA users. The optional upper layer provides memory allocation capabilities including gen_pool allocator, struct page support, and sysfs interface for user space access. This separation allows subsystems like VFIO to use only the core P2P mapping functionality without the overhead of memory allocation features they don't need. The core functionality is now available through the new pci_p2pdma_enable() function that returns a p2pdma_provider structure. Signed-off-by: Leon Romanovsky <[email protected]>
Export the pci_p2pdma_map_type() function to allow external modules and subsystems to determine the appropriate mapping type for P2PDMA transfers between a provider and target device. The function determines whether peer-to-peer DMA transfers can be done directly through PCI switches (PCI_P2PDMA_MAP_BUS_ADDR) or must go through the host bridge (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE), or if the transfer is not supported at all. This export enables subsystems like VFIO to properly handle P2PDMA operations by querying the mapping type before attempting transfers, ensuring correct DMA address programming and error handling. Signed-off-by: Leon Romanovsky <[email protected]>
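How a caller might act on the three outcomes listed above can be sketched as follows. This is a userspace model, not kernel code: the `MODEL_*` enum values only echo the PCI_P2PDMA_MAP_* names, the `iova` argument stands in for whatever address the DMA API would return for a host-bridge mapping, and the -1 error value is an arbitrary placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three outcomes the commit message lists. */
enum model_map_type {
	MODEL_MAP_NOT_SUPPORTED,
	MODEL_MAP_BUS_ADDR,         /* traffic stays below a PCI switch */
	MODEL_MAP_THRU_HOST_BRIDGE, /* traffic is routed via the host bridge */
};

/*
 * Pick the address to program into the importing device.
 * Returns 0 and fills *out on success, -1 if the transfer is unsupported.
 */
int model_dma_addr(enum model_map_type type, uint64_t paddr,
		   uint64_t bus_offset, uint64_t iova, uint64_t *out)
{
	assert(out != NULL);

	switch (type) {
	case MODEL_MAP_BUS_ADDR:
		/* Direct P2P: use the PCI bus address of the MMIO range. */
		*out = paddr + bus_offset;
		return 0;
	case MODEL_MAP_THRU_HOST_BRIDGE:
		/* Through the host bridge: use the DMA-API-provided address. */
		*out = iova;
		return 0;
	case MODEL_MAP_NOT_SUPPORTED:
	default:
		return -1;
	}
}
```

Querying the type first and failing early on the unsupported case is the error-handling pattern the commit message alludes to.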
Move the struct phys_vec definition from block/blk-mq-dma.c to include/linux/types.h to make it available for use across the kernel. The phys_vec structure represents a physical address range with a length, which is used by the new physical address-based DMA mapping API. This structure is already used by the block layer and will be needed by upcoming VFIO patches for dma-buf operations. Moving this definition to types.h provides a centralized location for this common data structure and eliminates code duplication across subsystems that need to work with physical address ranges. Signed-off-by: Leon Romanovsky <[email protected]>
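As a rough illustration of a physical address range with a length, here is a userspace sketch. The field names `paddr` and `len`, the integer widths, and the helper are assumptions for illustration; the commit message itself only says the structure holds a physical address range and a length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's physical address type. */
typedef uint64_t model_phys_addr_t;

/* Assumed layout: start of the range plus its length in bytes. */
struct model_phys_vec {
	model_phys_addr_t paddr;
	uint32_t len;
};

/* Sum the lengths of n ranges, e.g. to size a mapping up front. */
uint64_t model_phys_vec_total(const struct model_phys_vec *vec, size_t n)
{
	uint64_t total = 0;

	assert(vec != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += vec[i].len;
	return total;
}
```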
These helpers are useful for managing additional references taken on the device from other associated VFIO modules. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Vivek Kasireddy <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
Make sure that all VFIO PCI devices have peer-to-peer capabilities enabled, so that their MMIO memory can be exported through DMABUF. Signed-off-by: Leon Romanovsky <[email protected]>
There is no need to share the main device pointer (struct vfio_device *) with all the feature functions as they only need the core device pointer. Therefore, extract the core device pointer once in the caller (vfio_pci_core_ioctl_feature) and share it instead. Signed-off-by: Vivek Kasireddy <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
Add support for exporting PCI device MMIO regions through dma-buf, enabling safe sharing of non-struct page memory with controlled lifetime management. This allows RDMA and other subsystems to import dma-buf FDs and build them into memory regions for PCI P2P operations. The implementation provides a revocable attachment mechanism using dma-buf move operations. MMIO regions are normally pinned as BARs don't change physical addresses, but access is revoked when the VFIO device is closed or a PCI reset is issued. This ensures kernel self-defense against potentially hostile userspace. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Vivek Kasireddy <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
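The revocation idea above (pinned MMIO whose access can be withdrawn on device close or reset) can be modeled in userspace like this. It is a deliberately simplified sketch: the `model_*` names are hypothetical, no real dma-buf or VFIO API is used, and the actual implementation relies on dma-buf move operations rather than a bare flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an exported MMIO range. */
struct model_mmio_export {
	uint64_t bar_phys; /* fixed physical address of the BAR region */
	uint64_t size;     /* size of the exported region in bytes */
	bool revoked;      /* set once the device is closed or reset */
};

/*
 * Resolve an offset inside the export to a physical address.
 * Returns 0 on success, -1 if access was revoked or the offset is out of range.
 */
int model_export_map(const struct model_mmio_export *exp, uint64_t offset,
		     uint64_t *out)
{
	assert(exp != NULL && out != NULL);

	if (exp->revoked || offset >= exp->size)
		return -1;
	*out = exp->bar_phys + offset;
	return 0;
}

/* Called on device close or PCI reset in this model. */
void model_export_revoke(struct model_mmio_export *exp)
{
	assert(exp != NULL);
	exp->revoked = true;
}
```

The point of the model is that the address itself never changes; only the permission to use it does.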
70add1a to ce7b8f4
8bf6490 to 9581850
blktests-ci Bot pushed a commit that referenced this pull request Aug 4, 2025
All of the ID tables based on <linux/mod_devicetable.h> (of_device_id,
pci_device_id, ...) require their arrays to end in an empty sentinel
value. That's usually spelled with an empty initializer entry (e.g.,
"{}"), but also sometimes with explicit 0 entries, field initializers
(e.g., '.id = ""'), or even a macro entry (like PCMCIA_DEVICE_NULL).
Without a sentinel, device-matching code may read out of bounds.
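To make the sentinel requirement concrete, here is a small userspace sketch of a sentinel-terminated match table and the loop that walks it. `struct demo_device_id`, the table, and `demo_match()` are hypothetical stand-ins, not the definitions from <linux/mod_devicetable.h>; kernel tables usually spell the sentinel as "{}", while this sketch uses an explicit empty string to stay within standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ID entry, loosely shaped like an of_device_id. */
struct demo_device_id {
	char compatible[32];
	const void *data;
};

static const struct demo_device_id demo_match_table[] = {
	{ .compatible = "vendor,chip-a" },
	{ .compatible = "vendor,chip-b" },
	{ .compatible = "" } /* sentinel: the walk stops here */
};

/*
 * Walk the table until the empty sentinel entry. Without that entry the
 * loop would keep reading past the end of the array.
 */
const struct demo_device_id *demo_match(const struct demo_device_id *table,
					const char *compatible)
{
	assert(table != NULL && compatible != NULL);

	for (; table->compatible[0] != '\0'; table++) {
		if (strcmp(table->compatible, compatible) == 0)
			return table;
	}
	return NULL;
}
```

Because the matcher receives only a pointer and no element count, the sentinel is the sole end marker, which is why omitting it leads to an out-of-bounds read.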
I've found a number of such bugs in driver reviews, and we even
occasionally commit one to the tree. See commit 5751eee ("i2c:
nomadik: Add missing sentinel to match table") for example.
Teach checkpatch to find these ID tables, and complain if it looks like
there wasn't a sentinel value.
Test output:

  $ git format-patch -1 a0d15cc --stdout | scripts/checkpatch.pl -
  ERROR: missing sentinel in ID array
  #57: FILE: drivers/i2c/busses/i2c-nomadik.c:1073:
  +static const struct of_device_id nmk_i2c_eyeq_match_table[] = {
          {
                  .compatible = "XXXXXXXXXXXXXXXXXX",
                  .data = (void *)(NMK_I2C_EYEQ_FLAG_32B_BUS | NMK_I2C_EYEQ_FLAG_IS_EYEQ5),
          },
   };

  total: 1 errors, 0 warnings, 66 lines checked

  NOTE: For some of the reported defects, checkpatch may be able to
        mechanically convert to the typical style using --fix or --fix-inplace.

  "[PATCH] i2c: nomadik: switch from of_device_is_compatible() to" has style problems, please review.

  NOTE: If any of the errors are false positives, please report
        them to the maintainer, see CHECKPATCH in MAINTAINERS.

When run across the entire tree (scripts/checkpatch.pl -q --types
MISSING_SENTINEL -f ...), false positives exist:
* where macros are used that hide the table from analysis
(e.g., drivers/gpu/drm/radeon/radeon_drv.c / radeon_PCI_IDS).
There are fewer than 5 of these.
* where such tables are processed correctly via ARRAY_SIZE() (fewer than
5 instances). This is by far not the typical usage of *_device_id
arrays.
* some odd parsing artifacts, where ctx_statement_block() seems to quit
in the middle of a block due to #if/#else/#endif.
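The second false-positive class above, tables walked by element count instead of by sentinel, looks roughly like this in a userspace sketch. `DEMO_ARRAY_SIZE` is a local stand-in for the kernel's ARRAY_SIZE() macro, and the table and helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for the kernel's ARRAY_SIZE() macro. */
#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct demo_id {
	const char *name;
};

/* No sentinel entry: the table is only ever walked by explicit count. */
static const struct demo_id demo_ids[] = {
	{ "alpha" },
	{ "beta" },
};

/* Number of entries, computed at compile time from the array itself. */
size_t demo_ids_count(void)
{
	return DEMO_ARRAY_SIZE(demo_ids);
}

/* Bounded search: returns the index of the match, or -1 if none. */
int demo_find(const struct demo_id *ids, size_t n, const char *name)
{
	assert(ids != NULL && name != NULL);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(ids[i].name, name) == 0)
			return (int)i;
	}
	return -1;
}
```

Such a table is safe without a sentinel, but a checker that only inspects the array initializer cannot tell how the array is consumed, hence the false positive.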
Also, not every "struct *_device_id" is in fact a sentinel-requiring
structure, but even with such types, false positives are very rare.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Brian Norris <[email protected]>
Acked-by: Joe Perches <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Cc: Brian Norris <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Pull request for series with
subject: vfio/pci: Allow MMIO regions to be exported through dma-buf
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=985118