vfio/pci: Allow MMIO regions to be exported through dma-buf #64
blktests-ci[bot] wants to merge 10 commits into linus-master_base
Conversation
Upstream branch: 14bed9b
6637119 to f092a9b
Upstream branch: 260f6f4
c43e714 to da63a35
f092a9b to 0b59764
Upstream branch: d6084bb
da63a35 to 981c2f7
0b59764 to aee5bd3
Upstream branch: 831462f
981c2f7 to b92bea9
aee5bd3 to ef18525
Upstream branch: c93529a
b92bea9 to 21c3865
ef18525 to 3851b3f
Upstream branch: cbbf0a7
21c3865 to 779044c
3851b3f to 28b3384
Upstream branch: 6a68cec
779044c to 57b36e8
28b3384 to 8ab9be5
Upstream branch: f2d282e
57b36e8 to 65985b8
8ab9be5 to 5b90760
Upstream branch: 89748ac
65985b8 to f496b49
3893da1 to aeddbbb
Upstream branch: b96ddbc
9694991 to 2710371
77110f5 to a2e0474
Upstream branch: 2b38afc
2710371 to 97a71d1
a2e0474 to 36a8aec
Upstream branch: 8f5ae30
97a71d1 to 74ecbfb
36a8aec to 1a46df6
Upstream branch: 53e760d
Upstream branch: 0e39a73
Upstream branch: 8742b2d
Upstream branch: 91325f3
Upstream branch: 3a4a036
Upstream branch: dfc0f63
Upstream branch: 0cc5352
Upstream branch: 24ea63e
Upstream branch: d7ee5bd
Remove the bus_off field from pci_p2pdma_map_state since it duplicates information already available in the pgmap structure. The bus_offset is only used in one location (pci_p2pdma_bus_addr_map) and is always identical to pgmap->bus_offset.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Currently the P2PDMA code requires a pgmap and a struct page to function. This was serving three important purposes:
- DMA API compatibility, where scatterlist required a struct page as input
- Life cycle management, the percpu_ref is used to prevent UAF during device hot unplug
- A way to get the P2P provider data through the pci_p2pdma_pagemap
The DMA API now has a new flow, and has gained phys_addr_t support, so it no longer needs struct pages to perform P2P mapping. Lifecycle management can be delegated to the user; DMABUF, for instance, has a suitable invalidation protocol that does not require struct page. Finding the P2P provider data can also be managed by the caller without the need to look it up from the phys_addr.
Split the P2PDMA code into two layers. The optional upper layer, effectively, provides a way to mmap() P2P memory into a VMA by providing struct page, pgmap, a genalloc and sysfs. The lower layer provides the actual P2P infrastructure and is wrapped up in a new struct p2pdma_provider. Rework the mmap layer to use the new p2pdma_provider based APIs.
Drivers that do not want to put P2P memory into VMAs can allocate a struct p2pdma_provider after probe() starts and free it before remove() completes. When DMA mapping, the driver must convey the struct p2pdma_provider to the DMA mapping code along with a phys_addr of the MMIO BAR slice to map. The driver must ensure that no DMA mapping outlives the lifetime of the struct p2pdma_provider.
The intended target of this new API layer is DMABUF. There is usually only a single p2pdma_provider for a DMABUF exporter. Most drivers can establish the p2pdma_provider during probe, access the single instance during DMABUF attach and use that to drive the DMA mapping. DMABUF provides an invalidation mechanism that can guarantee all DMA is halted and the DMA mappings are undone prior to destroying the struct p2pdma_provider. This ensures there is no UAF through DMABUFs that are lingering past driver removal.
The new p2pdma_provider layer cannot be used to create P2P memory that can be mapped into VMAs, be used with pin_user_pages(), O_DIRECT, and so on. These use cases must still use the mmap() layer. The p2pdma_provider layer is principally for DMABUF-like use cases where DMABUF natively manages the life cycle and access instead of vmas/pin_user_pages()/struct page.
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Update the pci_p2pdma_bus_addr_map() function to take a direct pointer to the p2pdma_provider structure instead of the pci_p2pdma_map_state. This simplifies the API by removing the need for callers to extract the provider from the state structure. The change updates all callers across the kernel (block layer, IOMMU, DMA direct, and HMM) to pass the provider pointer directly, making the code more explicit and reducing unnecessary indirection. This also removes the runtime warning check since callers now have direct control over which provider they use.
Signed-off-by: Leon Romanovsky <[email protected]>
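The provider-based bus address translation described in the commits above can be pictured with a small userspace model. This is only a sketch: the `model_` names, the single-field provider struct, and the `uint64_t` stand-ins for `phys_addr_t` and `dma_addr_t` are assumptions made for illustration, not the kernel definitions. It assumes the bus address is the physical address plus a per-provider `bus_offset`, which is what the commit messages imply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel types; not the real definitions. */
typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

/* Hypothetical, simplified provider holding only what this sketch needs. */
struct model_p2pdma_provider {
	uint64_t bus_offset;	/* assumed: PCI bus address minus CPU physical address */
};

/*
 * Translate a CPU physical address inside a BAR into a PCI bus address.
 * The caller passes the provider directly instead of a map-state wrapper.
 */
static dma_addr_t model_bus_addr_map(const struct model_p2pdma_provider *provider,
				     phys_addr_t paddr)
{
	assert(provider != NULL);
	return paddr + provider->bus_offset;
}
```

Passing the provider explicitly makes it clear at each call site which BAR owner the translation belongs to.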
…llocation
Refactor the PCI P2PDMA subsystem to separate the core peer-to-peer DMA functionality from the optional memory allocation layer. This creates a two-tier architecture:
The core layer provides P2P mapping functionality for physical addresses based on PCI device MMIO BARs and integrates with the DMA API for mapping operations. This layer is required for all P2PDMA users.
The optional upper layer provides memory allocation capabilities including gen_pool allocator, struct page support, and sysfs interface for user space access.
This separation allows subsystems like VFIO to use only the core P2P mapping functionality without the overhead of memory allocation features they don't need. The core functionality is now available through the new pci_p2pdma_enable() function that returns a p2pdma_provider structure.
Signed-off-by: Leon Romanovsky <[email protected]>
Export the pci_p2pdma_map_type() function to allow external modules and subsystems to determine the appropriate mapping type for P2PDMA transfers between a provider and target device. The function determines whether peer-to-peer DMA transfers can be done directly through PCI switches (PCI_P2PDMA_MAP_BUS_ADDR) or must go through the host bridge (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE), or if the transfer is not supported at all. This export enables subsystems like VFIO to properly handle P2PDMA operations by querying the mapping type before attempting transfers, ensuring correct DMA address programming and error handling.
Signed-off-by: Leon Romanovsky <[email protected]>
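How a caller might act on the three outcomes named above can be sketched as follows. This is a userspace model under stated assumptions: the `MODEL_` enum only mirrors the three cases from the commit message, `model_pick_dma_addr()` is a hypothetical helper, and the `-EINVAL` error code is chosen for illustration rather than taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors the three outcomes from the commit message; names are hypothetical. */
enum model_map_type {
	MODEL_MAP_NOT_SUPPORTED,
	MODEL_MAP_BUS_ADDR,		/* traffic stays below a PCI switch */
	MODEL_MAP_THRU_HOST_BRIDGE,	/* traffic goes through the host bridge */
};

/*
 * Choose the address to program into the importing device.
 * bus_addr:      PCI bus address of the peer's BAR slice
 * host_dma_addr: address produced by the regular DMA mapping path
 * Returns 0 and fills *out on success, a negative errno otherwise.
 */
static int model_pick_dma_addr(enum model_map_type type, uint64_t bus_addr,
			       uint64_t host_dma_addr, uint64_t *out)
{
	switch (type) {
	case MODEL_MAP_BUS_ADDR:
		*out = bus_addr;
		return 0;
	case MODEL_MAP_THRU_HOST_BRIDGE:
		*out = host_dma_addr;
		return 0;
	case MODEL_MAP_NOT_SUPPORTED:
	default:
		return -EINVAL;	/* assumed error code for this sketch */
	}
}
```

Querying the type first lets the caller fail early instead of programming an address the peer cannot reach.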
Move the struct phys_vec definition from block/blk-mq-dma.c to include/linux/types.h to make it available for use across the kernel. The phys_vec structure represents a physical address range with a length, which is used by the new physical address-based DMA mapping API. This structure is already used by the block layer and will be needed by upcoming VFIO patches for dma-buf operations. Moving this definition to types.h provides a centralized location for this common data structure and eliminates code duplication across subsystems that need to work with physical address ranges.
Signed-off-by: Leon Romanovsky <[email protected]>
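For readers unfamiliar with the structure, a physical address range with a length can be modeled in userspace as below. The field names `paddr` and `len`, their widths, and the helper `model_phys_vec_total()` are assumptions for illustration; the commit message only states that the structure holds a physical address range with a length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* userspace stand-in for the kernel type */

/* Hypothetical model: one contiguous physical range. */
struct model_phys_vec {
	phys_addr_t paddr;	/* start of the range */
	uint32_t len;		/* length in bytes */
};

/* Sum the bytes covered by an array of ranges. */
static uint64_t model_phys_vec_total(const struct model_phys_vec *vec, size_t n)
{
	uint64_t total = 0;

	assert(vec != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += vec[i].len;
	return total;
}
```

An exporter could describe each BAR slice it shares as one such entry and hand the array to the mapping code.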
These helpers are useful for managing additional references taken on the device from other associated VFIO modules.
Original-patch-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Vivek Kasireddy <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Upstream branch: b19a97d
Make sure that all VFIO PCI devices have peer-to-peer capabilities enabled, so we would be able to export their MMIO memory through DMABUF.
Signed-off-by: Leon Romanovsky <[email protected]>
There is no need to share the main device pointer (struct vfio_device *) with all the feature functions as they only need the core device pointer. Therefore, extract the core device pointer once in the caller (vfio_pci_core_ioctl_feature) and share it instead.
Signed-off-by: Vivek Kasireddy <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Add support for exporting PCI device MMIO regions through dma-buf, enabling safe sharing of non-struct page memory with controlled lifetime management. This allows RDMA and other subsystems to import dma-buf FDs and build them into memory regions for PCI P2P operations.
The implementation provides a revocable attachment mechanism using dma-buf move operations. MMIO regions are normally pinned as BARs don't change physical addresses, but access is revoked when the VFIO device is closed or a PCI reset is issued. This ensures kernel self-defense against potentially hostile userspace.
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Vivek Kasireddy <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
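The revocation behavior described in the last commit can be illustrated with a minimal userspace state model. Everything here is hypothetical: `struct model_dmabuf`, `model_map()` and `model_revoke()` are invented for the sketch, and the `-ENODEV` return value is an assumption. The real implementation notifies importers through dma-buf move operations, which this model reduces to a single flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one exported MMIO slice. */
struct model_dmabuf {
	uint64_t bar_phys;	/* start of the exported BAR slice */
	uint64_t size;		/* size of the slice in bytes */
	bool revoked;		/* set when the device is closed or reset */
};

/* Importer side: mapping succeeds only while access is not revoked. */
static int model_map(const struct model_dmabuf *buf, uint64_t *addr)
{
	if (buf->revoked)
		return -ENODEV;	/* assumed error code for this sketch */
	*addr = buf->bar_phys;
	return 0;
}

/*
 * Exporter side: called on device close or PCI reset. Real code would
 * also make importers drop their existing mappings before continuing.
 */
static void model_revoke(struct model_dmabuf *buf)
{
	buf->revoked = true;
}
```

The point of the flag is that userspace holding the dma-buf FD cannot keep the hardware reachable after the exporter has decided access must stop.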
Pull request for series with
subject: vfio/pci: Allow MMIO regions to be exported through dma-buf
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=985118