84 commits
0cd109a
Add more cuda function to load
Feb 28, 2026
bbe25ab
Add _NBL_COMPILE_WITH_CUDA_ compile definition on CMakeLists.txt
Feb 28, 2026
d74349e
Move CCudaHandler constructor to cpp and query device info and attrib…
Feb 28, 2026
38ed6db
Add missing CFileView.h header in CCudaHandler.cpp
Feb 28, 2026
95338cd
Fix indentation of CCudaHandler.cpp
Feb 28, 2026
3e9dfd2
Add NBL_API2 to CCudaHandler
Feb 28, 2026
1ae7747
Fix fetching deviceUUID logic
Feb 28, 2026
a3150dc
Fix usage of CFileView
Feb 28, 2026
5018be7
Fix use after move of ptx cpuBuffer
Feb 28, 2026
5251b4d
Improve cpuBuffer initialization using params instead of aggregrate i…
Feb 28, 2026
d655b19
Fix indentation of CCudaHandler.cpp into tabs
Feb 28, 2026
454710b
Iterate m_availableDevices when creatingDevice
Feb 28, 2026
4645bc4
Implement context creation in CCUDADevice
Feb 28, 2026
3172ae7
Implement physical device getExternalMemoryProperties
Mar 12, 2026
f9b8b4f
Dedicated buffer and image
Mar 12, 2026
a2357e2
External Memory Feature flags should not be enum class
Mar 13, 2026
0d9c3d8
External Vulkan Buffer Creation
Mar 13, 2026
89f5ae5
Temporary enable compile with cuda flag
Mar 13, 2026
152830f
Update examples_tests submodule to vk_cuda interop demo branch
Mar 14, 2026
ea3b49b
External memory allocation
Mar 16, 2026
77b92ab
Fix indentation on CAssetConverter.cpp
Mar 16, 2026
68f740f
Update jitify submodule
Mar 16, 2026
1c93a91
External memory allocation cleanup
Mar 16, 2026
ae0e177
Implement proper CCUDADevice destructor.
Mar 16, 2026
c83942a
Implementation of Shared memory between vulkan and cuda
Mar 17, 2026
2e45702
Add NBL_API2 modifier to CCUDADevice
Mar 17, 2026
741252f
Implementation of Shared semaphore between Vulkan and CUDA
Mar 17, 2026
fe75ce0
Update to CUDA Toolkit version 13.0+
Mar 24, 2026
78fc0df
Fix external semaphore
Mar 24, 2026
5d19c5b
External image implementation
Mar 24, 2026
f23b30c
Remove unnecessary inline modifier
Mar 24, 2026
e50c85e
Remove unused code in CCUDADevice
Mar 25, 2026
a9c2d85
Fix importSemaphore for unix
Mar 25, 2026
e7ff325
Remove searching for old nvrtc version
Mar 25, 2026
c244b77
Fix filling dstQueueFamilyIndex
Mar 25, 2026
d24acf9
Update cuda toolkit requirement in cmake
Mar 25, 2026
ff82800
Improve external semaphore handle management
kevyuu Apr 15, 2026
7b48605
Improve win32HandleMetadata parameter so it is more readable
kevyuu Apr 15, 2026
24ba36e
Refactor CCUDASharedMemory to use ExternalHandleType
kevyuu Apr 16, 2026
5b4fc27
Refactor ExternalHandleType
kevyuu Apr 17, 2026
fb66f3a
Small fix to use CloseExternalHandle
kevyuu Apr 17, 2026
47ba7e4
Remove CCUDASharedMemory::exportAsImage
kevyuu Apr 22, 2026
d15d00c
Remove unused CCUDASharedMemory::exportAsBuffer
kevyuu Apr 22, 2026
ea36189
Refactor external memory allocation to store the external handle sepa…
kevyuu Apr 22, 2026
f04dcdb
Remove unused constructor parameter in CCUDASharedSemaphore
kevyuu Apr 22, 2026
cea9d9e
Implement CCUDAImportedMemory
kevyuu Apr 22, 2026
3ea3e9d
Rename CCUDASharedSemaphore into CCUDAImportedSemaphore
kevyuu Apr 22, 2026
130cd1e
Rename CCUDASharedMemory into CCUDAExportableMemory
kevyuu Apr 22, 2026
c624053
Remove unused member in CCUDAExportableMemory
kevyuu Apr 22, 2026
9127faa
Slight rename to CCUDADevice method
kevyuu Apr 22, 2026
059d1d5
Merge with master
kevyuu Apr 22, 2026
ff5a9cd
Merge branch 'master' into vk_cuda_interop
kevyuu Apr 22, 2026
2eb8fee
Add option for _NBL_COMPILE_WITH_CUDA_
kevyuu Apr 22, 2026
6605beb
Revert to correct state before merging with master
kevyuu Apr 23, 2026
af35f4f
Revert "Add option for _NBL_COMPILE_WITH_CUDA_"
kevyuu Apr 23, 2026
2479fb2
Slight fix
kevyuu Apr 23, 2026
f297cc2
Slight fix on linux handle
kevyuu Apr 23, 2026
3df125b
Fix typo
kevyuu Apr 23, 2026
2e2ca3f
Fix CCUDAImportedSemaphore constructor
kevyuu Apr 23, 2026
8c4c91e
Remove unused CCUDASharedSemaphore.cpp
kevyuu Apr 23, 2026
fcec268
Fix handle type for Linux
kevyuu Apr 23, 2026
ac18781
Add missing external handle type and make the constant consistent
kevyuu Apr 23, 2026
d73c851
Slight fix
kevyuu Apr 23, 2026
2c75ed8
Fix indentation and refactor to be more idiomatic
kevyuu Apr 23, 2026
3e905e9
Add some comment
kevyuu Apr 23, 2026
963a3d6
Fix typo
kevyuu Apr 23, 2026
0de37b0
Slight improvement
kevyuu Apr 23, 2026
d50d709
Remove unused variable
kevyuu Apr 23, 2026
763d173
Add include WIN32 include guard
kevyuu Apr 23, 2026
d71e52d
Remove unused class
kevyuu Apr 23, 2026
cfad816
Refactor CCUDADevice api to be more consistent with vulkan device api
kevyuu Apr 23, 2026
b22168e
Refactor constructor parameter naming
kevyuu Apr 23, 2026
5bd64ae
Idiomatic way to create core::smart_refctd_ptr
kevyuu Apr 23, 2026
bd0f8a2
Fix destruction and remove unnecessary SCUDACleaner
kevyuu Apr 23, 2026
6d47b90
CCUDAHandler construction more idiomatic
kevyuu Apr 24, 2026
0257d9a
Refactor magic number
kevyuu Apr 24, 2026
0999994
Remove releasing allocationHandle in destructor, since we already cal…
kevyuu Apr 24, 2026
6f4b889
Input validation and error logging
kevyuu Apr 24, 2026
129ceac
Revert 6605bebf changes in tgmath impl.hlsl
kevyuu Apr 30, 2026
2c08464
Fix indentation in IDeviceMemoryAllocator.h
kevyuu Apr 30, 2026
e993757
Turn off NBL_COMPILE_WITH_CUDA by default
kevyuu Apr 30, 2026
dcf0552
Move CCUDAHandler constructor from protected to public
kevyuu Apr 30, 2026
f6bf989
Fix crash due to dangling win32metadata
kevyuu Apr 30, 2026
0d237c0
Implement vk flag for HOST_NUMA and HOST_NUMA_CURRENT
kevyuu May 4, 2026
2 changes: 1 addition & 1 deletion 3rdparty/jitify
Submodule jitify updated 5 files
+10 −5 Makefile
+137 −65 jitify.hpp
+72 −0 jitify_test.cu
+586 −0 nvrtc_cli.cpp
+58 −0 nvrtc_cli_test.sh
6 changes: 3 additions & 3 deletions CMakeLists.txt
@@ -74,10 +74,10 @@ option(NBL_COMPILE_WITH_CUDA "Compile with CUDA interop?" OFF)

 if(NBL_COMPILE_WITH_CUDA)
 	find_package(CUDAToolkit REQUIRED)
-	if(${CUDAToolkit_VERSION} VERSION_GREATER "9.0")
-		message(STATUS "CUDA version 9.0+ found!")
+	if(${CUDAToolkit_VERSION} VERSION_GREATER_EQUAL "13.0")
+		message(STATUS "CUDA version ${CUDAToolkit_VERSION} found!")
 	else()
-		message(FATAL_ERROR "CUDA version 9.0+ needed for C++14 support!")
+		message(FATAL_ERROR "CUDA version 13.0+ needed for C++14 support!")
 	endif()
 endif()

2 changes: 1 addition & 1 deletion examples_tests
Submodule examples_tests updated 161 files
2 changes: 2 additions & 0 deletions include/nbl/asset/IBuffer.h
@@ -42,6 +42,8 @@ class IBuffer : public IDescriptor, public core::IBuffer
//! synthetic Nabla inventions
// whether `IGPUCommandBuffer::updateBuffer` can be used on this buffer
EUF_INLINE_UPDATE_VIA_CMDBUF = 0x80000000u,

EUF_SYNTHETIC_FLAGS_MASK = EUF_INLINE_UPDATE_VIA_CMDBUF | 0 /* fill out as needed if any more synthetic flags are added */
};

//!
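The new `EUF_SYNTHETIC_FLAGS_MASK` gives the Nabla-only usage bits a single name. One plausible use, sketched here on the assumption that synthetic bits must never reach the native API, is masking them out before a buffer is created. The enum below is a stand-in that borrows only the two synthetic names from the hunk; the value of `EUF_STORAGE_BUFFER_BIT` and the helper `toNativeUsage` are hypothetical and not part of this PR.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for IBuffer::E_USAGE_FLAGS. Only the two synthetic names come from the diff.
enum E_USAGE_FLAGS : std::uint32_t
{
    EUF_STORAGE_BUFFER_BIT = 0x00000020u,        // illustrative native bit (assumed value)
    EUF_INLINE_UPDATE_VIA_CMDBUF = 0x80000000u,  // synthetic, as in the diff
    EUF_SYNTHETIC_FLAGS_MASK = EUF_INLINE_UPDATE_VIA_CMDBUF | 0
};

// Hypothetical helper: drop every synthetic bit, keep everything else.
constexpr std::uint32_t toNativeUsage(std::uint32_t usage)
{
    return usage & ~static_cast<std::uint32_t>(EUF_SYNTHETIC_FLAGS_MASK);
}

// The synthetic bit is removed and the native bit survives.
static_assert(toNativeUsage(EUF_STORAGE_BUFFER_BIT | EUF_INLINE_UPDATE_VIA_CMDBUF) == 0x20u,
              "synthetic bits should be stripped");
```

Keeping the mask next to the flag definitions means a newly added synthetic flag only has to be OR-ed into one place.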
155 changes: 39 additions & 116 deletions include/nbl/video/CCUDADevice.h
@@ -6,6 +6,9 @@


#include "nbl/video/IPhysicalDevice.h"
#include "nbl/video/CCUDAExportableMemory.h"
#include "nbl/video/CCUDAImportedMemory.h"
#include "nbl/video/CCUDAImportedSemaphore.h"


#ifdef _NBL_COMPILE_WITH_CUDA_
@@ -24,9 +27,17 @@ namespace nbl::video
{
class CCUDAHandler;

class CCUDADevice : public core::IReferenceCounted
class NBL_API2 CCUDADevice : public core::IReferenceCounted
{
public:
public:
#ifdef _WIN32
static constexpr IDeviceMemoryAllocation::E_EXTERNAL_HANDLE_TYPE EXTERNAL_MEMORY_HANDLE_TYPE = IDeviceMemoryAllocation::EHT_OPAQUE_WIN32;
static constexpr CUmemAllocationHandleType ALLOCATION_HANDLE_TYPE = CU_MEM_HANDLE_TYPE_WIN32;
#else
static constexpr IDeviceMemoryAllocation::E_EXTERNAL_HANDLE_TYPE EXTERNAL_MEMORY_HANDLE_TYPE = IDeviceMemoryAllocation::EHT_OPAQUE_FD;
static constexpr CUmemAllocationHandleType ALLOCATION_HANDLE_TYPE = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;
#endif

enum E_VIRTUAL_ARCHITECTURE
{
EVA_30,
@@ -63,132 +74,44 @@ class CCUDADevice : public core::IReferenceCounted
};
inline E_VIRTUAL_ARCHITECTURE getVirtualArchitecture() {return m_virtualArchitecture;}

CCUDADevice(core::smart_refctd_ptr<CVulkanConnection>&& vulkanConnection, IPhysicalDevice* const vulkanDevice, const E_VIRTUAL_ARCHITECTURE virtualArchitecture, CUdevice device, core::smart_refctd_ptr<CCUDAHandler>&& handler);

~CCUDADevice();

inline core::SRange<const char* const> geDefaultCompileOptions() const
{
return {m_defaultCompileOptions.data(),m_defaultCompileOptions.data()+m_defaultCompileOptions.size()};
}

// TODO/REDO Vulkan: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXTRES__INTEROP.html
// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#vulkan-interoperability
// Watch out, use Driver API (`cu` functions) NOT the Runtime API (`cuda` functions)
// Also maybe separate this out into its own `CCUDA` class instead of nesting it here?
#if 0
template<typename ObjType>
struct GraphicsAPIObjLink
{
GraphicsAPIObjLink() : obj(nullptr), cudaHandle(nullptr), acquired(false)
{
asImage = {nullptr};
}
GraphicsAPIObjLink(core::smart_refctd_ptr<ObjType>&& _obj) : GraphicsAPIObjLink()
{
obj = std::move(_obj);
}
GraphicsAPIObjLink(GraphicsAPIObjLink&& other) : GraphicsAPIObjLink()
{
operator=(std::move(other));
}

GraphicsAPIObjLink(const GraphicsAPIObjLink& other) = delete;
GraphicsAPIObjLink& operator=(const GraphicsAPIObjLink& other) = delete;
GraphicsAPIObjLink& operator=(GraphicsAPIObjLink&& other)
{
std::swap(obj,other.obj);
std::swap(cudaHandle,other.cudaHandle);
std::swap(acquired,other.acquired);
std::swap(asImage,other.asImage);
return *this;
}

~GraphicsAPIObjLink()
{
assert(!acquired); // you've fucked up, there's no way for us to fix it, you need to release the objects on a proper stream
if (obj)
CCUDAHandler::cuda.pcuGraphicsUnregisterResource(cudaHandle);
}

//
auto* getObject() const {return obj.get();}

private:
core::smart_refctd_ptr<ObjType> obj;
CUgraphicsResource cudaHandle;
bool acquired;

friend class CCUDAHandler;
public:
union
{
struct
{
CUdeviceptr pointer;
} asBuffer;
struct
{
CUmipmappedArray mipmappedArray;
CUarray array;
} asImage;
};
};
CUdevice getInternalObject() const { return m_handle; }

//
static CUresult registerBuffer(GraphicsAPIObjLink<video::IGPUBuffer>* link, uint32_t flags = CU_GRAPHICS_REGISTER_FLAGS_NONE);
static CUresult registerImage(GraphicsAPIObjLink<video::IGPUImage>* link, uint32_t flags = CU_GRAPHICS_REGISTER_FLAGS_NONE);

const CCUDAHandler* getHandler() const { return m_handler.get(); }

template<typename ObjType>
static CUresult acquireResourcesFromGraphics(void* tmpStorage, GraphicsAPIObjLink<ObjType>* linksBegin, GraphicsAPIObjLink<ObjType>* linksEnd, CUstream stream)
{
auto count = std::distance(linksBegin,linksEnd);

auto resources = reinterpret_cast<CUgraphicsResource*>(tmpStorage);
auto rit = resources;
for (auto iit=linksBegin; iit!=linksEnd; iit++,rit++)
{
if (iit->acquired)
return CUDA_ERROR_UNKNOWN;
*rit = iit->cudaHandle;
}

auto retval = cuda.pcuGraphicsMapResources(count,resources,stream);
for (auto iit=linksBegin; iit!=linksEnd; iit++)
iit->acquired = true;
return retval;
}
template<typename ObjType>
static CUresult releaseResourcesToGraphics(void* tmpStorage, GraphicsAPIObjLink<ObjType>* linksBegin, GraphicsAPIObjLink<ObjType>* linksEnd, CUstream stream)
{
auto count = std::distance(linksBegin,linksEnd);

auto resources = reinterpret_cast<CUgraphicsResource*>(tmpStorage);
auto rit = resources;
for (auto iit=linksBegin; iit!=linksEnd; iit++,rit++)
{
if (!iit->acquired)
return CUDA_ERROR_UNKNOWN;
*rit = iit->cudaHandle;
}

auto retval = cuda.pcuGraphicsUnmapResources(count,resources,stream);
for (auto iit=linksBegin; iit!=linksEnd; iit++)
iit->acquired = false;
return retval;
}
bool isMatchingDevice(const IPhysicalDevice* device) { return device && !memcmp(device->getProperties().deviceUUID, m_physicalDevice->getProperties().deviceUUID, 16); }

static CUresult acquireAndGetPointers(GraphicsAPIObjLink<video::IGPUBuffer>* linksBegin, GraphicsAPIObjLink<video::IGPUBuffer>* linksEnd, CUstream stream, size_t* outbufferSizes = nullptr);
static CUresult acquireAndGetMipmappedArray(GraphicsAPIObjLink<video::IGPUImage>* linksBegin, GraphicsAPIObjLink<video::IGPUImage>* linksEnd, CUstream stream);
static CUresult acquireAndGetArray(GraphicsAPIObjLink<video::IGPUImage>* linksBegin, GraphicsAPIObjLink<video::IGPUImage>* linksEnd, uint32_t* arrayIndices, uint32_t* mipLevels, CUstream stream);
#endif
size_t roundToGranularity(CUmemLocationType location, size_t size) const;

core::smart_refctd_ptr<CCUDAExportableMemory> createExportableMemory(CCUDAExportableMemory::SCreationParams&& inParams);

core::smart_refctd_ptr<CCUDAImportedMemory> importExternalMemory(core::smart_refctd_ptr<IDeviceMemoryAllocation>&& mem);

protected:
friend class CCUDAHandler;
CCUDADevice(core::smart_refctd_ptr<CVulkanConnection>&& _vulkanConnection, IPhysicalDevice* const _vulkanDevice, const E_VIRTUAL_ARCHITECTURE _virtualArchitecture);
~CCUDADevice() = default;

core::smart_refctd_ptr<CCUDAImportedSemaphore> importExternalSemaphore(core::smart_refctd_ptr<ISemaphore>&& sem);

private:
CUresult reserveAddressAndMapMemory(CUdeviceptr* outPtr, size_t size, size_t alignment, CUmemLocationType location, CUmemGenericAllocationHandle memory) const;

static constexpr auto CudaMemoryLocationCount = 5;

const system::logger_opt_ptr m_logger;
std::vector<const char*> m_defaultCompileOptions;
core::smart_refctd_ptr<CVulkanConnection> m_vulkanConnection;
IPhysicalDevice* const m_vulkanDevice;
IPhysicalDevice* const m_physicalDevice;
E_VIRTUAL_ARCHITECTURE m_virtualArchitecture;

core::smart_refctd_ptr<CCUDAHandler> m_handler;
CUdevice m_handle;
CUcontext m_context;
std::array<size_t, CudaMemoryLocationCount> m_allocationGranularity;
};

}
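The header declares `roundToGranularity` and `isMatchingDevice` but this hunk does not show the body of the former. The sketch below illustrates both ideas with plain standard C++ and no CUDA or Vulkan headers. It assumes that rounding means "round up to the next multiple of the cached granularity" and that the granularity is non-zero. The free functions `roundUpToGranularity` and `uuidsMatch` are hypothetical names, not the PR's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for CCUDADevice::roundToGranularity.
// `granularity` plays the role of one entry of m_allocationGranularity
// and is assumed to be greater than zero.
constexpr std::size_t roundUpToGranularity(std::size_t size, std::size_t granularity)
{
    return ((size + granularity - 1) / granularity) * granularity;
}

// Device UUIDs are 16 bytes (VK_UUID_SIZE in Vulkan).
constexpr std::size_t UUID_SIZE = 16;

// Same comparison as isMatchingDevice in the diff: a byte-wise compare of two UUIDs.
inline bool uuidsMatch(const std::uint8_t (&a)[UUID_SIZE], const std::uint8_t (&b)[UUID_SIZE])
{
    return std::memcmp(a, b, UUID_SIZE) == 0;
}

// A 1-byte request still occupies one full granule.
static_assert(roundUpToGranularity(1, 4096) == 4096, "rounds up to one granule");
// An exact multiple is left unchanged.
static_assert(roundUpToGranularity(8192, 4096) == 8192, "multiples are unchanged");
```

Naming the UUID length avoids the literal `16` that `isMatchingDevice` currently passes to `memcmp`.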
65 changes: 65 additions & 0 deletions include/nbl/video/CCUDAExportableMemory.h
@@ -0,0 +1,65 @@
// Copyright (C) 2018-2020 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h
#ifndef _NBL_VIDEO_C_CUDA_EXPORTABLE_MEMORY_H_
#define _NBL_VIDEO_C_CUDA_EXPORTABLE_MEMORY_H_


#ifdef _NBL_COMPILE_WITH_CUDA_

#include "cuda.h"
#include "nvrtc.h"
#if CUDA_VERSION < 9000
#error "Need CUDA 9.0 SDK or higher."
#endif

// useful includes in the future
//#include "cudaEGL.h"
//#include "cudaVDPAU.h"

namespace nbl::video
{

class CCUDADevice;

class NBL_API2 CCUDAExportableMemory : public core::IReferenceCounted
{
public:

struct SCreationParams
{
size_t size;
uint32_t alignment;
CUmemLocationType location;
};

struct SCachedCreationParams : SCreationParams
{
size_t granularSize;
CUdeviceptr ptr;
external_handle_t externalHandle;
};

CCUDAExportableMemory(core::smart_refctd_ptr<CCUDADevice> device, SCachedCreationParams&& params)
: m_device(std::move(device))
, m_params(std::move(params))
{}
~CCUDAExportableMemory() override;

CUdeviceptr getDeviceptr() const { return m_params.ptr; }

const SCreationParams& getCreationParams() const { return m_params; }

core::smart_refctd_ptr<IDeviceMemoryAllocation> exportAsMemory(ILogicalDevice* device, IDeviceMemoryBacked* dedication = nullptr) const;

private:

core::smart_refctd_ptr<CCUDADevice> m_device;
SCachedCreationParams m_params;
};

}

#endif // _NBL_COMPILE_WITH_CUDA_

#endif
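`CCUDAExportableMemory` stores a `SCachedCreationParams` but exposes only the base `SCreationParams` through `getCreationParams()`. The sketch below shows that request-versus-cached split with stand-in types and no CUDA headers. `ExportableMemorySketch`, `makeCachedParams`, and the integer types used for `location` and `ptr` are hypothetical. How the real `createExportableMemory` fills the cached fields is not visible in this diff, so the rounding here is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// What the caller asks for (mirrors SCreationParams; `location` is an int stand-in).
struct SCreationParams
{
    std::size_t size;
    std::uint32_t alignment;
    int location;
};

// What the object remembers after allocation (mirrors SCachedCreationParams).
struct SCachedCreationParams : SCreationParams
{
    std::size_t granularSize; // size rounded up to the allocation granularity
    std::uint64_t ptr;        // stand-in for CUdeviceptr
};

// Hypothetical helper: copy the request and fill in the derived fields.
// `granularity` is assumed to be greater than zero.
inline SCachedCreationParams makeCachedParams(const SCreationParams& request, std::size_t granularity, std::uint64_t ptr)
{
    SCachedCreationParams cached{};
    static_cast<SCreationParams&>(cached) = request;
    cached.granularSize = ((request.size + granularity - 1) / granularity) * granularity;
    cached.ptr = ptr;
    return cached;
}

class ExportableMemorySketch
{
public:
    explicit ExportableMemorySketch(SCachedCreationParams&& params) : m_params(std::move(params)) {}

    // Callers see only the original request, not the cached bookkeeping.
    const SCreationParams& getCreationParams() const { return m_params; }
    std::uint64_t getDeviceptr() const { return m_params.ptr; }
    std::size_t getGranularSize() const { return m_params.granularSize; }

private:
    SCachedCreationParams m_params;
};
```

Returning a reference to the base struct keeps the public view equal to the request while the derived fields stay internal to the object.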