[ludus-renderer]: add Vulkan mesh-shader backend (LudusTimestampedContext) by wlewNV · Pull Request #135 · NVIDIA/flashdreams

wlewNV · 2026-05-21T04:36:56Z

(left: cuda, middle: vk, right: diff)

Summary

Adds LudusTimestampedContext — a Vulkan VK_EXT_mesh_shader rendering backend that mirrors the LudusCudaTimestampedContext API via CUDA–Vulkan external-memory interop (one vkCmdDrawMeshTasksEXT submission per primitive family). CUDA stays the default; the Vulkan path is opt-in and also selectable in interactive-drive via --ludus-backend vulkan.

What's in it

C++ Vulkan backend (vkutil, ludus_timestamped_vk): external-memory SSBOs imported into CUDA, MSAA, nvjpeg encode.
GL_EXT_mesh_shader shaders for the polyline/polygon/obstacle families; SPIR-V embedded in shaders_spv.h (rebuild with shaders/compile.sh).
Python LudusTimestampedContext (lazy import; ImportError only on construction).
interactive-drive: --ludus-backend {cuda,vulkan}.
examples/compare_vulkan_vs_cuda.py (CUDA-vs-Vulkan parity) + tests/test_vulkan_backend.py.

Testing

# Parity vs CUDA (writes side-by-side + pixel diff to ./_vk_compare/):
cd integrations/omnidreams/ludus-renderer
uv run python examples/compare_vulkan_vs_cuda.py --frame 12   # auto-discovers cached USDZ
uv run python examples/compare_vulkan_vs_cuda.py --synthetic  # no scene data needed
uv run pytest tests/test_vulkan_backend.py
# interactive-drive on Vulkan (log shows: [rasterizer] ludus_backend=vulkan):
cd integrations/omnidreams
uv run python -m omnidreams.interactive_drive --backend raster --ludus-backend vulkan
uv run python -m omnidreams.interactive_drive --backend raster --ludus-backend vulkan --stream-mjpeg 8080

Caveats

CUDA→Vulkan interop uses opaque-FD external memory, so the Vulkan path is Linux-only. The CUDA backend is unchanged and cross-platform.

copy-pr-bot · 2026-05-21T04:36:59Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-06-05T01:55:53Z

Greptile Summary

This PR introduces LudusTimestampedContext, a Vulkan VK_EXT_mesh_shader rendering backend that mirrors the LudusCudaTimestampedContext API using CUDA–Vulkan external-memory interop. CUDA stays the default; the Vulkan path is opt-in and selectable via --ludus-backend vulkan in interactive-drive.

Adds a complete C++ Vulkan backend (vkutil, ludus_timestamped_vk) with MSAA support, CUDA–Vulkan SSBO interop via opaque-FD external memory, and nvjpeg encode.
Introduces GL_EXT_mesh_shader GLSL shaders for polyline/polygon/obstacle families (SPIR-V embedded in shaders_spv.h), a Python LudusTimestampedContext with lazy JIT compilation, and a compare_vulkan_vs_cuda.py parity example with a smoke test suite.

Confidence Score: 5/5

Safe to merge. The rendering path is correct, all previously raised correctness issues have been resolved, and the remaining findings are non-blocking quality notes.

All critical correctness fixes from the prior review round are present: the HostToDevice tombstone copy with synchronization, the render-pass/pipeline rebuild on MSAA change, the descriptor-pool reset moved before vkCmdBeginRenderPass, and the multi-scene SSBO cursor tracking. The remaining findings are validation-layer noise and JIT-loader robustness issues that do not affect rendered output or data integrity.

vkutil.cpp (transitionImageLayout wrong source masks on the readback return transition) and _plugin_vk.py (TORCH_CUDA_ARCH_LIST side effect and vulkaninfo availability heuristic).

Important Files Changed

Filename	Overview
integrations/omnidreams/ludus-renderer/ludus_renderer/_cpp/render/ludus_timestamped_vk.cpp	Core Vulkan renderer: render pass/pipeline creation, descriptor set management, CUDA→Vulkan SSBO sync (host-roundtrip + queue family barriers), and batch render dispatch. Previously flagged issues are all addressed. Minor: unchecked vkAllocateDescriptorSets return after pool reset.
integrations/omnidreams/ludus-renderer/ludus_renderer/_cpp/common/vkutil.cpp	Vulkan context/buffer/image helpers with CUDA external-memory interop via opaque-FD. transitionImageLayout has hardcoded src access masks that are incorrect for the TRANSFER_SRC→GENERAL return transition used in every frame readback, causing validation layer noise and unnecessary stalls.
integrations/omnidreams/ludus-renderer/ludus_renderer/_ops/context_vk.py	Python LudusTimestampedContext: scene/pool packing, SSBO cursor tracking, query assembly, and render dispatch. Global cursor state correctly mirrors C++ *Used counters. Struct layout verified against C++ types.
integrations/omnidreams/ludus-renderer/ludus_renderer/_ops/_plugin_vk.py	JIT compilation loader for the Vulkan extension. Two issues: TORCH_CUDA_ARCH_LIST='' is set process-globally before compilation, and vulkaninfo PATH presence is used as a proxy for libvulkan.so availability.
integrations/omnidreams/ludus-renderer/ludus_renderer/_cpp/bindings/torch_rasterize_vk.cpp	PyTorch pybind11 bindings wrapping the Vulkan state. FLU→RDF basis conversion, MSAA sample change triggers width/height reset for framebuffer rebuild, and color palette int32→float32 conversion are all correctly implemented.
integrations/omnidreams/ludus-renderer/tests/test_vulkan_backend.py	Smoke tests for Vulkan backend: init, upload, single-frame render, and a CUDA/Vulkan lit-pixel parity check (within 25%). Tests gracefully skip on missing Vulkan ICD or mesh-shader support.
integrations/omnidreams/omnidreams/interactive_drive/rasterizer.py	Adds backend selection: instantiates LudusTimestampedContext or LudusCudaTimestampedContext based on raster.ludus_backend. Change is minimal and correct.

Sequence Diagram

sequenceDiagram
    participant Py as Python (LudusTimestampedContext)
    participant Cpp as C++ Binding (torch_rasterize_vk)
    participant VkCpp as ludus_timestamped_vk
    participant CUDA as CUDA Runtime
    participant Vk as Vulkan GPU

    Py->>Cpp: upload_cameras(intrinsics_cuda_tensor)
    Cpp->>VkCpp: ludusUploadCamerasVk()
    VkCpp->>CUDA: cudaMemcpyAsync(DeviceToDevice, cameraIntrinsicsBuffer.cuDevPtr)

    Py->>Cpp: upload_scene(packed_tensors)
    Cpp->>VkCpp: ludusUploadSceneVk()
    VkCpp->>VkCpp: ensureBuffers() / resizeExternalBuffer()
    VkCpp->>CUDA: cudaMemcpyAsync(timestamps, int32, vertices DeviceToDevice)
    VkCpp->>VkCpp: updateDescriptorSet()

    Py->>Cpp: render_batch(queries, poses, resolution)
    Cpp->>VkCpp: ludusRenderBatchVk()
    VkCpp->>CUDA: cudaMemcpyAsync(queryBuffer, cameraPoseBuffer)
    VkCpp->>CUDA: cudaStreamSynchronize()

    alt kHostRoundtrip (LUDUS_VK_DIRECT_IMPORT not set)
        VkCpp->>CUDA: cudaMemcpy DeviceToHost (all SSBOs)
        VkCpp->>Vk: vkCmdUpdateBuffer (re-injects via Vulkan transfer path)
    else Direct import
        VkCpp->>Vk: VkBufferMemoryBarrier (VK_QUEUE_FAMILY_EXTERNAL to graphics)
    end

    VkCpp->>Vk: vkCmdBeginRenderPass
    VkCpp->>Vk: vkCmdDrawMeshTasksEXT (polyline / polygon / obstacle)
    VkCpp->>Vk: vkCmdEndRenderPass then vkQueueSubmit then vkWaitForFences

    Cpp->>VkCpp: ludusCopyBatchResultsVk()
    VkCpp->>Vk: vkCmdCopyImageToBuffer (colorImage to readbackBuffer)
    VkCpp->>CUDA: cudaMemcpyAsync(output, readbackBuffer.cuDevPtr, DeviceToDevice)
    Cpp-->>Py: NHWC uint8 tensor

_{Reviews (8): Last reviewed commit: "Track true per-pool varray max in Vulkan..." | Re-trigger Greptile}

…dContext) Adds a parallel rendering backend that mirrors the public API of the existing CUDA software rasterizer. The new path uses VK_EXT_mesh_shader with CUDA-Vulkan external-memory interop so render uploads stay on the GPU, and is selected at construction via LudusTimestampedContext while LudusCudaTimestampedContext remains the default everywhere. New: Vulkan context (vkutil), pipelines for polyline/polygon/obstacle mesh+task+fragment shaders, NV->EXT GLSL converter and SPIR-V embed scripts, JIT plugin, Python context wrapper, and a CUDA-vs-Vulkan example/parity test. Multi-pool task shaders use a force_zero_tasks flag to keep over-dispatched workgroups' SSBO reads in-bounds, which is what prevents the giant cross-pool garbage triangles seen with the naive early-EmitMeshTasksEXT(0) pattern.

wlewNV · 2026-06-09T02:15:54Z

Marking as ready for review to get fresh comments from greptile.

Review fixes (greptile): - remove_scene: write the tombstone with cudaMemcpyHostToDevice + stream sync instead of cudaMemcpyDeviceToDevice on a host stack pointer (P0) - MSAA: rebuild the render pass and pipelines when the sample count changes so the framebuffer attachment counts match (P1) - gate the device-info log behind VK_DBG; re-read LUDUS_VK_DIRECT_IMPORT on every render instead of caching it in a static bool (P2) Cleanup / redundancy: - replace the loguru dependency with a small stdlib-logging shim (_logging.py) - remove dead state (unused VkContext / LudusTimestampedVkState fields) and the inert VkCudaSync semaphore subsystem - share VK_CHECK / VK_DBG via vkutil.h instead of duplicating them per file - delete the orphaned ts_common.glsl and the one-time nv_to_ext.py migration; the committed shaders are maintained directly as GL_EXT_mesh_shader - add render_to_staging device/contiguity checks and a removeScene device guard - vectorize render_batch query packing to avoid per-query host/device syncs Examples / packaging: - compare_vulkan_vs_cuda.py: render clipgt USDZ scenes (extract clipgt/*.parquet) with scene-cache auto-discovery and an available-camera fallback - stop tracking/shipping the intermediate .spv (regenerated by compile.sh and embedded in shaders_spv.h); ignore _vk_compare/ render output The CUDA backend is unchanged and remains the default.

upload_scene packed every scene with buffer-base offsets of 0, so the task shaders' scene.<buf>_offset + pool.<buf>_offset indexing made scene 2+ read scene 0's pools / timestamps / vertices. Carry persistent global cursors that mirror the C++ *Used counters: seed each scene descriptor's base offsets from them, advance them per upload_scene, and reset them in clear_scenes. Single- scene rendering is unchanged (all bases are 0 for the first scene). Also tighten the verbose comments added in the previous commit.

wlewNV · 2026-06-09T04:28:27Z

/ok to test 1681b99

…135) greptile re-review (3/5) flagged two issues in ludus_timestamped_vk.cpp: - updateDescriptorSet (which calls vkResetDescriptorPool + reallocates the set) was invoked after vkCmdBeginRenderPass, which the Vulkan spec forbids. Move it before the render pass begins. - The kHostRoundtrip path copied every CUDA-imported SSBO host->device on every render, even buffers unchanged since upload_scene. Add a sceneBuffersDirty flag set by the control-plane ops (upload scene/cameras/palette, remove, clear) so the scene/camera/palette buffers are re-pushed only when they actually change; the per-query buffers (query/cameraPose) still roundtrip every frame.

wlewNV · 2026-06-09T15:49:27Z

/ok to test ca140e6

…port - Add --ludus-backend {cuda,vulkan} (default cuda), threaded through RasterConfig into the interactive-drive rasterizer so both the raster and world-model backends can render the HD map via LudusTimestampedContext. - Fix createExternalImage: a Vulkan image with arrayLayers > 1 is layered, so the CUDA import must set CUDA_ARRAY3D_LAYERED; without it multi-frame/batched rendering failed with cuExternalMemoryGetMappedMipmappedArray INVALID_VALUE.

wlewNV · 2026-06-09T18:32:35Z

/ok to test 65e6bb1

gtong-nv · 2026-06-09T20:22:52Z

for the first run in a new setup, I got

  File "/home/horde/github/flashdreams/integrations/omnidreams/ludus-renderer/ludus_renderer/_ops/_plugin_vk.py", line 98, in _get_vk_plugin
    raise RuntimeError(f"Vulkan backend unavailable: {msg}")
RuntimeError: Vulkan backend unavailable: Vulkan headers not found. Install libvulkan-dev (Debian/Ubuntu) or set VULKAN_SDK to the Vulkan SDK root.

solved by installing libvulkan-dev. But other than that, it works fine in a machine with graphics capability

The `ruff format` pre-commit hook (CI "Run linter checks") reflowed the --ludus-backend help string and the ctx_cls ternary. Pure formatting; no behavior change.

wlewNV · 2026-06-09T20:33:39Z

/ok to test 2b63821

greptile re-review (3/5) flagged a hardcoded MAX_VARRAYS_PER_POOL=1000 that drove both the mesh-task dispatch count and the u_max_varrays_per_pool push constant, so any polyline/polygon pool with >1000 varrays at a timestamp had its tail silently dropped (task shader never invoked for those workgroup IDs). - context_vk.py: compute max_varrays_per_ts_{polyline,polygon} from the timestamped prefix sums (mirrors the CUDA backend) and pass them to upload. - thread the two values through the binding into ludusUploadSceneVk; track the running max in LudusTimestampedVkState (reset in clear_scenes) like the other per-scene maxima. - drive the dispatch stride and push constant from the tracked per-family max instead of the constant 1000. Also gate the "[Vulkan] Context ready" line behind VK_DBG so context creation is silent in production (the device/API line was already gated). Verified: a single polyline pool with 1200 varrays now renders at CUDA<->Vulkan parity (vk/cuda lit-pixel ratio 0.998); dot/polyline scenes unchanged.

wlewNV · 2026-06-09T21:17:18Z

/ok to test 9b81d3f

dgkoch · 2026-06-15T20:29:38Z

+
+    VkPhysicalDeviceFeatures2 features2 = {VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2};
+    features2.features.multiDrawIndirect = VK_TRUE;
+    features2.features.fillModeNonSolid = VK_TRUE;


Renderer SPIR-V declares OpCapability Int64, but this device-creation chain never enables VkPhysicalDeviceFeatures::shaderInt64. The validation layer flags it as VUID-VkShaderModuleCreateInfo-pCode-08740 on every shader module create with an Int64 op:

▎ vkCreateShaderModule(): SPIR-V Capability Int64 was declared, but one of the following requirements is required (VkPhysicalDeviceFeatures::shaderInt64).

Suggested fix — one line:

Suggested change

features2.features.fillModeNonSolid = VK_TRUE;

features2.features.fillModeNonSolid = VK_TRUE;

features2.features.shaderInt64 = VK_TRUE;

Probably also worth querying vkGetPhysicalDeviceFeatures2 first and bailing out (or skipping the Int64 shader paths) if the physical device does not report shaderInt64 support, instead of blindly requesting it.

greptile-apps Bot reviewed Jun 5, 2026

View reviewed changes

wlewNV added 2 commits June 8, 2026 17:04

Apply Mukund's suggestions

cea0dd0

wlewNV force-pushed the dev/wlew/ludus_vk branch from 67d3b69 to cea0dd0 Compare June 9, 2026 00:06

wlewNV marked this pull request as ready for review June 9, 2026 02:15

greptile-apps Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread integrations/omnidreams/ludus-renderer/ludus_renderer/_ops/context_vk.py

Comment thread integrations/omnidreams/ludus-renderer/ludus_renderer/_cpp/render/ludus_timestamped_vk.cpp

wlewNV added 2 commits June 8, 2026 19:41

greptile-apps Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread integrations/omnidreams/ludus-renderer/ludus_renderer/_cpp/render/ludus_timestamped_vk.cpp

wlewNV changed the title ~~feat(ludus-renderer): add Vulkan mesh-shader backend (LudusTimestampedContext)~~ [ludus-renderer]: add Vulkan mesh-shader backend (LudusTimestampedContext) Jun 9, 2026

Apply ruff format to interactive-drive backend selection (PR #135 CI)

2b63821

The `ruff format` pre-commit hook (CI "Run linter checks") reflowed the --ludus-backend help string and the ctx_cls ternary. Pure formatting; no behavior change.

greptile-apps Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread integrations/omnidreams/ludus-renderer/ludus_renderer/_cpp/render/ludus_timestamped_vk.cpp

dgkoch reviewed Jun 15, 2026

View reviewed changes

	features2.features.fillModeNonSolid = VK_TRUE;
	features2.features.fillModeNonSolid = VK_TRUE;
	features2.features.shaderInt64 = VK_TRUE;

Conversation

wlewNV commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's in it

Testing

Caveats

Uh oh!

copy-pr-bot Bot commented May 21, 2026

Uh oh!

greptile-apps Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wlewNV commented Jun 9, 2026

Uh oh!

Uh oh!

Uh oh!

wlewNV commented Jun 9, 2026

Uh oh!

Uh oh!

wlewNV commented Jun 9, 2026

Uh oh!

wlewNV commented Jun 9, 2026

Uh oh!

gtong-nv commented Jun 9, 2026

Uh oh!

wlewNV commented Jun 9, 2026

Uh oh!

Uh oh!

wlewNV commented Jun 9, 2026

Uh oh!

dgkoch Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

wlewNV commented May 21, 2026 •

edited

Loading

greptile-apps Bot commented Jun 5, 2026 •

edited

Loading