Add AMD GPU (ROCm/HIP) support to the Caspar backend by jeffdaily · Pull Request #465 · symforce-org/symforce

jeffdaily · 2026-06-16T18:19:19Z

This PR adds AMD GPU support to Caspar so its generated kernels and runtime build and run on AMD GPUs through ROCm/HIP, while keeping the default NVIDIA CUDA build unchanged. It is enabled with compile_caspar_library(..., use_hip=True, hip_arch=...) (or -DUSE_HIP=ON in the generated build); when off, the build is exactly as before.

The CUDA spellings Caspar emits and uses -- cudaMalloc, __syncthreads, cooperative-group reductions, CUB primitives, and the runtime API -- are mapped to their HIP equivalents through a small compatibility header (source/runtime/cuda_to_hip.h). On an NVIDIA build the header is a transparent passthrough; on a ROCm build it aliases the cuda* symbols to hip* and supplies device-side fallbacks where HIP lacks a cooperative-groups primitive (cg::reduce, cg::labeled_partition). Because the mapping lives in one header and the codegen templates emit the include for it, the symbolic kernel definitions are unchanged.
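To make the aliasing idea concrete, here is a minimal sketch that renders a header fragment of the same general shape from Python. It is not the contents of cuda_to_hip.h: the real header is a checked-in file, and the symbol list, the USE_HIP guard macro, and the helper name render_compat_header are all assumptions made for illustration. The device-side fallbacks for cg::reduce and cg::labeled_partition are not modelled.

```python
# Assumed subset of CUDA runtime names and their HIP counterparts.
# (__syncthreads needs no alias: HIP provides it under the same name.)
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}


def render_compat_header(guard_macro: str = "USE_HIP") -> str:
    """Render a hypothetical compat header: aliases under HIP, passthrough otherwise."""
    lines = ["#pragma once", f"#ifdef {guard_macro}", "#include <hip/hip_runtime.h>"]
    # ROCm branch: each CUDA spelling becomes a macro alias for the HIP name.
    lines += [f"#define {cuda} {hip}" for cuda, hip in CUDA_TO_HIP.items()]
    # NVIDIA branch: include the CUDA runtime and define nothing.
    lines += ["#else", "#include <cuda_runtime.h>", "#endif"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_compat_header())
```

The point of the shape is that kernel sources keep a single spelling (cudaMalloc and so on) and only the header decides which runtime those names resolve to.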

Code generation gains a HIP path: code_generation/library.py takes use_hip/hip_arch, and the Jinja build-file and kernel templates emit the HIP-correct includes and the USE_HIP CMake option. The runtime sources get the corresponding HIP compatibility changes (shared-memory atomics, the reduction fallback, and a host-side pointer-attribute lookup in the pybind layer). A small Windows build fix (C++17, git path separator) is included for the all-clang ROCm toolchain.
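As a rough illustration of how the use_hip/hip_arch arguments could translate into build configuration, the sketch below maps them to CMake arguments. The helper hip_cmake_args is hypothetical and is not taken from library.py. -DUSE_HIP=ON is the option named in this description; passing the architecture through CMAKE_HIP_ARCHITECTURES (a standard CMake variable) is an assumption about how the generated build consumes it, as is the choice to reject a missing hip_arch rather than fall back to a default.

```python
from typing import List, Optional


def hip_cmake_args(use_hip: bool = False, hip_arch: Optional[str] = None) -> List[str]:
    """Return extra CMake arguments for a hypothetical HIP-aware build step."""
    if not use_hip:
        # Default path: no extra flags, so the CUDA build is configured as before.
        return []
    if not hip_arch:
        # Sketch-only policy: require an explicit target such as "gfx90a".
        raise ValueError("hip_arch is required when use_hip=True (e.g. 'gfx90a')")
    return [
        "-DUSE_HIP=ON",  # option named in the PR description
        f"-DCMAKE_HIP_ARCHITECTURES={hip_arch}",  # assumption: standard CMake variable
    ]


if __name__ == "__main__":
    print(hip_cmake_args())
    print(hip_cmake_args(use_hip=True, hip_arch="gfx90a"))
```

Returning an empty list when use_hip is off mirrors the stated goal that the default NVIDIA build is untouched.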

How to build for AMD GPUs

Pass use_hip=True and the target architecture when compiling a generated library:

compile_caspar_library(caslib, output_dir, use_hip=True, hip_arch="gfx90a")

Set hip_arch to the target AMD GPU (for example gfx90a for CDNA2 or gfx1100 for RDNA3). The ROCm build needs a HIP-enabled compiler (hipcc/amdclang++) and the hip and hipcub packages.

Validation

The full code-generation pipeline (generate, HIP-compile, execute on GPU, verify numerical output) was exercised on real AMD hardware: Linux CDNA2 (gfx90a) and RDNA3 (gfx1100), and Windows RDNA4 (gfx1201). The generated kernels (cooperative-group reductions, shared-memory atomics, scatter/gather) produce results matching the CUDA path. The default NVIDIA CUDA build is unchanged.
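The "verify numerical output" step can be pictured as an element-wise comparison within a tolerance. The sketch below is only one way to do that check: the function name outputs_match and the tolerance values are assumptions, and the description does not say which comparison or tolerances were actually used.

```python
import math
from typing import Sequence


def outputs_match(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 1e-5,  # assumed tolerance, not taken from the PR
    abs_tol: float = 1e-8,  # assumed tolerance, not taken from the PR
) -> bool:
    """Return True if both outputs have equal length and agree element-wise."""
    if len(reference) != len(candidate):
        return False
    # math.isclose treats NaN as unequal to everything, so NaNs count as mismatches.
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


if __name__ == "__main__":
    cuda_out = [1.0, 2.0, 3.0]        # placeholder values, not measured results
    hip_out = [1.0, 2.0000001, 3.0]   # placeholder values, not measured results
    print(outputs_match(cuda_out, hip_out))
```

A tolerance-based check is used here rather than exact equality because reduction order can differ between GPU backends, which changes floating-point rounding.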

Adds AMD GPU support to Caspar so its generated kernels and runtime also build and run on AMD GPUs via HIP/ROCm, while leaving the default NVIDIA CUDA build unchanged (enabled with use_hip=True / -DUSE_HIP=ON; off by default).

Review in this order:

1. symforce/caspar/source/runtime/cuda_to_hip.h (new): a compatibility header that maps the CUDA spellings Caspar emits and uses (cudaMalloc, __syncthreads, cooperative-group reductions, CUB primitives, the runtime API) onto their HIP equivalents, and supplies device-side fallbacks where HIP lacks a cg:: primitive (reduce, labeled_partition).
2. code_generation/library.py and source/templates/*.jinja: the codegen gains a use_hip/hip_arch path; the generated build file and kernel templates emit the compat include and HIP-correct spellings, so the symbolic kernel definitions are unchanged.
3. source/runtime/*.cu, memops.cuh, pybind_array_tools.{cc,h}: the runtime HIP-compat (shared-memory atomics, the cooperative-group reduction fallback, and a host-side pointer-attribute lookup).
4. A small Windows build fix (C++17, git path separator) for the all-clang ROCm toolchain.
5. README: how to build a generated library for AMD GPUs.

Authored with assistance from Claude.

Test Plan: The full code-generation pipeline (generate -> HIP-compile -> execute on GPU -> verify numerical output) was exercised on real AMD hardware, Linux CDNA2 (gfx90a) and RDNA3 (gfx1100) and Windows RDNA4 (gfx1201):

compile_caspar_library(caslib, out_dir, use_hip=True, hip_arch="gfx90a")
# generated kernel executes on GPU; output matches the CUDA path

The default NVIDIA CUDA build (use_hip=False) is unchanged.

jeffdaily wants to merge 1 commit into symforce-org:main from jeffdaily:moat-port
