CUDA.jl v5 compatibility and precompilation fixes#31

Open: nicolamos wants to merge 9 commits into pc2:main from nicolamos:fix-cuda-wrapper-calls

PR: CUDA.jl v5 compatibility and precompilation fixes

Branch: fix-cuda-wrapper-calls → pc2/GPUInspector.jl main
Upstream: https://github.com/pc2/GPUInspector.jl


Summary

Fixes breakages when loading GPUInspector with CUDA.jl v5: renamed NVML probe wrappers, deprecated device memory API, duplicate method definitions that block precompilation, and broken savefig_monitoring_results dispatch in the CairoMakie extension.


Problem

After updating to CUDA.jl v5, GPUInspector fails at precompile or runtime:

  • UndefVarError on NVML.unsafe_nvml* (renamed to NVML.unchecked_nvml* upstream in CUDA.jl)
  • Deprecation/error on CUDA.total_memory() in gpus()
  • Method overwrite errors: clear_gpu_memory / clear_all_gpus_memory defined in both stubs and CUDAExt/utility.jl
  • savefig_monitoring_results: infinite recursion on multi-symbol calls; stub/extension signature clash

Key Changes

1. NVML capability probes (ext/CUDAExt/cuda_wrappers.jl)

Use NVML.unchecked_nvmlDeviceGetTemperature, NVML.unchecked_nvmlDeviceGetPowerUsage, and NVML.unchecked_nvmlDeviceGetUtilizationRates instead of the removed unsafe_nvml* names.
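
As a rough illustration of the probe pattern, the sketch below inspects the raw NVML status code instead of letting the call throw. The helper name `probe_temperature_support` is hypothetical. The sketch assumes that the `unchecked_` wrapper returns the NVML return code and accepts an `NVML.Device`; the actual code in ext/CUDAExt/cuda_wrappers.jl may differ.

```julia
using CUDA
using CUDA: NVML

# Hypothetical helper (not the package's actual function name).
# Returns true if the device reports a temperature reading via NVML.
function probe_temperature_support(nvml_dev::NVML.Device)
    temp = Ref{UInt32}()
    # Assumption: the unchecked_ variant returns the NVML status code
    # rather than throwing on failure.
    status = NVML.unchecked_nvmlDeviceGetTemperature(
        nvml_dev, NVML.NVML_TEMPERATURE_GPU, temp)
    return status == NVML.NVML_SUCCESS
end
```

The same shape would apply to the power-usage and utilization probes, with the output `Ref` type adjusted to what each NVML call expects.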

2. Device memory query (ext/CUDAExt/implementations/gpuinfo.jl)

Replace deprecated CUDA.total_memory() with CUDA.totalmem(dev) in gpus().
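
A minimal sketch of the per-device query, assuming a working CUDA.jl v5 installation. The helper name `device_memory_report` is made up for illustration and is not part of GPUInspector.

```julia
using CUDA

# Hypothetical helper: collect (device name, total memory in bytes)
# for every visible device.
function device_memory_report()
    report = Tuple{String,Int}[]
    CUDA.functional() || return report   # no usable GPU: empty report
    for dev in CUDA.devices()
        # totalmem takes the device explicitly
        push!(report, (CUDA.name(dev), Int(CUDA.totalmem(dev))))
    end
    return report
end

for (name, bytes) in device_memory_report()
    println(name, ": ", Base.format_bytes(bytes))
end
```

Passing the device explicitly keeps the query independent of whichever device is currently active.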

3. Precompilation — memory cleanup (ext/CUDAExt/utility.jl)

Remove the duplicate clear_gpu_memory / clear_all_gpus_memory definitions from CUDAExt (the stubs already provide them), which avoids method-overwrite errors during precompilation.

4. CairoMakie extension — savefig_monitoring_results (ext/CairoMakieExt.jl, stubs)

  • Split single-symbol vs multi-symbol dispatch (fixes the StackOverflowError caused by infinite recursion)
  • Generic stub signatures to avoid overwrite with extension methods
  • Optional filename and prefix keyword arguments
  • Tiled summary figure when saving all monitoring metrics to one file
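
The dispatch split in the first bullet can be sketched in plain Julia. Everything here is a hypothetical stand-in: `save_metric`, the `Dict`-based result type, and the returned filenames are illustrative only and do not reflect the real savefig_monitoring_results signature.

```julia
# Single-symbol method: does the actual work and never calls the
# multi-symbol method, so it cannot recurse.
function save_metric(results::Dict{Symbol,Vector{Float64}}, metric::Symbol;
                     prefix::AbstractString="")
    haskey(results, metric) || throw(ArgumentError("unknown metric: $metric"))
    return string(prefix, metric, ".png")   # filename that would be written
end

# Multi-symbol method: iterates and delegates each Symbol to the
# single-symbol method above.
function save_metric(results::Dict{Symbol,Vector{Float64}},
                     metrics::Union{Tuple{Vararg{Symbol}},AbstractVector{Symbol}};
                     prefix::AbstractString="")
    return [save_metric(results, m; prefix) for m in metrics]
end

r = Dict(:temperature => [40.0, 41.5], :power => [120.0, 118.0])
println(save_metric(r, :temperature))
println(save_metric(r, (:temperature, :power); prefix="run1_"))
```

Because the second argument types are disjoint (`Symbol` versus a collection of `Symbol`s), each call resolves to exactly one method and the collection form terminates after one level of delegation.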

5. Minor

Compatibility adjustments in CUDAExt.jl, implementations/general.jl, stubs_general.jl, stubs_monitoring.jl, and Project.toml.


Files Changed

 Project.toml
 ext/CUDAExt/CUDAExt.jl
 ext/CUDAExt/cuda_wrappers.jl
 ext/CUDAExt/implementations/general.jl
 ext/CUDAExt/implementations/gpuinfo.jl
 ext/CUDAExt/utility.jl
 ext/CairoMakieExt.jl
 src/GPUInspector.jl
 src/stubs/stubs_general.jl
 src/stubs/stubs_monitoring.jl

10 files, +118 / −48 lines vs main. No merge conflicts with upstream main.


Test plan

# CPU-only / precompile
julia --project -e 'using Pkg; Pkg.instantiate(); using GPUInspector'

# With CUDA + GPU (if available)
julia --project -e 'using GPUInspector; using CUDA; CUDA.functional() && GPUInspector.gpus()'

# CairoMakie extension (monitoring plots)
julia --project -e '
  using GPUInspector, CUDA, CairoMakie
  # run a short monitoring session and call savefig_monitoring_results on the result
'
  • Package precompiles without method overwrite errors
  • gpus() works with CUDA.jl v5
  • NVML telemetry probes (supports_get_temperature, etc.) do not throw on capable hardware
  • savefig_monitoring_results(r, :temperature) and the multi-symbol form complete without a stack overflow
  • Existing tests pass (Pkg.test())

Open PR

https://github.com/pc2/GPUInspector.jl/compare/main...nicolamos:GPUInspector.jl:fix-cuda-wrapper-calls

Suggested PR title: Fix CUDA.jl v5 compatibility (NVML wrappers, totalmem, precompile, savefig dispatch)

@nicolamos nicolamos marked this pull request as ready for review May 22, 2026 13:38