Problem
The GH200 nightly-standard row re-enabled by #2296 now reaches the cuda.bindings test suite, but tests/test_cufile.py::test_get_bar_size_in_kb fails on the GH200 runner.
Failure: https://github.com/NVIDIA/cuda-python/actions/runs/28600139877/job/84805962749?pr=2296#step:33:582
Environment:
- Linux aarch64
- Python 3.14
- CUDA Toolkit 13.3.0
- NVIDIA GH200 480GB
- Runner group: nv-gpu-arm64-gh200-1gpu
Failure
tests/test_cufile.py::test_get_bar_size_in_kb FAILED
bar_size_kb = cufile.get_bar_size_in_kb(0)
cuda.bindings.cufile.cuFileError:
SUCCESS (0): cufile success; CUDA status: CUDA_ERROR_NOT_SUPPORTED (801)
The remainder of the bindings suite completed with 419 passed and 22 skipped; this was the only failure.
Assessment
GH200 connects the Grace CPU and Hopper GPU through NVLink-C2C rather than using PCIe as the CPU-GPU data path. cuFile also has a C2C P2P mode, so this does not imply that cuFile/GDS is generally unsupported on GH200.
The likely narrower explanation is that GH200 does not expose or use the GPU BAR aperture that cuFileGetBARSizeInKB expects, making this specific query inapplicable. The platform still has PCIe I/O, and the runner reports a PCI-style GPU bus ID (00000000:FD:00.0), so "GH200 has no PCIe support" would be too broad.
References:
One uncertainty remains: the cuFileGetBARSizeInKB API reference does not document CUDA_ERROR_NOT_SUPPORTED as a return for this function, so we should confirm whether the observed result is expected platform behavior or a cuFile/documentation gap.
Expected outcome
- Confirm whether CUDA_ERROR_NOT_SUPPORTED is expected from cuFileGetBARSizeInKB on GH200/C2C systems.
- If expected, update test_get_bar_size_in_kb to skip or xfail only for this expected "no applicable BAR aperture" result instead of failing the full bindings suite.
- If unexpected, follow up with cuFile and update the test once the intended API behavior is known.
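If the result turns out to be expected, the narrow skip described above could look roughly like the sketch below. It is only an illustration: the helper names (is_bar_query_not_supported, bar_size_kb_or_none) are hypothetical, and matching on the message text is an assumption based on the failure output quoted in this issue. If cuFileError exposes a structured status code, checking that would be more robust.

```python
from typing import Callable, Optional

# Substring copied from the failure message in this issue. Matching on text is
# an assumption; prefer a structured status code if the exception carries one.
NOT_SUPPORTED_MARKER = "CUDA_ERROR_NOT_SUPPORTED"


def is_bar_query_not_supported(exc: BaseException) -> bool:
    """Return True only for the 'not supported' result seen on the GH200 runner."""
    return NOT_SUPPORTED_MARKER in str(exc)


def bar_size_kb_or_none(query: Callable[[int], int], gpu_index: int) -> Optional[int]:
    """Hypothetical helper: run the BAR size query for one GPU.

    Returns the size in KB on success, None when the query reports
    CUDA_ERROR_NOT_SUPPORTED, and re-raises every other error so that
    unrelated failures still fail the test.
    """
    try:
        return query(gpu_index)
    except Exception as exc:
        if is_bar_query_not_supported(exc):
            return None
        raise
```

In the test, this would be called as bar_size_kb_or_none(cufile.get_bar_size_in_kb, 0), with pytest.skip(...) invoked only when the result is None. Any other exception still propagates, so the rest of the suite keeps its current failure behavior.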