GH200: test_get_bar_size_in_kb fails with CUDA_ERROR_NOT_SUPPORTED

## Problem

The GH200 `nightly-standard` row re-enabled by #2296 now reaches the `cuda.bindings` test suite, but `tests/test_cufile.py::test_get_bar_size_in_kb` fails on the GH200 runner.

Failure: https://github.com/NVIDIA/cuda-python/actions/runs/28600139877/job/84805962749?pr=2296#step:33:582

Environment:

- Linux aarch64
- Python 3.14
- CUDA Toolkit 13.3.0
- NVIDIA GH200 480GB
- Runner group `nv-gpu-arm64-gh200-1gpu`

## Failure

```text
tests/test_cufile.py::test_get_bar_size_in_kb FAILED

bar_size_kb = cufile.get_bar_size_in_kb(0)

cuda.bindings.cufile.cuFileError:
SUCCESS (0): cufile success; CUDA status: CUDA_ERROR_NOT_SUPPORTED (801)
```

The remainder of the bindings suite completed with 419 passed and 22 skipped; this was the only failure.

## Assessment

GH200 connects the Grace CPU and Hopper GPU through NVLink-C2C rather than using PCIe as the CPU-GPU data path. cuFile also has a C2C P2P mode, so this does **not** imply that cuFile/GDS is generally unsupported on GH200.

A narrower and more likely explanation is that GH200 does not expose or use the GPU BAR aperture that `cuFileGetBARSizeInKB` expects, which makes this specific query inapplicable. The platform still has PCIe I/O, and the runner reports a PCI-style GPU bus ID (`00000000:FD:00.0`), so "GH200 has no PCIe support" would be too broad.

References:

- [NVIDIA Grace performance guide](https://docs.nvidia.com/dccpu/grace-perf-tuning-guide/index.html) describes the Grace-Hopper CPU/GPU connection as NVLink-C2C and separately lists PCIe I/O.
- [cuFile API reference](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html#cufilegetbarsizeinkb) defines `cuFileGetBARSizeInKB`.
- The same [cuFile reference](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html#cufiledriversetp2pflags) documents distinct PCI P2PDMA and C2C P2P modes.

One uncertainty remains: the `cuFileGetBARSizeInKB` API reference does not document `CUDA_ERROR_NOT_SUPPORTED` as a return for this function, so we should confirm whether the observed result is expected platform behavior or a cuFile/documentation gap.

## Expected outcome

- Confirm whether `CUDA_ERROR_NOT_SUPPORTED` is expected from `cuFileGetBARSizeInKB` on GH200/C2C systems.
- If expected, update `test_get_bar_size_in_kb` to skip or xfail only on this "no applicable BAR aperture" result, so that any other error still fails the test and the expected case no longer fails the full bindings suite.
- If unexpected, follow up with cuFile and update the test once the intended API behavior is known.
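
If the result turns out to be expected, the narrow skip could look roughly like the sketch below. The helper name `query_bar_size_or_none` is hypothetical, and matching on the `CUDA_ERROR_NOT_SUPPORTED` substring of the exception message is an assumption based only on the failure text above; if `cuFileError` exposes the CUDA status in a structured way, that would be a better thing to check. The query function and exception type are passed in so the helper does not import `cuda.bindings` itself.

```python
from typing import Callable, Optional, Type

# Substring taken from the failure message in this issue.
NOT_SUPPORTED_MARKER = "CUDA_ERROR_NOT_SUPPORTED"


def query_bar_size_or_none(
    query: Callable[[int], int],
    gpu_index: int,
    error_type: Type[BaseException] = Exception,
) -> Optional[int]:
    """Return the BAR size in KB, or None if the query reports NOT_SUPPORTED.

    Any other error of `error_type` is re-raised, so unrelated failures
    still fail the test.
    """
    try:
        return query(gpu_index)
    except error_type as exc:
        if NOT_SUPPORTED_MARKER in str(exc):
            return None
        raise


# Possible use inside tests/test_cufile.py (not executed here):
#
#   bar_size_kb = query_bar_size_or_none(
#       cufile.get_bar_size_in_kb, 0, cufile.cuFileError
#   )
#   if bar_size_kb is None:
#       pytest.skip("cuFileGetBARSizeInKB reported CUDA_ERROR_NOT_SUPPORTED")
```

Keeping the check this narrow means the test only skips on the one result discussed here; whether `skip` or `xfail` is the better marker depends on the answer to the first bullet.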
