fix: prefer NVML v2 memory info for inference setup by Rohithmatham12 · Pull Request #41 · NVIDIA/cosmos-framework

Rohithmatham12 · 2026-06-13T22:02:22Z

Summary

- Try pynvml.nvmlDeviceGetMemoryInfo_v2() before the legacy v1 memory-info API in inference setup.
- Keep compatibility with older pynvml builds by falling back to nvmlDeviceGetMemoryInfo() when v2 is unavailable.
- Keep the existing torch/default memory fallback and ensure nvmlShutdown() runs even when NVML probing fails.
- Add unit coverage for v2 success when v1 would raise NVMLError_NotSupported, plus the older-pynvml fallback path.

Why
DGX Spark / GB10 platforms can report pynvml.NVMLError_NotSupported from the legacy v1 nvmlDeviceGetMemoryInfo() call during Cosmos3 inference setup. The v2 NVML memory-info API is the supported path there, while older environments still need the v1 fallback.

Testing

- python3 -m py_compile cosmos_framework/inference/args.py cosmos_framework/inference/args_test.py
- git diff --check

Not run locally:

- python3 -m pytest cosmos_framework/inference/args_test.py -q, because this local environment does not have pytest installed
- import smoke test, because this local environment is missing framework dependencies such as pydantic

Related to NVIDIA/cosmos#180

Rohithmatham12 · 2026-06-15T03:33:33Z

Pushed a test-only follow-up for the unittest failure. CI's pynvml build does not expose nvmlDeviceGetMemoryInfo_v2, so the regression test now monkeypatches that symbol with raising=False before exercising the v2-preferred path.

Local validation after the change:
- python3 -m py_compile cosmos_framework/inference/args.py cosmos_framework/inference/args_test.py
- git diff --check

I also tried the targeted pytest locally, but this environment does not have pytest installed.
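For readers unfamiliar with the raising=False detail, here is a minimal sketch of the idea, not the actual test in args_test.py. The fake_pynvml object, the total_memory helper, and the test name are hypothetical. The only real API used is pytest's monkeypatch.setattr(target, name, value, raising=...), which by default raises AttributeError when the attribute does not already exist on the target.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a pynvml build that lacks the v2 symbol.
fake_pynvml = SimpleNamespace(
    nvmlDeviceGetMemoryInfo=lambda handle: SimpleNamespace(total=1),
)


def total_memory(nvml, handle):
    """Hypothetical helper: prefer the v2 call when the module exposes it."""
    get_v2 = getattr(nvml, "nvmlDeviceGetMemoryInfo_v2", None)
    getter = get_v2 if get_v2 is not None else nvml.nvmlDeviceGetMemoryInfo
    return getter(handle).total


def test_prefers_v2_when_symbol_is_patched_in(monkeypatch):
    # raising=False lets setattr create an attribute that does not exist yet;
    # with the default raising=True, pytest raises AttributeError instead.
    monkeypatch.setattr(
        fake_pynvml,
        "nvmlDeviceGetMemoryInfo_v2",
        lambda handle: SimpleNamespace(total=2),
        raising=False,
    )
    assert total_memory(fake_pynvml, handle=None) == 2
```

Run under pytest, the monkeypatch fixture removes the patched-in attribute again after the test, so other tests still see a build without the v2 symbol.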

Rohithmatham12 · 2026-06-15T03:35:42Z

Sorry, the approval became stale because I pushed a one-line test-only follow-up to fix the failing unittest on CI. The implementation code is unchanged from the approved version.

fix: prefer NVML v2 memory info for inference setup

84bbc33

Rohithmatham12 mentioned this pull request Jun 13, 2026

‘pynvml.NVMLError_NotSupported: Not Supported’ error on DGX Spark NVIDIA/cosmos#180

Open

Merge branch 'main' into fix-nvml-v2-memory-info

c7b6af7

lfengad previously approved these changes Jun 15, 2026

test: allow missing NVML v2 symbol

9876dd6

Rohithmatham12 dismissed lfengad’s stale review via 9876dd6 June 15, 2026 03:31

Rohithmatham12 wants to merge 3 commits into NVIDIA:main from Rohithmatham12:fix-nvml-v2-memory-info

2 participants
