fix: prefer NVML v2 memory info for inference setup #41
Open
Rohithmatham12 wants to merge 3 commits into
lfengad
previously approved these changes
Jun 15, 2026
Author
Pushed a test-only follow-up for the unittest failure. CI's pynvml build does not expose nvmlDeviceGetMemoryInfo_v2, so the regression test now monkeypatches that symbol with raising=False before exercising the v2-preferred path.

Local validation after the change:
- python3 -m py_compile cosmos_framework/inference/args.py cosmos_framework/inference/args_test.py
- git diff --check

I also tried the targeted pytest locally, but this environment does not have pytest installed.
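For readers without pytest, the idea behind patching a symbol that the installed module does not define can be sketched with the standard library. This is an illustration only, not the PR's test: `old_pynvml`, `total_memory`, and `fake_v2` are hypothetical names, and `mock.patch.object(..., create=True)` is used here as a stdlib analogue of pytest's `monkeypatch.setattr(..., raising=False)`.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for a pynvml build that lacks the v2 symbol.
old_pynvml = SimpleNamespace(
    nvmlDeviceGetMemoryInfo=lambda handle: SimpleNamespace(total=64),
)


def total_memory(nvml, handle):
    """Use the v2 getter when the module defines it, otherwise the v1 getter."""
    get_v2 = getattr(nvml, "nvmlDeviceGetMemoryInfo_v2", None)
    getter = get_v2 if get_v2 is not None else nvml.nvmlDeviceGetMemoryInfo
    return getter(handle).total


def fake_v2(handle):
    """Fake v2 getter injected only for the duration of the patch."""
    return SimpleNamespace(total=128)


# create=True lets the patch add an attribute that does not exist yet;
# the attribute is removed again when the with-block exits.
with mock.patch.object(old_pynvml, "nvmlDeviceGetMemoryInfo_v2", fake_v2, create=True):
    patched_total = total_memory(old_pynvml, handle=None)

unpatched_total = total_memory(old_pynvml, handle=None)
```

Inside the patch the v2-preferred branch is exercised even though the stand-in module never defined the symbol; outside it, the same call takes the v1 path again.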
Author
Sorry, the approval became stale because I pushed a one-line test-only follow-up to fix the failing unittest on CI. The implementation code is unchanged from the approved version.
Summary
- Prefer pynvml.nvmlDeviceGetMemoryInfo_v2() before the legacy v1 memory-info API in inference setup
- Fall back to nvmlDeviceGetMemoryInfo() when v2 is unavailable
- Ensure nvmlShutdown() runs even when NVML probing fails
- Add regression tests covering NVMLError_NotSupported, plus the older-pynvml fallback path

Why
DGX Spark / GB10 platforms can report pynvml.NVMLError_NotSupported from the legacy v1 nvmlDeviceGetMemoryInfo() call during Cosmos3 inference setup. The v2 NVML memory-info API is the supported path there, while older environments still need the v1 fallback.

Testing
- python3 -m py_compile cosmos_framework/inference/args.py cosmos_framework/inference/args_test.py
- git diff --check

Not run locally:
- python3 -m pytest cosmos_framework/inference/args_test.py -q, because this local environment does not have pytest installed

Related to NVIDIA/cosmos#180
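The behavior described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in cosmos_framework/inference/args.py: `probe_total_memory_bytes` and `make_fake_nvml` are hypothetical names, the `nvml` parameter stands for any pynvml-like module, and the fake raises a plain RuntimeError where real pynvml would raise NVMLError_NotSupported.

```python
from types import SimpleNamespace


def probe_total_memory_bytes(nvml, device_index=0):
    """Return total device memory, preferring the v2 memory-info call.

    `nvml` is any pynvml-like module object (hypothetical parameter).
    """
    nvml.nvmlInit()
    try:
        handle = nvml.nvmlDeviceGetHandleByIndex(device_index)
        # Older pynvml builds may not define the v2 symbol at all.
        get_v2 = getattr(nvml, "nvmlDeviceGetMemoryInfo_v2", None)
        if get_v2 is not None:
            return get_v2(handle).total
        # v2 is unavailable: fall back to the legacy v1 call.
        return nvml.nvmlDeviceGetMemoryInfo(handle).total
    finally:
        # Runs on success and on any exception raised while probing.
        nvml.nvmlShutdown()


def make_fake_nvml(calls, with_v2, v1_supported):
    """Hypothetical stand-in for pynvml so the sketch runs without a GPU."""

    def legacy_v1(handle):
        if not v1_supported:
            raise RuntimeError("stand-in for NVMLError_NotSupported")
        return SimpleNamespace(total=64)

    fake = SimpleNamespace(
        nvmlInit=lambda: calls.append("init"),
        nvmlDeviceGetHandleByIndex=lambda index: ("handle", index),
        nvmlDeviceGetMemoryInfo=legacy_v1,
        nvmlShutdown=lambda: calls.append("shutdown"),
    )
    if with_v2:
        fake.nvmlDeviceGetMemoryInfo_v2 = lambda handle: SimpleNamespace(total=128)
    return fake


# A platform where v1 is unsupported but v2 exists still gets an answer.
calls = []
total = probe_total_memory_bytes(make_fake_nvml(calls, with_v2=True, v1_supported=False))
```

Passing the module in as an argument is a choice made for this sketch so that it can be exercised with a fake; the try/finally is what keeps nvmlShutdown() paired with nvmlInit() when probing raises.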