Add CD HF inference asset verification by jmccaffrey-nv · Pull Request #120 · NVIDIA/flashdreams

jmccaffrey-nv · 2026-05-20T04:33:10Z

Summary

add a CD-only Hugging Face asset verification for README inference instructions
run the verification through the prebuilt Docker image with persistent caches
authenticate to Hugging Face and verify Alpadreams + Wan2.1 I2V HF assets without adding the network-dependent check to regular CI

Testing

python3 -m py_compile flashdreams/tests/test_readme_hf_inference_assets.py
python3 flashdreams/tests/test_readme_hf_inference_assets.py (skips unless FLASHDREAMS_HF_ACCESS_TEST=1)
git diff --cached --check

Note: local ruff was not run because this machine has uv 0.11.7 while the repo requires >=0.11.8.

copy-pr-bot · 2026-05-20T04:33:14Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

liruilong940607 · 2026-05-20T16:42:50Z

/ok to test 17ed805

The optional-import fallback used mypy/pyright's `# type: ignore[assignment]` syntax, which ty does not recognize. Rename the import binding to `_pytest` and use the ty-native `# ty:ignore[invalid-assignment]` directive. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

jmccaffrey-nv · 2026-05-21T19:48:09Z

/ok to test 511e714

Replace the docker-based test driver with a standalone check script and wire it into both CI and CD via a uniform in-venv invocation. - Extract the previously-embedded "_inner_script" heredoc into `flashdreams/tests/check_hf_inference_assets.py`. The script resolves every HF asset referenced by the README's headline recipes (`omnidreams-sv-2steps-chunk2-loc6-lightvae-lighttae-perf` and `wan21-i2v-14b-480p`) and downloads one small file per unique repo, so gated-access and token-scope regressions surface as failed downloads rather than visibility-only false negatives. - Delete the old docker driver. The prebuilt base image that drove its default never ships publicly; only the Dockerfile recipe on top of a CUDA base will. Pinning that internal image in source-of-truth code was misleading. - CI: add a single step in the existing cpu job that runs the script when HF_TOKEN is available; skip with a warning otherwise so external fork PRs do not block. - CD: replace the docker-run job with the same script, executed inside the public `nvidia/cuda` container. Drops four FLASHDREAMS_* cache-dir env vars and the FLASHDREAMS_HF_ACCESS_TEST enable flag (only HF_TOKEN remains). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

greptile-apps · 2026-06-05T01:54:31Z

Greptile Summary

This PR adds a Hugging Face asset reachability check that runs authenticated hf_hub_download calls against the two README headline recipes (omnidreams-sv-2steps-chunk2-loc6-lightvae-lighttae-perf and wan21-i2v-14b-480p), surfacing gated-repo or token-scope regressions before they reach users.

cd.yml (new): Dedicated CD job that enforces the HF token and runs the check unconditionally on every push to main and on tags; the token validation step is currently placed after all dependency installation, meaning a missing secret only fails the job after several minutes of setup work.
ci.yml (modified): Adds the same check as an optional step in the existing CPU job, gracefully skipping with a warning when HF_TOKEN is absent (e.g. external fork PRs) — token correctly scoped to the step only.
check_hf_inference_assets.py (new): Walks pipeline component attrs to collect HF URLs/bare repo IDs, deduplicates by repo_id@revision, and downloads a small probe file per repo; the URI-scheme guard that routes bare HF IDs is missing hf:// and gs:// prefixes.

Confidence Score: 3/5

Safe to merge after the token-check ordering in cd.yml is fixed; a missing HF_TOKEN secret currently wastes minutes of setup before the job fails.

The CD workflow validates the HF token only after running apt-get and a full uv sync, so any CD environment that lacks the secret will burn several minutes of runner time before erroring out. The verification script also has a URI-scheme gap that could surface as a confusing HF API error if future pipeline configs introduce hf:// or gs:// paths. The ci.yml change is clean and handles the no-token case correctly.

.github/workflows/cd.yml warrants a second look for step ordering and secret scoping; flashdreams/tests/check_hf_inference_assets.py is otherwise solid but the URI heuristic and AssertionError use are worth addressing.

Important Files Changed

Filename	Overview
.github/workflows/cd.yml	New CD workflow for HF asset verification; HF_TOKEN validation is ordered after expensive dependency installation (fail-fast concern), and the token is unnecessarily exposed at job level rather than scoped to the verification step.
.github/workflows/ci.yml	Adds optional HF inference asset check to the CPU job; gracefully skips when HF_TOKEN is absent, token is correctly scoped to the step only.
flashdreams/tests/check_hf_inference_assets.py	New script that walks pipeline component attrs, resolves HF URLs/repo IDs, and downloads small probe files to verify authenticated access; non-HF URI scheme guard is incomplete (misses hf:// and gs://) and the failure path raises AssertionError instead of RuntimeError.

Sequence Diagram

sequenceDiagram
    participant GH as GitHub Actions
    participant CD as cd.yml job
    participant CI as ci.yml cpu job
    participant Script as check_hf_inference_assets.py
    participant HF as Hugging Face Hub

    GH->>CD: push to main / tag / workflow_dispatch
    CD->>CD: apt-get + uv sync (expensive setup)
    CD->>CD: Verify HF_TOKEN set (currently last)
    CD->>Script: uv run python check_hf_inference_assets.py
    Script->>Script: Load OMNIDREAMS_RUNNERS + RUNNER_WAN21_I2V_14B_480P
    Script->>Script: _pipeline_urls() — collect component attrs
    loop per (repo_id, revision) not yet seen
        Script->>HF: "hf_hub_download(probe file, token=HF_TOKEN)"
        HF-->>Script: local cache path
    end
    Script-->>CD: exit 0 (pass) or exit 1 (fail)

    GH->>CI: push / merge_group
    CI->>CI: lint + CPU unit tests
    alt HF_TOKEN present
        CI->>Script: uv run python check_hf_inference_assets.py
        Script->>HF: authenticated probe downloads
        HF-->>Script: ok
        Script-->>CI: exit 0
    else HF_TOKEN absent (fork PR)
        CI->>CI: warning: skipping — exit 0
    end

_{Reviews (1): Last reviewed commit: "Refactor HF inference asset verification" | Re-trigger Greptile}

greptile-apps · 2026-06-05T01:54:34Z

+      - name: Install workspace dependencies
+        env:
+          BLOCK_SPARSE_ATTN_SKIP_CUDA_BUILD: "TRUE"
+        run: |
+          uv venv --clear
+          uv sync --extra runners --no-install-package transformer-engine-torch
+
+      - name: Verify HF token is configured
+        run: |
+          test -n "${HF_TOKEN}" || {
+            echo "::error::HF_TOKEN secret is required for CD HF asset verification"
+            exit 1
+          }
+
+      - name: Verify HF inference assets
+        run: uv run --no-sync python flashdreams/tests/check_hf_inference_assets.py


The "Verify HF token is configured" step sits after all the expensive dependency-installation steps. If HF_TOKEN is missing from the CD secrets, the job will fail only after spending minutes on apt-get, uv venv, and uv sync. Moving the check to the first step (or at least before "Install workspace dependencies") fails fast and avoids wasting runner time.

Suggested change

- name: Install workspace dependencies

env:

BLOCK_SPARSE_ATTN_SKIP_CUDA_BUILD: "TRUE"

run: |

uv venv --clear

uv sync --extra runners --no-install-package transformer-engine-torch

- name: Verify HF token is configured

run: |

test -n "${HF_TOKEN}" || {

echo "::error::HF_TOKEN secret is required for CD HF asset verification"

exit 1

}

- name: Verify HF inference assets

run: uv run --no-sync python flashdreams/tests/check_hf_inference_assets.py

- name: Verify HF token is configured

run: |

test -n "${HF_TOKEN}" || {

echo "::error::HF_TOKEN secret is required for CD HF asset verification"

exit 1

}

- name: Install workspace dependencies

env:

BLOCK_SPARSE_ATTN_SKIP_CUDA_BUILD: "TRUE"

run: |

uv venv --clear

uv sync --extra runners --no-install-package transformer-engine-torch

- name: Verify HF inference assets

run: uv run --no-sync python flashdreams/tests/check_hf_inference_assets.py

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-06-05T01:54:35Z

+    env:
+      UV_PROJECT_ENVIRONMENT: /tmp/flashdreams-venv
+      UV_LINK_MODE: copy
+      HF_TOKEN: ${{ secrets.HF_TOKEN }}


The HF_TOKEN is set at job-level env, so it's injected into every step — including Checkout, Setup uv, and apt-get — though it's only needed by the final verification step. Following the pattern in ci.yml (step-scoped env) reduces the blast radius if any earlier step unexpectedly logs its environment.

Suggested change

env:

UV_PROJECT_ENVIRONMENT: /tmp/flashdreams-venv

UV_LINK_MODE: copy

HF_TOKEN: ${{ secrets.HF_TOKEN }}

env:

UV_PROJECT_ENVIRONMENT: /tmp/flashdreams-venv

UV_LINK_MODE: copy

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-06-05T01:54:36Z

+            )
+        else:
+            path = _probe(repo_id, revision, token=token)
+    elif "/" in url_or_id and not url_or_id.startswith(("http", "s3", "/")):


The heuristic not url_or_id.startswith(("http", "s3", "/")) doesn't guard against hf:// or gs:// URIs. If any component attr ever contains an hf://org/model/… or gs://bucket/… path, the elif branch fires, the full URI is passed as repo_id to _probe(), and the HF API returns a confusing 404/repo-not-found error instead of the expected "skipped non-HF path" message.

Suggested change

elif "/" in url_or_id and not url_or_id.startswith(("http", "s3", "/")):

elif "/" in url_or_id and not url_or_id.startswith(("http", "s3", "gs", "hf", "/")):

greptile-apps · 2026-06-05T01:54:37Z

+    raise AssertionError(
+        f"no probe file ({', '.join(_PROBE_FILES)}) downloadable "
+        f"from {repo_id}@{revision}: {last}"
+    )


AssertionError is semantically reserved for programmer-error assertions and is silenced when Python runs with the -O flag. An expected failure path (none of the probe files exist in a valid HF repo) is better represented as a RuntimeError so it can't be accidentally suppressed.

Suggested change

raise AssertionError(

f"no probe file ({', '.join(_PROBE_FILES)}) downloadable "

f"from {repo_id}@{revision}: {last}"

)

raise RuntimeError(

f"no probe file ({', '.join(_PROBE_FILES)}) downloadable "

f"from {repo_id}@{revision}: {last}"

)

Add CD HF inference asset verification

88d8c1c

Merge branch 'main' into dev/jmccaffrey/hf-access-test

17ed805

jmccaffrey-nv marked this pull request as draft May 21, 2026 07:18

Merge branch 'main' into dev/jmccaffrey/hf-access-test

511e714

greptile-apps Bot reviewed Jun 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add CD HF inference asset verification#120

Add CD HF inference asset verification#120
jmccaffrey-nv wants to merge 5 commits into
mainfrom
dev/jmccaffrey/hf-access-test

jmccaffrey-nv commented May 20, 2026

Uh oh!

copy-pr-bot Bot commented May 20, 2026

Uh oh!

liruilong940607 commented May 20, 2026

Uh oh!

jmccaffrey-nv commented May 21, 2026

Uh oh!

greptile-apps Bot commented Jun 5, 2026

Uh oh!

greptile-apps Bot Jun 5, 2026

Uh oh!

greptile-apps Bot Jun 5, 2026

Uh oh!

greptile-apps Bot Jun 5, 2026

Uh oh!

greptile-apps Bot Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	elif "/" in url_or_id and not url_or_id.startswith(("http", "s3", "/")):
	elif "/" in url_or_id and not url_or_id.startswith(("http", "s3", "gs", "hf", "/")):

Conversation

jmccaffrey-nv commented May 20, 2026

Summary

Testing

Uh oh!

copy-pr-bot Bot commented May 20, 2026

Uh oh!

liruilong940607 commented May 20, 2026

Uh oh!

jmccaffrey-nv commented May 21, 2026

Uh oh!

greptile-apps Bot commented Jun 5, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants