test(omnidreams): same-seed bitwise-reproducibility GPU test + nightly CI by jmccaffrey-nv · Pull Request #324 · NVIDIA/flashdreams

jmccaffrey-nv · 2026-06-10T14:53:56Z

What

Adds a GPU test that pins bitwise reproducibility at the same seed for the OmniDreams distilled runner, plus a nightly CI job that runs it.

The test (integrations/omnidreams/tests/test_omnidreams_same_seed_reproducibility.py) runs the distilled runner (omnidreams-sv-2steps-chunk2-loc6-lightvae-lighttae) twice at the same seed under PyTorch's strict-determinism flags and asserts the two output MP4s are byte-identical (sha256):

CUBLAS_WORKSPACE_CONFIG=:4096:8
torch.use_deterministic_algorithms(True, warn_only=True)
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

(flags set in a fresh subprocess before the first CUDA context, per the PyTorch reproducibility notes)

Unlike a golden-hash regression, it compares the two runs against each other, so there is no committed digest to maintain and it stays green across toolchain/hardware changes.
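
The self-comparison described above can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: `file_sha256`, `run_rollout`, `assert_same_seed_reproducible`, and `DETERMINISM_ENV` are hypothetical names, and `cmd` stands in for whatever actually launches the runner and writes the MP4.

```python
from __future__ import annotations

import hashlib
import os
import subprocess
import sys
from pathlib import Path

# Env vars quoted in the PR description. They are handed to the child process
# so they are in place before it creates its first CUDA context.
DETERMINISM_ENV = {
    "CUBLAS_WORKSPACE_CONFIG": ":4096:8",
    "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
}


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so a large MP4 is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_rollout(cmd: list[str], out_path: Path) -> str:
    """Run one rollout in a fresh subprocess and return its output digest.

    `cmd` is a hypothetical launcher that takes the output path as its
    final argument.
    """
    env = {**os.environ, **DETERMINISM_ENV}
    subprocess.run([*cmd, str(out_path)], env=env, check=True)
    return file_sha256(out_path)


def assert_same_seed_reproducible(cmd: list[str], workdir: Path) -> None:
    """Compare two runs against each other; no golden digest is stored."""
    digest_a = run_rollout(cmd, workdir / "run_a.mp4")
    digest_b = run_rollout(cmd, workdir / "run_b.mp4")
    assert digest_a == digest_b, (
        f"same-seed outputs differ: {digest_a} != {digest_b}"
    )
```

Because both digests come from the same machine and toolchain, a driver or library upgrade changes both runs together, which is why this style of check should not need a committed hash to be refreshed.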

How it's gated

Tier-marked ci_gpu but skips unless OMNIDREAMS_REPRO_RUN is set, so the per-PR pytest -m ci_gpu job collects-and-skips it instantly.
New nightly workflow .github/workflows/determinism.yml (cron + workflow_dispatch) sets OMNIDREAMS_REPRO_RUN=1 to actually run it on the GPU runner.

(We avoid the manual marker on purpose: pytest-manual-marker xfails every manual test at setup, so it would never execute as a CI guard.)
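
For illustration, the env gate could look something like the sketch below. The helper name `repro_skip_reason` and the skip message are assumptions, as is the choice to treat an empty value as "not set"; the pytest wiring is shown only in comments so the snippet runs without pytest installed.

```python
import os
from typing import Mapping, Optional

GATE_ENV = "OMNIDREAMS_REPRO_RUN"


def repro_skip_reason(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a skip reason when the gate is off, or None when the test should run.

    Assumption: an empty value counts as "not set".
    """
    if environ.get(GATE_ENV):
        return None
    return f"{GATE_ENV} is not set; same-seed reproducibility rollout skipped"


# Hypothetical wiring inside the test module (requires pytest):
#
#   pytestmark = pytest.mark.ci_gpu
#
#   def test_same_seed_is_bitwise_reproducible(tmp_path):
#       reason = repro_skip_reason()
#       if reason is not None:
#           pytest.skip(reason)
#       ...  # run the two rollouts and compare digests
```

With this shape, `pytest -m ci_gpu` still selects the test by marker, and the skip happens before any GPU work starts.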

Verification

Ran on an H100 (slurm, cu130 container): two seed=1 rollouts produced an identical MP4 — 1 passed in 384.73s.

🤖 Generated with Claude Code

…ghtly CI

Add a GPU test that runs the distilled omnidreams runner twice at the same seed under PyTorch's strict-determinism flags (torch.use_deterministic_algorithms(True, warn_only=True), CUBLAS_WORKSPACE_CONFIG=:4096:8, PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True) and asserts the two output MP4s are byte-identical (sha256). Unlike a golden-hash regression, it compares the two runs against each other, so it needs no committed digest and stays green across toolchain/hardware changes.

The test carries the ci_gpu tier marker but skips unless OMNIDREAMS_REPRO_RUN is set, so the per-PR `pytest -m ci_gpu` job collects-and-skips it instantly. A new nightly workflow (.github/workflows/determinism.yml, schedule + workflow_dispatch) sets OMNIDREAMS_REPRO_RUN=1 to actually run it on the GPU runner.

Verified passing on an H100 (slurm, cu130 container): two seed=1 rollouts produced an identical MP4 (1 passed in 384.73s).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

copy-pr-bot · 2026-06-10T14:54:00Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-06-10T14:58:30Z

Greptile Summary

Adds a nightly GPU job that verifies bitwise reproducibility of the OmniDreams distilled runner by running it twice at the same seed and asserting sha256 equality of the two output MP4s — no committed golden hash needed.

Test (test_omnidreams_same_seed_reproducibility.py): spawns two isolated subprocesses with CUBLAS_WORKSPACE_CONFIG, PYTORCH_CUDA_ALLOC_CONF, and torch.use_deterministic_algorithms(warn_only=True) set before the first CUDA context; compares the resulting MP4 digests with a detailed failure message.
Workflow (.github/workflows/determinism.yml): runs on a cron schedule (07:00 UTC) and workflow_dispatch; the test is gated by OMNIDREAMS_REPRO_RUN=1 so the per-PR ci_gpu suite collects and skips it instantly.

Confidence Score: 4/5

Safe to merge — the test logic and subprocess isolation are sound; the only issues are a stale comment in the workflow and a floating action ref.

The test correctly isolates each rollout in a fresh subprocess, sets all determinism knobs before the first CUDA context, and compares sha256 digests with clear failure diagnostics. The workflow header comment says the test is marked manual when it is actually marked ci_gpu with an env gate — a reader relying solely on that comment would misunderstand the gating scheme. The nv-gha-runners/setup-proxy-cache@main reference is mutable and could silently change what the job installs.

.github/workflows/determinism.yml — stale manual-marker comment and floating @main action ref.

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/determinism.yml | New nightly CI job running the reproducibility test on an RTX Pro 6000 runner; one comment incorrectly states the test uses the `manual` marker (it uses `ci_gpu` + env gate); `setup-proxy-cache@main` is a floating ref. |
| integrations/omnidreams/tests/test_omnidreams_same_seed_reproducibility.py | Well-structured opt-in GPU test that spawns two isolated subprocesses with determinism env vars set before the first CUDA context, then compares sha256 of their output MP4s; error messages are informative. |

Sequence Diagram

sequenceDiagram
    participant CI as Nightly CI (determinism.yml)
    participant Test as test_same_seed_is_bitwise_reproducible
    participant SubA as Subprocess A (python -c bootstrap)
    participant SubB as Subprocess B (python -c bootstrap)
    participant GPU as GPU / CUDA

    CI->>Test: "pytest (OMNIDREAMS_REPRO_RUN=1, seed=1)"
    Test->>SubA: spawn with CUBLAS_WORKSPACE_CONFIG + PYTORCH_CUDA_ALLOC_CONF
    SubA->>SubA: os.environ.setdefault CUBLAS_WORKSPACE_CONFIG
    SubA->>SubA: "torch.use_deterministic_algorithms(True, warn_only=True)"
    SubA->>GPU: "run distilled runner (seed=1, total_blocks=4)"
    GPU-->>SubA: inference output
    SubA-->>Test: "rc=0, run_a/recipe.mp4"
    Test->>SubB: "spawn identical env (seed=1)"
    SubB->>SubB: same determinism bootstrap
    SubB->>GPU: "run distilled runner (seed=1, total_blocks=4)"
    GPU-->>SubB: inference output
    SubB-->>Test: "rc=0, run_b/recipe.mp4"
    Test->>Test: "sha256(mp4_a) == sha256(mp4_b)?"
    alt byte-identical
        Test-->>CI: PASS
    else digest mismatch
        Test-->>CI: FAIL (diff hint + tmp paths retained)
    end
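
The `python -c` bootstrap step in the diagram might be assembled along these lines. This is a sketch under stated assumptions, not the PR's actual bootstrap: `BOOTSTRAP_TEMPLATE`, `build_command`, `build_env`, and the idea of launching the runner as a module through `runpy` are all hypothetical.

```python
from __future__ import annotations

import sys

# Source executed by the child interpreter. The cuBLAS workspace setting is
# applied before torch is imported, and the determinism switch is flipped
# before the runner module starts, so both precede any CUDA work.
BOOTSTRAP_TEMPLATE = """\
import os
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
import torch
torch.use_deterministic_algorithms(True, warn_only=True)
import runpy
runpy.run_module({module!r}, run_name="__main__", alter_sys=True)
"""


def build_command(module: str, runner_args: list[str]) -> list[str]:
    """Return argv for a fresh interpreter that bootstraps, then runs `module`.

    `module` is a hypothetical importable entry point for the runner.
    """
    code = BOOTSTRAP_TEMPLATE.format(module=module)
    return [sys.executable, "-c", code, *runner_args]


def build_env(base: dict[str, str]) -> dict[str, str]:
    """Copy `base` and fill in the determinism env vars if they are absent."""
    env = dict(base)
    env.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    env.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    return env
```

The test would then pass `build_command(...)` and `build_env(os.environ)` to `subprocess.run` once per rollout, giving each run its own interpreter and CUDA context.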

Reviews (1): Last reviewed commit: "test(omnidreams): add same-seed bitwise-..."

greptile-apps · 2026-06-10T14:58:35Z

+# The same-seed reproducibility test is marked `manual`: it needs a real GPU,
+# downloads the distilled checkpoint + an example HDMap clip from HF, and runs
+# two full rollouts, so it is too heavy for the per-PR `ci_gpu` job. Instead it
+# runs here on a nightly schedule (and on demand via the Actions "Run workflow"
+# button). Scheduled runs execute on the default branch only.


The header comment says the test is marked manual, but the test file uses pytestmark = pytest.mark.ci_gpu with an env-gate skip — not manual. The PR description even explains why manual was deliberately avoided (the pytest-manual-marker plugin would xfail it at setup). A reader consulting only this file would have a wrong mental model of why the nightly job is needed and how the per-PR gate works.

Suggested change

-# The same-seed reproducibility test is marked `manual`: it needs a real GPU,
-# downloads the distilled checkpoint + an example HDMap clip from HF, and runs
-# two full rollouts, so it is too heavy for the per-PR `ci_gpu` job. Instead it
-# runs here on a nightly schedule (and on demand via the Actions "Run workflow"
-# button). Scheduled runs execute on the default branch only.
+# The same-seed reproducibility test carries the `ci_gpu` marker but skips
+# unless OMNIDREAMS_REPRO_RUN is set, so the per-PR `pytest -m ci_gpu` run
+# collects it and skips it in milliseconds. This job sets that env var so the
+# test actually executes. (The `manual` marker was deliberately avoided:
+# pytest-manual-marker xfails every `manual` test at setup, so a `manual` test
+# never executes in automation.) Scheduled runs target the default branch only.


greptile-apps · 2026-06-10T14:58:36Z

+          rm -rf /var/lib/apt/lists/*
+
+      - name: Setup proxy cache
+        uses: nv-gha-runners/setup-proxy-cache@main


nv-gha-runners/setup-proxy-cache@main pins to a floating branch head. Any future push to main in that repo changes what runs here without a diff in this file. For internal infra actions this is often intentional, but it's worth confirming whether pinning to a commit SHA or a version tag is feasible — if the action is updated in a breaking or unexpected way, this job will silently change behavior.
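
One way to catch floating refs like this automatically is a small text scan over the workflow file. The sketch below is a hypothetical helper, not part of this PR; it is a rough regex pass rather than a YAML parser, and it reports anything that is not a full 40-character commit SHA, so version tags are flagged as well.

```python
from __future__ import annotations

import re

# Matches `uses: owner/repo@ref` lines, with or without a leading list dash.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s@#]+)@([^\s#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def floating_action_refs(workflow_text: str) -> list[tuple[str, str]]:
    """Return (action, ref) pairs whose ref is not a full 40-hex commit SHA."""
    return [
        (action, ref)
        for action, ref in USES_RE.findall(workflow_text)
        if not FULL_SHA_RE.fullmatch(ref)
    ]
```

Run against the step quoted above, this would report `("nv-gha-runners/setup-proxy-cache", "main")`.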

jmccaffrey-nv wants to merge 1 commit into main from dev/jmccaffrey/omnidreams-same-seed-repro
