test(omnidreams): same-seed bitwise-reproducibility GPU test + nightly CI #324

Open
jmccaffrey-nv wants to merge 1 commit into main from dev/jmccaffrey/omnidreams-same-seed-repro

@jmccaffrey-nv

Collaborator

What

Adds a GPU test that pins bitwise reproducibility at the same seed for the OmniDreams distilled runner, plus a nightly CI job that runs it.

The test (integrations/omnidreams/tests/test_omnidreams_same_seed_reproducibility.py) runs the distilled runner (omnidreams-sv-2steps-chunk2-loc6-lightvae-lighttae) twice at the same seed and asserts that the two output MP4s are byte-identical (compared by sha256). Both runs use PyTorch's strict-determinism settings:

  • CUBLAS_WORKSPACE_CONFIG=:4096:8
  • torch.use_deterministic_algorithms(True, warn_only=True)
  • PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

These settings are applied in a fresh subprocess before the first CUDA context is created, per the PyTorch reproducibility notes.
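The ordering constraint can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: `DETERMINISM_ENV`, `BOOTSTRAP`, and `determinism_env` are hypothetical names, and the runner invocation is omitted. The environment variables are placed in the child's environment by the parent, and the child enables deterministic algorithms right after importing torch, before any CUDA work.

```python
import os

# Values taken from the PR description; the names below are hypothetical.
DETERMINISM_ENV = {
    "CUBLAS_WORKSPACE_CONFIG": ":4096:8",
    "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
}

# Source the child would run via `python -c`. The flag is set immediately
# after importing torch, before anything touches the GPU.
BOOTSTRAP = (
    "import torch\n"
    "torch.use_deterministic_algorithms(True, warn_only=True)\n"
    "# ... import and invoke the runner here (omitted) ...\n"
)


def determinism_env(base=None):
    """Return a copy of `base` (default: os.environ) with the determinism variables set."""
    env = dict(os.environ if base is None else base)
    env.update(DETERMINISM_ENV)
    return env
```

A parent process could then launch each rollout with `subprocess.run([sys.executable, "-c", BOOTSTRAP], env=determinism_env(), check=True)`, so neither variable depends on what the parent interpreter has already imported.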

Unlike a golden-hash regression, it compares the two runs against each other, so there is no committed digest to maintain and it stays green across toolchain/hardware changes.
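A run-against-run comparison of this kind can be sketched with the standard library alone. The helper names `sha256_file` and `assert_byte_identical` are hypothetical and the failure message is illustrative; the real test's wording may differ.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read in chunks so large MP4s are never loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assert_byte_identical(path_a, path_b):
    """Raise AssertionError, reporting both digests, if the two files differ."""
    digest_a, digest_b = sha256_file(path_a), sha256_file(path_b)
    assert digest_a == digest_b, (
        f"same-seed outputs differ:\n  {path_a}: {digest_a}\n  {path_b}: {digest_b}"
    )
```

Because the two digests are only compared with each other, nothing in this check has to be updated when the expected bytes legitimately change, for example after a codec or driver upgrade.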

How it's gated

  • Tier-marked ci_gpu but skips unless OMNIDREAMS_REPRO_RUN is set, so the per-PR pytest -m ci_gpu job collects-and-skips it instantly.
  • New nightly workflow .github/workflows/determinism.yml (cron + workflow_dispatch) sets OMNIDREAMS_REPRO_RUN=1 to actually run it on the GPU runner.

(We avoid the manual marker on purpose: pytest-manual-marker xfails every manual test at setup, so it would never execute as a CI guard.)
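The env gate can be sketched as a small pure function. This is an illustration rather than the test's actual code: `repro_skip_reason` is a hypothetical name, and treating any non-empty value as "set" is an assumption.

```python
import os

GATE_VAR = "OMNIDREAMS_REPRO_RUN"


def repro_skip_reason(environ=None):
    """Return a skip reason when the gate is closed, or None when the test should run."""
    environ = os.environ if environ is None else environ
    # Assumption: any non-empty value opens the gate; the real test may be stricter.
    if environ.get(GATE_VAR):
        return None
    return f"set {GATE_VAR}=1 to run the same-seed reproducibility test"
```

In a test module, the result could feed `pytest.mark.skipif(repro_skip_reason() is not None, reason=...)` alongside the `ci_gpu` marker, so the per-PR job still collects the test but skips it before any GPU work starts.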

Verification

Ran on an H100 (slurm, cu130 container): two seed=1 rollouts produced an identical MP4 — 1 passed in 384.73s.

🤖 Generated with Claude Code

…ghtly CI

Add a GPU test that runs the distilled omnidreams runner twice at the same
seed under PyTorch's strict-determinism flags
(torch.use_deterministic_algorithms(True, warn_only=True),
CUBLAS_WORKSPACE_CONFIG=:4096:8, PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True)
and asserts the two output MP4s are byte-identical (sha256). Unlike a
golden-hash regression, it compares the two runs against each other, so it
needs no committed digest and stays green across toolchain/hardware changes.

The test carries the ci_gpu tier marker but skips unless OMNIDREAMS_REPRO_RUN
is set, so the per-PR `pytest -m ci_gpu` job collects-and-skips it instantly.
A new nightly workflow (.github/workflows/determinism.yml, schedule +
workflow_dispatch) sets OMNIDREAMS_REPRO_RUN=1 to actually run it on the GPU
runner.

Verified passing on an H100 (slurm, cu130 container): two seed=1 rollouts
produced an identical MP4 (1 passed in 384.73s).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Contributor

Greptile Summary

Adds a nightly GPU job that verifies bitwise reproducibility of the OmniDreams distilled runner by running it twice at the same seed and asserting sha256 equality of the two output MP4s — no committed golden hash needed.

  • Test (test_omnidreams_same_seed_reproducibility.py): spawns two isolated subprocesses with CUBLAS_WORKSPACE_CONFIG, PYTORCH_CUDA_ALLOC_CONF, and torch.use_deterministic_algorithms(warn_only=True) set before the first CUDA context; compares the resulting MP4 digests with a detailed failure message.
  • Workflow (.github/workflows/determinism.yml): runs on a cron schedule (07:00 UTC) and workflow_dispatch; the test is gated by OMNIDREAMS_REPRO_RUN=1 so the per-PR ci_gpu suite collects and skips it instantly.

Confidence Score: 4/5

Safe to merge — the test logic and subprocess isolation are sound; the only issues are a stale comment in the workflow and a floating action ref.

The test correctly isolates each rollout in a fresh subprocess, sets all determinism knobs before the first CUDA context, and compares sha256 digests with clear failure diagnostics. The workflow header comment says the test is marked manual when it is actually marked ci_gpu with an env gate — a reader relying solely on that comment would misunderstand the gating scheme. The nv-gha-runners/setup-proxy-cache@main reference is mutable and could silently change what the job installs.

.github/workflows/determinism.yml — stale manual-marker comment and floating @main action ref.

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/determinism.yml | New nightly CI job running the reproducibility test on an RTX Pro 6000 runner; one comment incorrectly states the test uses the manual marker (it uses ci_gpu + env gate); setup-proxy-cache@main is a floating ref. |
| integrations/omnidreams/tests/test_omnidreams_same_seed_reproducibility.py | Well-structured opt-in GPU test that spawns two isolated subprocesses with determinism env vars set before the first CUDA context, then compares sha256 of their output MP4s; error messages are informative. |

Sequence Diagram

sequenceDiagram
    participant CI as Nightly CI (determinism.yml)
    participant Test as test_same_seed_is_bitwise_reproducible
    participant SubA as Subprocess A (python -c bootstrap)
    participant SubB as Subprocess B (python -c bootstrap)
    participant GPU as GPU / CUDA

    CI->>Test: "pytest (OMNIDREAMS_REPRO_RUN=1, seed=1)"
    Test->>SubA: spawn with CUBLAS_WORKSPACE_CONFIG + PYTORCH_CUDA_ALLOC_CONF
    SubA->>SubA: os.environ.setdefault CUBLAS_WORKSPACE_CONFIG
    SubA->>SubA: "torch.use_deterministic_algorithms(True, warn_only=True)"
    SubA->>GPU: "run distilled runner (seed=1, total_blocks=4)"
    GPU-->>SubA: inference output
    SubA-->>Test: "rc=0, run_a/recipe.mp4"
    Test->>SubB: "spawn identical env (seed=1)"
    SubB->>SubB: same determinism bootstrap
    SubB->>GPU: "run distilled runner (seed=1, total_blocks=4)"
    GPU-->>SubB: inference output
    SubB-->>Test: "rc=0, run_b/recipe.mp4"
    Test->>Test: "sha256(mp4_a) == sha256(mp4_b)?"
    alt byte-identical
        Test-->>CI: PASS
    else digest mismatch
        Test-->>CI: FAIL (diff hint + tmp paths retained)
    end
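The flow in the diagram can be sketched with a stand-in workload. Everything here is hypothetical scaffolding: `CHILD`, `rollout`, and `same_seed_is_reproducible` are illustrative names, and the child writes seeded pseudo-random bytes where the real test would launch the distilled runner and produce an MP4.

```python
import hashlib
import subprocess
import sys
from pathlib import Path

# Stand-in workload: the output depends only on the seed. The real test runs
# the distilled runner in the child instead.
CHILD = (
    "import random, sys\n"
    "seed, out = int(sys.argv[1]), sys.argv[2]\n"
    "rng = random.Random(seed)\n"
    "data = bytes(rng.randrange(256) for _ in range(1024))\n"
    "open(out, 'wb').write(data)\n"
)


def rollout(seed, out_path):
    """Run one rollout in a fresh interpreter and return the output's sha256 digest."""
    subprocess.run(
        [sys.executable, "-c", CHILD, str(seed), str(out_path)],
        check=True,  # a non-zero exit code fails the run immediately
    )
    return hashlib.sha256(Path(out_path).read_bytes()).hexdigest()


def same_seed_is_reproducible(seed, workdir):
    """Run the workload twice at the same seed and compare the two digests."""
    workdir = Path(workdir)
    digest_a = rollout(seed, workdir / "run_a.bin")
    digest_b = rollout(seed, workdir / "run_b.bin")
    return digest_a == digest_b
```

Spawning a new interpreter per rollout keeps the two runs from sharing any process state, which is the property the subprocess isolation in the diagram is meant to provide.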
Reviews (1): Last reviewed commit: "test(omnidreams): add same-seed bitwise-..."

Comment on lines +5 to +9
# The same-seed reproducibility test is marked `manual`: it needs a real GPU,
# downloads the distilled checkpoint + an example HDMap clip from HF, and runs
# two full rollouts, so it is too heavy for the per-PR `ci_gpu` job. Instead it
# runs here on a nightly schedule (and on demand via the Actions "Run workflow"
# button). Scheduled runs execute on the default branch only.

P2 The header comment says the test is marked manual, but the test file uses pytestmark = pytest.mark.ci_gpu with an env-gate skip — not manual. The PR description even explains why manual was deliberately avoided (the pytest-manual-marker plugin would xfail it at setup). A reader consulting only this file would have a wrong mental model of why the nightly job is needed and how the per-PR gate works.

Suggested change
- # The same-seed reproducibility test is marked `manual`: it needs a real GPU,
- # downloads the distilled checkpoint + an example HDMap clip from HF, and runs
- # two full rollouts, so it is too heavy for the per-PR `ci_gpu` job. Instead it
- # runs here on a nightly schedule (and on demand via the Actions "Run workflow"
- # button). Scheduled runs execute on the default branch only.
+ # The same-seed reproducibility test carries the `ci_gpu` marker but skips
+ # unless OMNIDREAMS_REPRO_RUN is set, so the per-PR `pytest -m ci_gpu` run
+ # collects it and skips it in milliseconds. This job sets that env var so the
+ # test actually executes. (The `manual` marker was deliberately avoided:
+ # pytest-manual-marker xfails every `manual` test at setup, so a `manual` test
+ # never executes in automation.) Scheduled runs target the default branch only.

rm -rf /var/lib/apt/lists/*

- name: Setup proxy cache
  uses: nv-gha-runners/setup-proxy-cache@main

P2 nv-gha-runners/setup-proxy-cache@main pins to a floating branch head. Any future push to main in that repo changes what runs here without a diff in this file. For internal infra actions this is often intentional, but it's worth confirming whether pinning to a commit SHA or a version tag is feasible — if the action is updated in a breaking or unexpected way, this job will silently change behavior.
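One way to surface floating refs mechanically is a small lint over the workflow text. This is a rough sketch, not part of the PR: `floating_action_refs` is a hypothetical helper, it uses a regular expression instead of a YAML parser, and it is stricter than the comment above because it also flags version tags, which can be moved just like branches.

```python
import re

# Matches `uses: owner/repo@ref`, with or without a leading list dash.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s@#]+)@([^\s#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def floating_action_refs(workflow_text):
    """Return (action, ref) pairs whose ref is not a full 40-character commit SHA."""
    return [
        (action, ref)
        for action, ref in USES_RE.findall(workflow_text)
        if not FULL_SHA_RE.fullmatch(ref)
    ]
```

Run against the step quoted above, this would report `("nv-gha-runners/setup-proxy-cache", "main")`; whether that is a problem or an intentional choice for an internal action remains a judgment call.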
