fix(wan): correct Wan 2.2 VAE patchify channel order (removes 2px checkerboard) #338

Open
wenqingw-nv wants to merge 1 commit into main from dev/wenqingw-nv/wan-vae-patch-order

@wenqingw-nv

Collaborator

Problem

The flashdreams Wan 2.2 VAE produced a fixed ~2px checkerboard/stipple on every decoded frame (visible on flat regions — sky, road, walls). diffusers AutoencoderKLWan decodes the identical checkpoint + latent cleanly.

Root cause

_patchify/_unpatchify (recipes/wan/autoencoder/vae.py) folded the spatial patch into channels as (c ph pw), but the checkpoint convs are trained for the diffusers convention (c pw ph) (its patchify permute is (0,1,6,4,2,3,5)). The swapped patch axes transpose every patch_size-square sub-pixel block relative to the trained weights → checkerboard.
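
To make the axis bookkeeping concrete, here is a minimal NumPy sketch. It is not the flashdreams code: `fold`, `permuted_axes`, and `OLD_PERMUTE` are hypothetical names, and it assumes the seven-axis view taken before the permute is laid out as (b, c, t, h, ph, w, pw), which is what the permute quoted above implies. It folds a single 2x2 patch under both conventions.

```python
import numpy as np

# Axis names of the 7-D view taken before the permute (assumed layout).
VIEW_AXES = ("b", "c", "t", "h", "ph", "w", "pw")
DIFFUSERS_PERMUTE = (0, 1, 6, 4, 2, 3, 5)  # permute quoted in this PR
OLD_PERMUTE = (0, 1, 4, 6, 2, 3, 5)        # equivalent of the old (c ph pw) fold


def permuted_axes(permute):
    """Return the axis names after applying `permute` to VIEW_AXES."""
    return tuple(VIEW_AXES[i] for i in permute)


def fold(x, p, permute):
    """Fold p x p spatial patches of x (b, c, t, H, W) into the channel axis."""
    b, c, t, H, W = x.shape
    x = x.reshape(b, c, t, H // p, p, W // p, p)
    x = x.transpose(permute)
    return x.reshape(b, c * p * p, t, H // p, W // p)


# One 2x2 patch: rows index ph (height), columns index pw (width).
patch = np.array([[0, 1], [2, 3]], dtype=np.float32).reshape(1, 1, 1, 2, 2)
old = fold(patch, 2, OLD_PERMUTE)[0, :, 0, 0, 0]
new = fold(patch, 2, DIFFUSERS_PERMUTE)[0, :, 0, 0, 0]

print(permuted_axes(DIFFUSERS_PERMUTE))  # ('b', 'c', 'pw', 'ph', 't', 'h', 'w')
print(old.tolist())  # [0.0, 1.0, 2.0, 3.0]
print(new.tolist())  # [0.0, 2.0, 1.0, 3.0]
```

The two channel vectors hold the same 2x2 block read in transposed order: channels 1 and 2 trade places while 0 and 3 stay put. Convs trained on one ordering therefore see the off-diagonal sub-pixels swapped when fed the other.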

Fix

Swap ph ↔ pw in both rearrange patterns (one axis-order change per function).

Verification

  • VAE roundtrip (encode→decode, no diffusion): flashdreams was checkerboarded, diffusers clean → with the fix, flashdreams matches diffusers (clean).
  • Cross-test (diffusers-encode → flashdreams-decode) isolated it to our decode.
  • Full HY-WorldPlay generation (8-chunk): checkerboard gone.
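
The effect behind these checks can be mimicked without any weights. The sketch below is a toy under stated assumptions: `fold`, `unfold`, and `transpose_blocks` are hypothetical NumPy stand-ins (not `_patchify`/`_unpatchify` themselves), and folding with one convention while unfolding with the other stands in for the mismatch between the old fold and the trained convs. It shows that a matched pair is an exact inverse, while a mismatched pair transposes every patch_size-square block.

```python
import numpy as np


def fold(x, p, diffusers_order=True):
    """Fold p x p patches into channels as (c pw ph) or, if False, (c ph pw)."""
    b, c, t, H, W = x.shape
    x = x.reshape(b, c, t, H // p, p, W // p, p)  # b c t h ph w pw
    perm = (0, 1, 6, 4, 2, 3, 5) if diffusers_order else (0, 1, 4, 6, 2, 3, 5)
    return x.transpose(perm).reshape(b, c * p * p, t, H // p, W // p)


def unfold(x, p, diffusers_order=True):
    """Inverse of fold for the same value of diffusers_order."""
    b, cpp, t, h, w = x.shape
    c = cpp // (p * p)
    x = x.reshape(b, c, p, p, t, h, w)  # b c pw ph t h w, or b c ph pw t h w
    perm = (0, 1, 4, 5, 3, 6, 2) if diffusers_order else (0, 1, 4, 5, 2, 6, 3)
    return x.transpose(perm).reshape(b, c, t, h * p, w * p)


def transpose_blocks(x, p):
    """Transpose every p x p spatial block of x (b, c, t, H, W) in place."""
    b, c, t, H, W = x.shape
    x = x.reshape(b, c, t, H // p, p, W // p, p)
    return x.transpose(0, 1, 2, 3, 6, 5, 4).reshape(b, c, t, H, W)


rng = np.random.default_rng(0)
frames = rng.random((1, 3, 2, 4, 6))

matched = unfold(fold(frames, 2, True), 2, True)
old_pair = unfold(fold(frames, 2, False), 2, False)
mismatched = unfold(fold(frames, 2, True), 2, False)

print(np.array_equal(matched, frames))                          # True
print(np.array_equal(old_pair, frames))                         # True
print(np.array_equal(mismatched, transpose_blocks(frames, 2)))  # True
```

Note that the old pair is also a self-consistent inverse in this toy, which suggests why a weight-free reshape roundtrip would not flag the bug: the ordering only matters once convs trained on the other convention sit between the fold and the unfold.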

Scope

Affects every model using the Wan 2.2 VAE with patch_size=2. Downstream: HY-WorldPlay sample/parity videos (PR #318, #336) should be regenerated.

_patchify/_unpatchify folded the spatial patch into channels as
(c ph pw), but the Wan 2.2 VAE checkpoint convs are trained for the
diffusers AutoencoderKLWan convention (c pw ph) (its patchify permute is
(0,1,6,4,2,3,5)). The swapped patch axes transpose every patch_size-square
sub-pixel block relative to the trained weights, decoding to a fixed ~2px
checkerboard on every frame. Swap ph<->pw in both functions to match.

Verified: VAE encode->decode roundtrip and a full HY-WorldPlay generation
are now checkerboard-free and match diffusers' decode of the identical
latent. Affects every model using the Wan 2.2 VAE with patch_size=2.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@copy-pr-bot

copy-pr-bot Bot commented Jun 15, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps

greptile-apps Bot commented Jun 15, 2026

Contributor

Greptile Summary

Fixes the _patchify / _unpatchify channel ordering in the Wan 2.2 VAE so that the patch axes are folded into the channel dimension as (c pw ph) instead of (c ph pw), matching the diffusers AutoencoderKLWan checkpoint convention and eliminating the ~2px checkerboard artifact on every decoded frame.

  • _patchify: rearrange output group changed from (c ph pw) → (c pw ph), aligning with the diffusers permute (0,1,6,4,2,3,5) ordering (pw outer, ph inner in the channel block).
  • _unpatchify: matching inverse updated from (c ph pw) → (c pw ph) so the channel-to-spatial mapping correctly restores height (h ph) and width (w pw) sub-pixels.
  • Explanatory comments are added to both functions citing the diffusers permute as the authoritative reference.

Confidence Score: 5/5

Safe to merge — the change is a two-character axis swap in a pair of inverse functions, directly fixing a visible decode artifact, with the correct ordering cross-verified against the diffusers reference implementation.

The fix is minimal, mathematically sound, and the patchify/unpatchify pair remains a proper inverse: ph sub-pixels consistently map to the height spatial axis and pw to width in both directions. The added comments cite the authoritative diffusers permute (0,1,6,4,2,3,5) as evidence. No unrelated code is touched and the callers (encode, decode) are unchanged.

No files require special attention; the single changed file has a straightforward, well-scoped correction.

Important Files Changed

Filename Overview
flashdreams/flashdreams/recipes/wan/autoencoder/vae.py Two-character axis-label swap in _patchify and _unpatchify; mathematically correct inversion pair; good explanatory comments added.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Input: b c t H W"] --> B["_patchify(x, patch_size)"]
    B --> C{"patch_size == 1?"}
    C -- Yes --> D["Identity (return x)"]
    C -- No --> E["rearrange:\nb c t (h ph) (w pw)\n→ b (c pw ph) t h w\n[pw outer, ph inner in channel block]"]
    E --> F["Encoder3d / conv1 / latent z"]
    F --> G["WanVAE.decode(z)"]
    G --> H["Decoder3d output"]
    H --> I["_unpatchify(out, patch_size)"]
    I --> J{"patch_size == 1?"}
    J -- Yes --> K["Identity (return out)"]
    J -- No --> L["rearrange:\nb (c pw ph) t h w\n→ b c t (h ph) (w pw)\n[restore height=h×ph, width=w×pw]"]
    L --> M["Output: b c t H W (clean frame)"]

    style E fill:#d4edda,stroke:#28a745
    style L fill:#d4edda,stroke:#28a745

Reviews (1): Last reviewed commit: "fix(wan): correct Wan 2.2 VAE patchify c..."

@wenqingw-nv

Collaborator Author

/ok to test 502a7ce

wenqingw-nv pushed a commit that referenced this pull request Jun 15, 2026
Re-measured native-vs-vendor mean |Δ| with the VAE patchify channel-order
fix on the native leg (native regenerated, vendor reused unchanged). The
~2px decode checkerboard removal drops median parity 36.0 -> 24.0 / 255
(per-image 36->28, 28->23, 36->26, 21->17, 47->24). Perf/speedup table is
unchanged — the fix is a zero-cost einops axis swap.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
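
For readers unfamiliar with the parity number, the following is a rough sketch of how a mean |Δ| on the 0..255 scale and its median across images could be computed. The function names `mean_abs_delta` and `median_parity` are hypothetical, and the actual measurement script may differ (for example in how frames are sampled or aligned). The last lines only check that the rounded per-image values quoted in the commit message are consistent with the reported medians.

```python
import numpy as np


def mean_abs_delta(native, vendor):
    """Mean absolute per-pixel difference between two same-shaped images."""
    # Cast to float first: subtracting uint8 arrays directly would wrap around.
    a = np.asarray(native, dtype=np.float64)
    b = np.asarray(vendor, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return float(np.abs(a - b).mean())


def median_parity(pairs):
    """Median of mean |delta| over an iterable of (native, vendor) image pairs."""
    return float(np.median([mean_abs_delta(n, v) for n, v in pairs]))


# Rounded per-image values quoted in the commit message above.
before = [36, 28, 36, 21, 47]
after = [28, 23, 26, 17, 24]
print(np.median(before), np.median(after))  # 36.0 24.0
```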