fix(text-embeddings): compare text_encoder_norm_type case-insensitively #478
Open
Albatross1382 wants to merge 1 commit into Lightricks:master from
LTX-V 22B distilled checkpoints serialise text_encoder_norm_type as
the enum name (PER_TOKEN_RMS) rather than the lowercase token
(per_token_rms) that the assertion expects. The strict-equality
assertion at text_embeddings_connectors.py:362 therefore fails on
every load, blocking the local-encode path entirely.
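The mismatch is purely one of letter case, as a quick standalone check shows (the values here come from the traceback; this is not the repository's code):

```python
# The checkpoint serialises the enum *name*; strict equality rejects it.
assert "PER_TOKEN_RMS" != "per_token_rms"

# Case-folded comparison accepts the enum name while still rejecting
# genuinely different values.
assert "PER_TOKEN_RMS".casefold() == "per_token_rms"
assert "SOME_OTHER_NORM".casefold() != "per_token_rms"
```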
Fix: compare string-typed expectations case-insensitively via
casefold(). Type-guarded so the bool-typed siblings
(caption_projection_first_linear etc.) keep their strict-equality
semantics.
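A minimal sketch of the patched comparison; the names `_expected`, `check_config`, and the exact dict contents are illustrative, not necessarily the repository's identifiers:

```python
# Expected config for the dual-aggregate model (illustrative values).
_expected = {
    "text_encoder_norm_type": "per_token_rms",   # string: case-insensitive
    "caption_projection_first_linear": False,    # bools: strict equality
    "caption_proj_input_norm": False,
    "caption_projection_second_linear": False,
    "caption_proj_before_connector": True,
}

def check_config(config: dict) -> None:
    for key, expected_val in _expected.items():
        actual = config.get(key)
        if isinstance(expected_val, str) and isinstance(actual, str):
            # Case-insensitive comparison for string-typed expectations.
            matches = actual.casefold() == expected_val.casefold()
        else:
            # Strict equality for everything else (the bool-typed siblings).
            matches = actual == expected_val
        assert matches, (
            "Unexpected config for dual-aggregate model: "
            f"{key}={actual!r}, expected {expected_val!r}"
        )
```

With this guard, `'PER_TOKEN_RMS'` passes the string check while a flipped boolean still trips the assertion with the original message shape.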
Repro: load any LTX-V 22B distilled-1.1 checkpoint via
LTXVGemmaCLIPModelLoader. Stack:
LTXVGemmaCLIPModelLoader
→ comfy.sd.CLIP.__init__
→ gemma_encoder.GemmaCLIP.__init__ (gemma_encoder.py:182)
→ text_embeddings_connectors.load_text_embeddings_pipeline (text_embeddings_connectors.py:362)
→ AssertionError: Unexpected config for dual-aggregate model:
text_encoder_norm_type='PER_TOKEN_RMS', expected 'per_token_rms'
Failing checkpoint: ltx-2.3-22b-distilled-1.1.safetensors
sha256:b33b7fe4bbfe084f484be4aaf90b0f1d95dca20d403ac4c0e037eb8c4f0af7cc.
No existing tests in repo to update; verified manually by smoke-running
a minimal T2V workflow that wires LTXVGemmaCLIPModelLoader →
CLIPTextEncode → KSampler against the patched assertion with the
failing checkpoint. Pre-commit (isort, ruff, end-of-file-fixer,
trailing-whitespace) passes; black 24.4.2 with --target-version py310
reports the patched file unchanged.
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Problem

`text_embeddings_connectors.load_text_embeddings_pipeline` at `text_embeddings_connectors.py:362` hard-asserts `text_encoder_norm_type == 'per_token_rms'` (lowercase). LTX-V 22B distilled checkpoints serialise the value as `'PER_TOKEN_RMS'` (the enum name, not the lowercase value), so every load fails.

This blocks the entire local-encode path (`LTXVGemmaCLIPModelLoader` → `comfy.sd.CLIP.__init__` → `gemma_encoder.GemmaCLIP.__init__` → `load_text_embeddings_pipeline`). The literal `"per_token_rms"` appears in exactly one place in this repository (the assertion expectation at line 358): not as a dispatch key, not in any enum. This is just a comparison-shape mismatch between the assertion and the checkpoint metadata's serialisation.

Fix
Compare string-typed expectations case-insensitively via `casefold()`, type-guarded so the bool-typed siblings (`caption_projection_first_linear: False`, `caption_proj_input_norm: False`, `caption_projection_second_linear: False`, `caption_proj_before_connector: True`) retain their strict-equality semantics. The error message preserves `actual!r` and `expected_val!r`, so the original failure mode remains debuggable if a non-string config key ever drifts.

Reviewer aid: type guard correctness
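The guard's branch selection can be checked in isolation; `uses_casefold_branch` is a hypothetical helper that replicates the guard condition, not a function in the patch:

```python
def uses_casefold_branch(actual, expected_val) -> bool:
    """Replicates the type guard: case-fold only when both sides are str."""
    return isinstance(expected_val, str) and isinstance(actual, str)

# Booleans never enter the case-fold branch (bool subclasses int, not str).
assert not uses_casefold_branch(False, False)
# The single string-typed key does.
assert uses_casefold_branch("PER_TOKEN_RMS", "per_token_rms")
# A key that drifts to a non-string type falls back to strict equality.
assert not uses_casefold_branch(None, "per_token_rms")
```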
The `_expected` dict has 5 keys: 4 booleans and 1 string. The case-fold path must apply to the string only.

- `isinstance(False, str)` → `False`: booleans never enter the case-fold branch.
- `isinstance("per_token_rms", str)` → `True`: the string enters the case-fold branch.
- `isinstance(actual, str)` guards against checkpoints where the same key drifts to a non-string type.

When the types don't satisfy the case-fold precondition, both legs fall back to strict `actual == expected_val`; the assertion message is unchanged.

Repro
Load any LTX-V 22B distilled-1.1 checkpoint via `LTXVGemmaCLIPModelLoader`.

Failing checkpoint observed during testing: `ltx-2.3-22b-distilled-1.1.safetensors`, sha256 `b33b7fe4bbfe084f484be4aaf90b0f1d95dca20d403ac4c0e037eb8c4f0af7cc`. The corresponding `transformer_config["text_encoder_norm_type"]` parsed from the safetensors metadata at `text_embeddings_connectors.py:335` is `'PER_TOKEN_RMS'`.

Why no test
This repository ships no test suite or CI workflow that runs Python tests, so there is nothing to update. Verified manually by smoke-running a minimal text-to-video workflow that wires `LTXVGemmaCLIPModelLoader` → `CLIPTextEncode` → `KSampler` against the patched assertion with the failing checkpoint above. Without the patch, the assertion fires after `Loaded processor from .../text_encoders - enhancement enabled`; with the patch, the encoder model loads through to `CLIP/text encoder model load device: cuda:0` and continues into sampling.

Format

`isort`, `ruff`, `end-of-file-fixer`, `trailing-whitespace`, `check-added-large-files`: pass. `black 24.4.2 --target-version py310 --check` reports the patched file unchanged. Diff: `text_embeddings_connectors.py`, +9/−1.

🤖 Generated with Claude Code