
fix(text-embeddings): compare text_encoder_norm_type case-insensitively#478

Open
Albatross1382 wants to merge 1 commit into Lightricks:master from Albatross1382:fix/text-encoder-norm-type-case-insensitive

Conversation

@Albatross1382

Problem

text_embeddings_connectors.load_text_embeddings_pipeline at text_embeddings_connectors.py:362 hard-asserts text_encoder_norm_type == 'per_token_rms' (lowercase). LTX-V 22B distilled checkpoints serialise the value as 'PER_TOKEN_RMS' — the enum name, not the lowercase value — so every load fails:

AssertionError: Unexpected config for dual-aggregate model:
  text_encoder_norm_type='PER_TOKEN_RMS', expected 'per_token_rms'

This blocks the entire local-encode path (LTXVGemmaCLIPModelLoader → comfy.sd.CLIP.__init__ → gemma_encoder.GemmaCLIP.__init__ → load_text_embeddings_pipeline). The literal "per_token_rms" appears in exactly one place in this repository (the assertion expectation at line 358), not as a dispatch key and not in any enum, so this is purely a comparison-shape mismatch between the assertion and the checkpoint metadata's serialisation.
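The name/value split is standard Python Enum behaviour: str(member.name) gives the uppercase identifier while member.value gives the lowercase token. A minimal sketch with a hypothetical NormType enum (the real type lives in the LTX-V export codebase, not in this repository):

```python
from enum import Enum

# Hypothetical stand-in for the norm-type enum on the LTX-V export side;
# the real definition is not part of this repository.
class NormType(Enum):
    PER_TOKEN_RMS = "per_token_rms"

norm = NormType.PER_TOKEN_RMS
print(norm.name)   # prints PER_TOKEN_RMS  (what the checkpoint serialises)
print(norm.value)  # prints per_token_rms  (what the assertion expects)

# The two spellings differ only by case, which is why a case-insensitive
# comparison is sufficient to reconcile them.
print(norm.name.casefold() == norm.value.casefold())  # prints True
```

An exporter that writes member.name instead of member.value produces exactly the 'PER_TOKEN_RMS' string seen in the failing checkpoints.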

Fix

Compare string-typed expectations case-insensitively via casefold(), type-guarded so the bool-typed siblings (caption_projection_first_linear: False, caption_proj_input_norm: False, caption_projection_second_linear: False, caption_proj_before_connector: True) retain their strict-equality semantics.

if isinstance(expected_val, str) and isinstance(actual, str):
    ok = actual.casefold() == expected_val.casefold()
else:
    ok = actual == expected_val
assert ok, (...)

The error message preserves actual!r and expected_val!r, so the original failure mode remains debuggable if a non-string config key ever drifts.

Reviewer aid — type guard correctness

The _expected dict has 5 keys: 4 booleans + 1 string. The case-fold path must apply to the string only.

  • isinstance(False, str) → False. Booleans never enter the case-fold branch.
  • isinstance("per_token_rms", str) → True. The string enters the case-fold branch.
  • isinstance(actual, str) guards against checkpoints where the same key drifts to a non-string type.

Both legs preserve actual == expected_val when types don't match the case-fold precondition; the assertion message is unchanged.
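The guard above can be sketched as a standalone helper (the function name config_value_matches is illustrative only; the patch inlines this logic at the assertion site):

```python
def config_value_matches(actual, expected):
    """Case-insensitive equality for string pairs, strict equality otherwise."""
    if isinstance(expected, str) and isinstance(actual, str):
        return actual.casefold() == expected.casefold()
    return actual == expected

# Strings: the enum-name serialisation now matches the lowercase expectation.
print(config_value_matches("PER_TOKEN_RMS", "per_token_rms"))  # prints True
# Booleans keep strict semantics.
print(config_value_matches(True, False))   # prints False
# Type drift (string where a bool is expected) falls through to strict
# equality and fails, preserving the original failure mode.
print(config_value_matches("True", True))  # prints False
```

casefold() rather than lower() handles the general case of caseless matching; for the ASCII identifiers involved here the two are equivalent.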

Repro

Load any LTX-V 22B distilled-1.1 checkpoint via LTXVGemmaCLIPModelLoader:

LTXVGemmaCLIPModelLoader
  → comfy.sd.CLIP.__init__
  → gemma_encoder.GemmaCLIP.__init__ (gemma_encoder.py:182)
  → text_embeddings_connectors.load_text_embeddings_pipeline (text_embeddings_connectors.py:362)
  → AssertionError

Failing checkpoint observed during testing: ltx-2.3-22b-distilled-1.1.safetensors, sha256 b33b7fe4bbfe084f484be4aaf90b0f1d95dca20d403ac4c0e037eb8c4f0af7cc. The corresponding transformer_config["text_encoder_norm_type"] parsed from the safetensors metadata at text_embeddings_connectors.py:335 is 'PER_TOKEN_RMS'.
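To confirm what a given checkpoint actually serialises, the safetensors header can be inspected with nothing but the stdlib. This is a sketch against the published safetensors file format (8-byte little-endian header length, then a JSON header whose __metadata__ key holds string-to-string metadata); the layout of the config inside __metadata__ varies by exporter, so the commented usage below is an assumption, not this repository's parsing code:

```python
import json
import struct

def read_safetensors_metadata(path):
    """Return the __metadata__ dict from a .safetensors file header.

    Per the safetensors format: the first 8 bytes are a little-endian
    u64 giving the JSON header length; the header's optional
    "__metadata__" key maps strings to strings.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header.get("__metadata__", {})

# Hypothetical usage against the failing checkpoint (metadata key names
# are an assumption about how the exporter packed the config):
# meta = read_safetensors_metadata("ltx-2.3-22b-distilled-1.1.safetensors")
# print(meta.get("text_encoder_norm_type"))  # expected: PER_TOKEN_RMS
```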

Why no test

This repository ships no test suite or CI workflow that runs Python tests, so there is nothing to update. Verified manually by smoke-running a minimal text-to-video workflow that wires LTXVGemmaCLIPModelLoader → CLIPTextEncode → KSampler against the patched assertion with the failing checkpoint above. Without the patch the assertion fires after Loaded processor from .../text_encoders - enhancement enabled; with the patch the encoder model loads through to CLIP/text encoder model load device: cuda:0 and continues into sampling.

Format

  • Pre-commit isort, ruff, end-of-file-fixer, trailing-whitespace, check-added-large-files: pass.
  • black 24.4.2 --target-version py310 --check reports the patched file unchanged.
  • Single-file change: text_embeddings_connectors.py, +9/−1.

🤖 Generated with Claude Code

LTX-V 22B distilled checkpoints serialise text_encoder_norm_type as
the enum name (PER_TOKEN_RMS) rather than the lowercase token
(per_token_rms) the assertion expects. The assertion at
text_embeddings_connectors.py:362 strict-equality fails on every
load, blocking the local-encode path entirely.

Fix: compare string-typed expectations case-insensitively via
casefold(). Type-guarded so the bool-typed siblings
(caption_projection_first_linear etc.) keep their strict-equality
semantics.

Repro: load any LTX-V 22B distilled-1.1 checkpoint via
LTXVGemmaCLIPModelLoader. Stack:
  LTXVGemmaCLIPModelLoader
    → comfy.sd.CLIP.__init__
    → gemma_encoder.GemmaCLIP.__init__ (gemma_encoder.py:182)
    → text_embeddings_connectors.load_text_embeddings_pipeline (text_embeddings_connectors.py:362)
    → AssertionError: Unexpected config for dual-aggregate model:
      text_encoder_norm_type='PER_TOKEN_RMS', expected 'per_token_rms'

Failing checkpoint: ltx-2.3-22b-distilled-1.1.safetensors
sha256:b33b7fe4bbfe084f484be4aaf90b0f1d95dca20d403ac4c0e037eb8c4f0af7cc.

No existing tests in repo to update; verified manually by smoke-running
a minimal T2V workflow that wires LTXVGemmaCLIPModelLoader →
CLIPTextEncode → KSampler against the patched assertion with the
failing checkpoint. Pre-commit (isort, ruff, end-of-file-fixer,
trailing-whitespace) passes; black 24.4.2 with --target-version py310
reports the patched file unchanged.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>