
fix(text-embeddings): compare text_encoder_norm_type case-insensitively#478

Open
Albatross1382 wants to merge 1 commit into Lightricks:master from Albatross1382:fix/text-encoder-norm-type-case-insensitive

Conversation

@Albatross1382

Problem

text_embeddings_connectors.load_text_embeddings_pipeline at text_embeddings_connectors.py:362 hard-asserts text_encoder_norm_type == 'per_token_rms' (lowercase). LTX-V 22B distilled checkpoints serialise the value as 'PER_TOKEN_RMS' — the enum name, not the lowercase value — so every load fails:

AssertionError: Unexpected config for dual-aggregate model:
  text_encoder_norm_type='PER_TOKEN_RMS', expected 'per_token_rms'

This blocks the entire local-encode path (LTXVGemmaCLIPModelLoader → comfy.sd.CLIP.__init__ → gemma_encoder.GemmaCLIP.__init__ → load_text_embeddings_pipeline). The literal "per_token_rms" appears in exactly one place in this repository (the assertion expectation at line 358), not as a dispatch key and not in any enum, so this is purely a comparison-shape mismatch between the assertion and the checkpoint metadata's serialisation.
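The name/value split is standard Python Enum behaviour: str(member.name) gives the uppercase identifier while member.value gives the lowercase token. A minimal sketch with a hypothetical NormType enum (the real type lives in the LTX-V export codebase, not in this repository):

```python
from enum import Enum

# Hypothetical stand-in for the norm-type enum on the LTX-V export side;
# the real definition is not part of this repository.
class NormType(Enum):
    PER_TOKEN_RMS = "per_token_rms"

norm = NormType.PER_TOKEN_RMS
print(norm.name)   # prints PER_TOKEN_RMS  (what the checkpoint serialises)
print(norm.value)  # prints per_token_rms  (what the assertion expects)

# The two spellings differ only by case, which is why a case-insensitive
# comparison is sufficient to reconcile them.
print(norm.name.casefold() == norm.value.casefold())  # prints True
```

An exporter that writes member.name instead of member.value produces exactly the 'PER_TOKEN_RMS' string seen in the failing checkpoints.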

Fix

Compare string-typed expectations case-insensitively via casefold(), type-guarded so the bool-typed siblings (caption_projection_first_linear: False, caption_proj_input_norm: False, caption_projection_second_linear: False, caption_proj_before_connector: True) retain their strict-equality semantics.

if isinstance(expected_val, str) and isinstance(actual, str):
    ok = actual.casefold() == expected_val.casefold()
else:
    ok = actual == expected_val
assert ok, (...)

The error message preserves actual!r and expected_val!r, so the original failure mode remains debuggable if a non-string config key ever drifts.

Reviewer aid — type guard correctness

The _expected dict has 5 keys: 4 booleans + 1 string. The case-fold path must apply to the string only.

  • isinstance(False, str) → False. Booleans never enter the case-fold branch.
  • isinstance("per_token_rms", str) → True. The string enters the case-fold branch.
  • isinstance(actual, str) guards against checkpoints where the same key drifts to a non-string type.

Both legs preserve actual == expected_val when types don't match the case-fold precondition; the assertion message is unchanged.
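The guard above can be sketched as a standalone helper (the function name config_value_matches is illustrative only; the patch inlines this logic at the assertion site):

```python
def config_value_matches(actual, expected):
    """Case-insensitive equality for string pairs, strict equality otherwise."""
    if isinstance(expected, str) and isinstance(actual, str):
        return actual.casefold() == expected.casefold()
    return actual == expected

# Strings: the enum-name serialisation now matches the lowercase expectation.
print(config_value_matches("PER_TOKEN_RMS", "per_token_rms"))  # prints True
# Booleans keep strict semantics.
print(config_value_matches(True, False))   # prints False
# Type drift (string where a bool is expected) falls through to strict
# equality and fails, preserving the original failure mode.
print(config_value_matches("True", True))  # prints False
```

casefold() rather than lower() handles the general case of caseless matching; for the ASCII identifiers involved here the two are equivalent.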

Repro

Load any LTX-V 22B distilled-1.1 checkpoint via LTXVGemmaCLIPModelLoader:

LTXVGemmaCLIPModelLoader
  → comfy.sd.CLIP.__init__
  → gemma_encoder.GemmaCLIP.__init__ (gemma_encoder.py:182)
  → text_embeddings_connectors.load_text_embeddings_pipeline (text_embeddings_connectors.py:362)
  → AssertionError

Failing checkpoint observed during testing: ltx-2.3-22b-distilled-1.1.safetensors, sha256 b33b7fe4bbfe084f484be4aaf90b0f1d95dca20d403ac4c0e037eb8c4f0af7cc. The corresponding transformer_config["text_encoder_norm_type"] parsed from the safetensors metadata at text_embeddings_connectors.py:335 is 'PER_TOKEN_RMS'.
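To confirm what a given checkpoint actually serialises, the safetensors header can be inspected with nothing but the stdlib. This is a sketch against the published safetensors file format (8-byte little-endian header length, then a JSON header whose __metadata__ key holds string-to-string metadata); the layout of the config inside __metadata__ varies by exporter, so the commented usage below is an assumption, not this repository's parsing code:

```python
import json
import struct

def read_safetensors_metadata(path):
    """Return the __metadata__ dict from a .safetensors file header.

    Per the safetensors format: the first 8 bytes are a little-endian
    u64 giving the JSON header length; the header's optional
    "__metadata__" key maps strings to strings.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header.get("__metadata__", {})

# Hypothetical usage against the failing checkpoint (metadata key names
# are an assumption about how the exporter packed the config):
# meta = read_safetensors_metadata("ltx-2.3-22b-distilled-1.1.safetensors")
# print(meta.get("text_encoder_norm_type"))  # expected: PER_TOKEN_RMS
```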

Why no test

This repository ships no test suite or CI workflow that runs Python tests, so there is nothing to update. Verified manually by smoke-running a minimal text-to-video workflow that wires LTXVGemmaCLIPModelLoader → CLIPTextEncode → KSampler against the patched assertion with the failing checkpoint above. Without the patch the assertion fires after Loaded processor from .../text_encoders - enhancement enabled; with the patch the encoder model loads through to CLIP/text encoder model load device: cuda:0 and continues into sampling.

Format

  • Pre-commit isort, ruff, end-of-file-fixer, trailing-whitespace, check-added-large-files: pass.
  • black 24.4.2 --target-version py310 --check reports the patched file unchanged.
  • Single-file change: text_embeddings_connectors.py, +9/−1.

🤖 Generated with Claude Code

LTX-V 22B distilled checkpoints serialise text_encoder_norm_type as
the enum name (PER_TOKEN_RMS) rather than the lowercase token
(per_token_rms) the assertion expects. The assertion at
text_embeddings_connectors.py:362 strict-equality fails on every
load, blocking the local-encode path entirely.

Fix: compare string-typed expectations case-insensitively via
casefold(). Type-guarded so the bool-typed siblings
(caption_projection_first_linear etc.) keep their strict-equality
semantics.

Repro: load any LTX-V 22B distilled-1.1 checkpoint via
LTXVGemmaCLIPModelLoader. Stack:
  LTXVGemmaCLIPModelLoader
    → comfy.sd.CLIP.__init__
    → gemma_encoder.GemmaCLIP.__init__ (gemma_encoder.py:182)
    → text_embeddings_connectors.load_text_embeddings_pipeline (text_embeddings_connectors.py:362)
    → AssertionError: Unexpected config for dual-aggregate model:
      text_encoder_norm_type='PER_TOKEN_RMS', expected 'per_token_rms'

Failing checkpoint: ltx-2.3-22b-distilled-1.1.safetensors
sha256:b33b7fe4bbfe084f484be4aaf90b0f1d95dca20d403ac4c0e037eb8c4f0af7cc.

No existing tests in repo to update; verified manually by smoke-running
a minimal T2V workflow that wires LTXVGemmaCLIPModelLoader →
CLIPTextEncode → KSampler against the patched assertion with the
failing checkpoint. Pre-commit (isort, ruff, end-of-file-fixer,
trailing-whitespace) passes; black 24.4.2 with --target-version py310
reports the patched file unchanged.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>