/chat/completions fails ("Chat template not loaded correctly") for VL-derived / omni text decoders — Python-Jinja path leaves template null; MINJA mode would fix it but is not in the 2026.2.x release #4322

Description

@exzile

Summary

On a python_on Windows build of OVMS, /v3/chat/completions fails for a text-decoder IR extracted from a tri-modal (text+image+audio+video) qwen3_5 model:

Mediapipe execution failed. MP status - INVALID_ARGUMENT: CalculatorGraph::Run() failed:
Calculator::Process() for node "LLMExecutor" failed:
Error: Chat template not loaded correctly, so it cannot be applied

/v3/completions (raw prompt) on the same served model works perfectly, so the model, tokenizer, and inference are fine — only the chat-template application path fails. The model is loaded as an LLM (continuous-batching) pipeline, which defaults to the Python-Jinja2 template processor.

Environment

  • OVMS 2026.2.1 (ovms_windows_2026.2.1_python_on), GenAI backend 2026.2.1.0-3123. Also reproduced on 2026.2.0.
  • Windows, Intel Arc GPU (targetDevice: GPU).
  • Model: text decoder extracted from a qwen3_5 omni model, exported to INT4 OpenVINO IR via optimum-cli. Standard Qwen ChatML template; <|im_start|>/<|im_end|> present in vocab; bos=None, eos=<|im_end|> (identical to a working Qwen3-14B).

Reproduction

  1. Serve the omni-derived text-decoder IR as an LLM continuous-batching pipeline (default graph.pbtxt).
  2. POST /v3/completions with a raw ChatML prompt → works, correct output.
  3. POST /v3/chat/completions with messages → fails with the error above.
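
Steps 2 and 3 can be scripted roughly as follows. This is a minimal sketch, not the exact script used for this report: the base URL (`http://localhost:8000`), the served model name (`omni-text-decoder`), the `max_tokens` value, and the helper names are all assumptions for illustration. Only the two endpoint paths and the ChatML markers come from the description above.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local OVMS REST port
MODEL = "omni-text-decoder"         # assumption: name the IR is served under


def completions_payload(user_text, model=MODEL):
    """Step 2: body for /v3/completions with a hand-written ChatML prompt."""
    prompt = (
        "<|im_start|>user\n" + user_text + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    return {"model": model, "prompt": prompt, "max_tokens": 64}


def chat_payload(user_text, model=MODEL):
    """Step 3: body for /v3/chat/completions; the server must apply the template."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": 64,
    }


def post_json(path, payload, base_url=BASE_URL):
    """POST a JSON body and return (status, body_text); HTTP errors are returned."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        return error.code, error.read().decode("utf-8")


def run_repro(user_text="Hello"):
    """Send both requests to a running server and return the two results."""
    return (
        post_json("/v3/completions", completions_payload(user_text)),
        post_json("/v3/chat/completions", chat_payload(user_text)),
    )
```

Calling `run_repro()` against a running server should show the first request succeeding and the second returning the "Chat template not loaded correctly" error, if the setup matches the one described here.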

Root cause (traced in source)

/chat uses the embedded Python Jinja2 processor, and the template object ends up null:

  • src/llm/py_jinja_template_processor.cpp (~L39-40):
    if (templateProcessor.chatTemplate == nullptr) {
        output = "Error: Chat template not loaded correctly, so it cannot be applied";
        return false;
    }
  • src/llm/servable_initializer.cpp, loadPyTemplateProcessor (~L147+): reads
    tokenizer.get_original_chat_template() then compiles it in an
    ImmutableSandboxedEnvironment. For this tokenizer the load/compile does not
    produce a usable template, so chatTemplate stays null and /chat fails at apply time.

Importantly, GenAI's own Tokenizer.apply_chat_template() succeeds on the exact same tokenizer (verified standalone with openvino_genai 2026.2.1.0, which renders correct ChatML). So the failure is specific to OVMS's Python-Jinja serving path, not GenAI's template engine. Upgrading GenAI alone does not fix /chat.

The mechanism to fix it already exists in main — but not in the release

main has LLMCalculatorOptions.chat_template_mode (src/llm/llm_calculator.proto):

  • MINJA = 0 — use GenAI apply_chat_template (the path that works here). "default for VLM pipelines."
  • JINJA = 1 — Python Jinja2. "default for LLM pipelines" — i.e. the failing path for this model.

There is even an in-code TODO(dkalinow) to make MINJA the default for VLM. Setting chat_template_mode: MINJA in the graph would route through the working engine — but the 2026.2.1 release binary rejects the field:

libprotobuf ERROR ... text_format.cc: Message type "mediapipe.LLMCalculatorOptions"
has no field named "chat_template_mode".

So the option is main-only and the graph fails to load when it's added on 2026.2.1.

Requests

  1. Release the chat_template_mode option in a 2026.2.x/2026.3 build so users can opt VL-derived/omni LLM pipelines into MINJA.
  2. (Robustness) In loadChatTemplate/loadPyTemplateProcessor, when the Python-Jinja processor leaves chatTemplate == nullptr, auto-fall-back to MINJA (GenAI's engine) instead of failing /chat outright — GenAI already handles these templates correctly.
  3. Consider making MINJA the default (or auto-selected) for LLM pipelines whose tokenizer originates from a VL/omni model (aligns with the existing VLM TODO).
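
The fallback asked for in request 2 amounts to the control flow below. This is a language-neutral illustration written in Python, not OVMS code: `render_with_fallback` and the two renderer callables are hypothetical stand-ins for the Python-Jinja processor and GenAI's apply_chat_template, and a `None` return models the `chatTemplate == nullptr` case.

```python
from typing import Callable, Optional, Sequence, Tuple

Messages = Sequence[dict]
Renderer = Callable[[Messages], Optional[str]]


def render_with_fallback(
    messages: Messages, primary: Renderer, fallback: Renderer
) -> Tuple[str, str]:
    """Try the primary template engine; use the fallback if it yields nothing.

    Returns (engine_name, prompt). A primary renderer that returns None or
    raises is treated like a template that failed to load.
    """
    try:
        prompt = primary(messages)
    except Exception:
        prompt = None
    if prompt is not None:
        return "primary", prompt

    prompt = fallback(messages)
    if prompt is None:
        # Only fail the request when both engines are unusable.
        raise RuntimeError(
            "Chat template not loaded correctly, so it cannot be applied"
        )
    return "fallback", prompt
```

The point of the design is that the request fails only when neither engine can render the template, rather than as soon as the first one comes back empty.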

Current workaround

A thin reverse proxy that applies ChatML itself and forwards to /v3/completions restores /chat/completions fully (verified, correct outputs). Happy to share if useful.
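
The translation step at the core of such a proxy can be sketched as below. This is a simplified illustration, not the proxy itself: the function names are hypothetical, message `content` is assumed to be a plain string, only the basic ChatML framing is rendered (no tool calls or other template extras), and the response mapping assumes OpenAI-style `choices` entries with `text` and `finish_reason` fields.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a plain ChatML prompt."""
    parts = []
    for message in messages:
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def chat_to_completions(chat_request):
    """Turn a /chat/completions body into a /completions body."""
    body = {k: v for k, v in chat_request.items() if k != "messages"}
    body["prompt"] = render_chatml(chat_request["messages"])
    return body


def completions_to_chat(completion_response):
    """Reshape a /completions response into a chat-style response."""
    choices = []
    for choice in completion_response.get("choices", []):
        choices.append({
            "index": choice.get("index", 0),
            "message": {"role": "assistant", "content": choice.get("text", "")},
            "finish_reason": choice.get("finish_reason"),
        })
    out = {k: v for k, v in completion_response.items() if k != "choices"}
    out["object"] = "chat.completion"
    out["choices"] = choices
    return out
```

`chat_to_completions` copies every field other than `messages` through unchanged, so any chat-only parameters that /v3/completions does not accept would need to be filtered out in a real proxy. Streaming responses are not covered by this sketch.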
