/chat/completions fails ("Chat template not loaded correctly") for VL-derived / omni text decoders — Python-Jinja path leaves template null; MINJA mode would fix it but is not in the 2026.2.x release

## Summary
On a `_python_on` Windows build of OVMS, `/v3/chat/completions` fails for a text-decoder IR extracted from a multimodal (text+image+audio+video) **qwen3_5** model:

```
Mediapipe execution failed. MP status - INVALID_ARGUMENT: CalculatorGraph::Run() failed:
Calculator::Process() for node "LLMExecutor" failed:
Error: Chat template not loaded correctly, so it cannot be applied
```

`/v3/completions` (raw prompt) on the **same** served model works perfectly, so the model, tokenizer, and inference are fine — only the **chat-template application path** fails. The model is loaded as an **LLM (continuous-batching) pipeline**, which defaults to the **Python-Jinja2** template processor.

## Environment
- OVMS `2026.2.1` (`ovms_windows_2026.2.1_python_on`), GenAI backend `2026.2.1.0-3123`. Also reproduced on `2026.2.0`.
- Windows, Intel Arc GPU (`targetDevice: GPU`).
- Model: text decoder extracted from a `qwen3_5` omni model, exported to INT4 OpenVINO IR via `optimum-cli`. Standard Qwen ChatML template; `<|im_start|>`/`<|im_end|>` present in vocab; `bos=None, eos=<|im_end|>` (identical to a working Qwen3-14B).

## Reproduction
1. Serve the omni-derived text-decoder IR as an LLM continuous-batching pipeline (default `graph.pbtxt`).
2. `POST /v3/completions` with a raw ChatML prompt → **works**, correct output.
3. `POST /v3/chat/completions` with `messages` → **fails** with the error above.
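
For anyone reproducing this, steps 2 and 3 can be sketched as the pair of requests below. This is a minimal sketch, not the exact script used: `BASE_URL` and `MODEL` are placeholder assumptions (use whatever port and servable name your deployment has), and only the standard library is used.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v3"   # assumption: adjust host/port to your server
MODEL = "omni-text-decoder"             # hypothetical servable name

# Raw ChatML prompt for step 2 (same content as the chat request in step 3).
RAW_PROMPT = "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n"


def _post(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_completions_request(prompt: str = RAW_PROMPT) -> urllib.request.Request:
    # Step 2: raw prompt, no chat template involved on the server side.
    return _post("/completions", {"model": MODEL, "prompt": prompt, "max_tokens": 64})


def build_chat_request(text: str = "Hello") -> urllib.request.Request:
    # Step 3: the server has to apply the chat template to `messages`.
    messages = [{"role": "user", "content": text}]
    return _post("/chat/completions", {"model": MODEL, "messages": messages, "max_tokens": 64})


def send(request: urllib.request.Request) -> dict:
    """Send a request; urllib raises HTTPError on a 4xx/5xx response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling `send(build_completions_request())` corresponds to step 2 and `send(build_chat_request())` to step 3.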

## Root cause (traced in source)
`/chat` uses the embedded **Python Jinja2** processor, and the template object ends up **null**:

- `src/llm/py_jinja_template_processor.cpp` (~L39-40):
  ```cpp
  if (templateProcessor.chatTemplate == nullptr) {
      output = "Error: Chat template not loaded correctly, so it cannot be applied";
      return false;
  }
  ```
- `src/llm/servable_initializer.cpp` → `loadPyTemplateProcessor` (~L147+): reads
  `tokenizer.get_original_chat_template()` then compiles it in an
  `ImmutableSandboxedEnvironment`. For this tokenizer the load/compile does not
  produce a usable template, so `chatTemplate` stays null and `/chat` fails at apply time.

Importantly, **GenAI's own `Tokenizer.apply_chat_template()` succeeds on the exact same tokenizer** (verified standalone with `openvino_genai 2026.2.1.0`, which renders correct ChatML). So the failure is specific to OVMS's **Python-Jinja** serving path, not GenAI's template engine. Upgrading GenAI alone does not fix `/chat`.

## The mechanism to fix it already exists in `main` — but not in the release
`main` has `LLMCalculatorOptions.chat_template_mode` (`src/llm/llm_calculator.proto`):
- `MINJA = 0` — use GenAI `apply_chat_template` (the path that **works** here). *"default for VLM pipelines."*
- `JINJA = 1` — Python Jinja2. *"default for LLM pipelines"* — i.e. the failing path for this model.

There is even an in-code `TODO(dkalinow)` to make MINJA the default for VLM. Setting `chat_template_mode: MINJA` in the graph would route through the working engine — **but the `2026.2.1` release binary rejects the field**:

```
libprotobuf ERROR ... text_format.cc: Message type "mediapipe.LLMCalculatorOptions"
has no field named "chat_template_mode".
```

So the option is `main`-only and the graph fails to load when it's added on `2026.2.1`.

## Requests
1. **Release the `chat_template_mode` option** in a `2026.2.x`/`2026.3` build so users can opt VL-derived/omni LLM pipelines into `MINJA`.
2. **(Robustness)** In `loadChatTemplate`/`loadPyTemplateProcessor`, when the Python-Jinja processor leaves `chatTemplate == nullptr`, **auto-fall-back to `MINJA`** (GenAI's engine) instead of failing `/chat` outright — GenAI already handles these templates correctly.
3. Consider making `MINJA` the default (or auto-selected) for LLM pipelines whose tokenizer originates from a VL/omni model (aligns with the existing VLM TODO).

## Current workaround
A thin reverse proxy that applies ChatML itself and forwards to `/v3/completions` restores `/chat/completions` fully (verified, correct outputs). Happy to share if useful.
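
The core of that proxy is small. The sketch below is a simplified illustration rather than the exact code: it assumes plain-string `content` in every message and covers only the basic ChatML turns, not the tool-call or system-prompt handling a full Qwen template may include. The function names are hypothetical.

```python
from typing import Mapping, Sequence

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages: Sequence[Mapping[str, str]],
                  add_generation_prompt: bool = True) -> str:
    """Render OpenAI-style messages as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Assumption: content is a plain string, not a list of content parts.
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def chat_to_completions_body(chat_body: Mapping[str, object]) -> dict:
    """Turn a /chat/completions body into a /completions body."""
    body = {key: value for key, value in chat_body.items() if key != "messages"}
    body["prompt"] = render_chatml(chat_body["messages"])
    return body
```

The proxy then forwards the converted body to `/v3/completions` and reshapes the response, since the completions endpoint returns `choices[].text` where chat clients expect `choices[].message`.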

