Describe the bug
Gemma-4-26b-a4b-it-int4-ov served on OVMS returns a gibberish response once the output exceeds a certain length.
To Reproduce
Steps to reproduce the behavior:
- Download ovms_windows_2026.2.0_python_off.zip.
- Download models from [https://huggingface.co/OpenVINO/gemma-4-26b-a4b-it-int4-ov]
- cd into the downloaded ovms folder and run:
setupvars.bat
- OVMS launch command: .\ovms --rest_port 8180 --config_path C:\models\gemma-4-26b-a4b-it-int4-ov\model_config.json
- Client command: curl http://localhost:8180/v3/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"gemma-4-26b-a4b-it-int4-ov\", \"messages\": [{\"role\": \"user\", \"content\": \"What is OpenVINO? Please explain with examples.\"}], \"max_tokens\": 500, \"temperature\": 0}"
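For anyone who wants to reproduce this without fighting Windows shell quoting, here is a minimal Python sketch of the same request. The URL, model name, prompt, and parameters are copied from the curl command above. The helper names `build_payload` and `send` are hypothetical, and the 600-second timeout is an arbitrary choice. The response handling assumes the OpenAI-style `choices[0].message.content` shape, which I have not verified against this exact build.

```python
import json
import urllib.request

# Endpoint taken from the curl command in this report.
URL = "http://localhost:8180/v3/chat/completions"


def build_payload(max_tokens=500):
    """Return the request body from the curl command as a dict (hypothetical helper)."""
    return {
        "model": "gemma-4-26b-a4b-it-int4-ov",
        "messages": [
            {
                "role": "user",
                "content": "What is OpenVINO? Please explain with examples.",
            }
        ],
        "max_tokens": max_tokens,
        "temperature": 0,
    }


def send(payload, url=URL, timeout=600):
    """POST the payload as JSON and return the parsed JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The timeout value is an assumption; generation on an iGPU can be slow.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `print(send(build_payload())["choices"][0]["message"]["content"])` should print the generated text, assuming the response shape noted above. Lowering `max_tokens` via `build_payload(100)` may help narrow down where the output starts to degrade.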
Expected behavior
A coherent explanation in grammatical English for the full length of the response.
Logs
Configuration
- OVMS version:
2026.2.0
- OVMS config.json file
{
  "mediapipe_config_list": [
    {
      "name": "gemma-4-26b-a4b-it-int4-ov",
      "base_path": "C:\\models\\gemma-4-26b-a4b-it-int4-ov"
    }
  ],
  "model_config_list": []
}
- CPU, accelerator's versions if applicable:
Intel(R) Core(TM) Ultra 7 268V (8 CPUs)
Intel(R) Arc(TM) 140V GPU (16GB)
Driver version 32.0.101.8826
- Model repository directory structure:
C:\models\gemma-4-26b-a4b-it-int4-ov\
├── .cache\
├── .gitattributes
├── chat_template.jinja
├── config.json
├── generation_config.json
├── graph.pbtxt
├── model_config.json
├── openvino_config.json
├── openvino_detokenizer.bin
├── openvino_detokenizer.xml
├── openvino_language_model.bin
├── openvino_language_model.xml
├── openvino_text_embeddings_model.bin
├── openvino_text_embeddings_model.xml
├── openvino_text_embeddings_per_layer_model.bin
├── openvino_text_embeddings_per_layer_model.xml
├── openvino_tokenizer.bin
├── openvino_tokenizer.xml
├── openvino_vision_embeddings_model.bin
├── openvino_vision_embeddings_model.xml
├── preprocessor_config.json
├── processor_config.json
├── README.md
├── tokenizer.json
└── tokenizer_config.json
- Model or publicly available similar model that reproduces the issue:
Gemma-4-26b-a4b-it-int4-ov
Additional context
Contents of graph.pbtxt:
input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"
node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  input_stream: "LOOPBACK:loopback"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "LLM_NODE_RESOURCES:llm"
  output_stream: "LOOPBACK:loopback"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  input_stream_info: {
    tag_index: 'LOOPBACK:0'
    back_edge: true
  }
  node_options: {
    [type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
      models_path: "./"
      plugin_config: '{}'
      enable_prefix_caching: false
      dynamic_split_fuse: true
      max_num_seqs: 256
      max_num_batched_tokens: 256
      device: "GPU"
    }
  }
  input_stream_handler {
    input_stream_handler: "SyncSetInputStreamHandler"
    options {
      [mediapipe.SyncSetInputStreamHandlerOptions.ext] {
        sync_set {
          tag_index: "LOOPBACK:0"
        }
      }
    }
  }
}