Gemma-4-26b-a4b-it-int4-ov gibberish output #4298

Description

@wjlingz

Describe the bug
Gemma-4-26b-a4b-it-int4-ov served on OVMS returns a gibberish response once the output exceeds a certain length.

To Reproduce
Steps to reproduce the behavior:

  1. Download ovms_windows_2026.2.0_python_off.zip.
  2. Download the model from https://huggingface.co/OpenVINO/gemma-4-26b-a4b-it-int4-ov
  3. cd into the downloaded ovms folder and run setupvars.bat
  4. OVMS launch command: .\ovms --rest_port 8180 --config_path C:\models\gemma-4-26b-a4b-it-int4-ov\model_config.json
  5. Client command: curl http://localhost:8180/v3/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"gemma-4-26b-a4b-it-int4-ov\", \"messages\": [{\"role\": \"user\", \"content\": \"What is OpenVINO? Please explain with examples.\"}], \"max_tokens\": 500, \"temperature\": 0}"
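
For anyone reproducing this without fighting shell quoting, the request in step 5 can be sketched in Python using only the standard library. This is a minimal sketch, not part of the original report: the URL, model name, and request fields are copied from the commands above, while the helper names `build_request` and `ask` are made up for illustration, and the response shape (`choices[0].message.content`) is assumed to follow the OpenAI-style chat completions format.

```python
import json
import urllib.request

# Values copied from the launch and client commands in the steps above.
URL = "http://localhost:8180/v3/chat/completions"
MODEL = "gemma-4-26b-a4b-it-int4-ov"


def build_request(prompt, max_tokens=500, url=URL, model=MODEL):
    """Build the same POST request that the curl command in step 5 sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, max_tokens=500):
    """Send the request and return the generated text.

    Assumption: the response body is OpenAI-style, with the text under
    choices[0].message.content.
    """
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server from step 4 running, `print(ask("What is OpenVINO? Please explain with examples."))` sends the same prompt. Lowering `max_tokens` between calls may help locate the output length at which the text degrades.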

Expected behavior
A coherent explanation in grammatical English.

Logs

Image

Configuration

  1. OVMS version: 2026.2.0
  2. OVMS config.json file
{
  "mediapipe_config_list": [
    {
      "name": "gemma-4-26b-a4b-it-int4-ov",
      "base_path": "C:\\models\\gemma-4-26b-a4b-it-int4-ov"
    }
  ],
  "model_config_list": []
}
  3. CPU, accelerator's versions if applicable:
Intel(R) Core(TM) Ultra 7 268V (8 CPUs)
Intel(R) Arc(TM) 140V GPU (16GB)
Driver version 32.0.101.8826
  4. Model repository directory structure:
C:\models\gemma-4-26b-a4b-it-int4-ov\
├── .cache\
├── .gitattributes
├── chat_template.jinja
├── config.json
├── generation_config.json
├── graph.pbtxt
├── model_config.json
├── openvino_config.json
├── openvino_detokenizer.bin
├── openvino_detokenizer.xml
├── openvino_language_model.bin
├── openvino_language_model.xml
├── openvino_text_embeddings_model.bin
├── openvino_text_embeddings_model.xml
├── openvino_text_embeddings_per_layer_model.bin
├── openvino_text_embeddings_per_layer_model.xml
├── openvino_tokenizer.bin
├── openvino_tokenizer.xml
├── openvino_vision_embeddings_model.bin
├── openvino_vision_embeddings_model.xml
├── preprocessor_config.json
├── processor_config.json
├── README.md
├── tokenizer.json
└── tokenizer_config.json
  5. Model or publicly available similar model that reproduces the issue: Gemma-4-26b-a4b-it-int4-ov
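
As a quick sanity check on the configuration above, the following sketch reads a config file with a `mediapipe_config_list` and reports whether each entry's `base_path` contains a `graph.pbtxt`. It is an illustration only: the function name `check_mediapipe_entries` is hypothetical, and it assumes that the graph file sits directly in `base_path` under the name `graph.pbtxt` (as in the directory listing above) and that relative paths are resolved against the config file's folder.

```python
import json
from pathlib import Path


def check_mediapipe_entries(config_path):
    """Map each mediapipe servable name to True/False: does graph.pbtxt exist?"""
    config_file = Path(config_path)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    results = {}
    for entry in config.get("mediapipe_config_list", []):
        base = Path(entry["base_path"])
        # Assumption: a relative base_path is relative to the config file.
        if not base.is_absolute():
            base = config_file.parent / base
        results[entry["name"]] = (base / "graph.pbtxt").is_file()
    return results
```

For the setup in this report, `check_mediapipe_entries(r"C:\models\gemma-4-26b-a4b-it-int4-ov\model_config.json")` would be the call to try. A `False` value would point at a path problem rather than a generation problem.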

Additional context

Contents of graph.pbtxt:

input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"

node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  input_stream: "LOOPBACK:loopback"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "LLM_NODE_RESOURCES:llm"
  output_stream: "LOOPBACK:loopback"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  input_stream_info: {
    tag_index: 'LOOPBACK:0'
    back_edge: true
  }
  node_options: {
    [type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
      models_path: "./"
      plugin_config: '{}'
      enable_prefix_caching: false
      dynamic_split_fuse: true
      max_num_seqs: 256
      max_num_batched_tokens: 256
      device: "GPU"
    }
  }
  input_stream_handler {
    input_stream_handler: "SyncSetInputStreamHandler"
    options {
      [mediapipe.SyncSetInputStreamHandlerOptions.ext] {
        sync_set {
          tag_index: "LOOPBACK:0"
        }
      }
    }
  }
}
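
One way to narrow the problem down, which the report itself does not describe trying, is to run the same graph on a different device and compare the output. The sketch below rewrites the `device:` field of the graph text shown above. The helper name `set_device` is hypothetical, and it assumes the graph contains exactly one `device:` field with a double-quoted value, as this one does.

```python
import re


def set_device(graph_text, device):
    """Return graph_text with the quoted value of its `device:` field replaced."""
    new_text, count = re.subn(
        r'(\bdevice:\s*)"[^"]*"',
        lambda m: f'{m.group(1)}"{device}"',
        graph_text,
    )
    # Refuse to guess if the graph has zero or several device fields.
    if count != 1:
        raise ValueError(f"expected exactly one device field, found {count}")
    return new_text
```

Applying `set_device(text, "CPU")` to the contents of graph.pbtxt, saving the result, and restarting OVMS would show whether the gibberish also appears off the GPU. If it does not, that would suggest (but not prove) a GPU-specific issue.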

Labels: bug
