change default cache_interval_multiplier [test_doc_files_windows=demos…/continuous_batching/agentic_ai/README.md] by dtrawins · Pull Request #4314 · openvinotoolkit/model_server

dtrawins · 2026-06-22T21:56:21Z

🛠 Summary

CVS-188061 - The new default value, cache_interval_multiplier=64, is better tuned for long contexts, which are expected to be typical for models with linear attention.

🧪 Checklist

- Unit tests added.
- Documentation updated.
- Change follows security best practices.


Copilot

Pull request overview

This PR aims to change the default behavior and documentation for cache_interval_multiplier (intended to default to 64) to better support long-context workloads typical for linear-attention models.

Changes:

- Updates `LLMCalculatorOptions.cache_interval_multiplier` in the proto to specify a default of 64 and adds a clarifying comment about linear-attention applicability.
- Updates the graph export CLI help text and sets the CLI option default to 64.
- Adds/updates a unit test and user documentation to assert/describe the new default.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `src/test/llm/llmnode_test.cpp` | Adds assertions expecting `cache_interval_multiplier` to be present and equal to 64 by default. |
| `src/llm/llm_calculator.proto` | Sets proto2 default for `cache_interval_multiplier` to 64 and documents linear-attention applicability. |
| `src/graph_export/graph_cli_parser.cpp` | Updates CLI option description and sets a cxxopts default value of 64 for `cache_interval_multiplier`. |
| `docs/parameters.md` | Documents `--cache_interval_multiplier` and states that the default is 64. |

`src/test/llm/llmnode_test.cpp`:

```diff
+    ASSERT_TRUE(properties->schedulerConfig.cache_interval_multiplier.has_value());
+    ASSERT_EQ(properties->schedulerConfig.cache_interval_multiplier.value(), 64);
```


`src/llm/llm_calculator.proto`:

```diff
+    // Applicable only for models with linear attention.
+    optional uint64 cache_interval_multiplier = 25 [default = 64];
```


`src/graph_export/graph_cli_parser.cpp`:

```diff
        ("cache_interval_multiplier",
-            "Multiplier for the KV cache block interval. Controls the granularity of cache allocation. Default: unset.",
-            cxxopts::value<uint64_t>(),
+            "Multiplier for the KV cache block interval. Controls the granularity of cache allocation. Applicable only for models with linear attention. Default: 64.",
+            cxxopts::value<uint64_t>()->default_value("64"),
            "CACHE_INTERVAL_MULTIPLIER");
```


`docs/parameters.md`:

```diff
 | `--reasoning_parser`                  | `string`     | Type of parser to use for reasoning content extraction from model output. Currently supported: [qwen3, gptoss, gemma4]                     |
 | `--tool_parser`                       | `string`     | Type of parser to use for tool calls extraction from model output. Currently supported: [llama3, phi4, hermes3, mistral, qwen3coder, gptoss, devstral, lfm2, gemma4]            |
 | `--enable_tool_guided_generation`     | `bool`       | Enables enforcing tool schema during generation. Requires setting response parser. Default: false.                         |
+| `--cache_interval_multiplier`         | `integer`    | Multiplier for the KV cache block interval. Controls the granularity of cache allocation. Applicable only for models with linear attention. Default: 64. |
```


Commit 96fb9e4: change default cache_interval_multiplier [test_doc_files_windows=demos…/continuous_batching/agentic_ai/README.md]

Copilot AI review requested due to automatic review settings, June 22, 2026 21:56.

Copilot started reviewing on behalf of dtrawins, June 22, 2026 21:56.

Copilot AI reviewed Jun 22, 2026.

Commit 91ad990: fix tests

dtrawins wants to merge 2 commits into main from CVS-188061-64.


2 participants
