change default cache_interval_multiplier [test_doc_files_windows=demos/continuous_batching/agentic_ai/README.md] #4314
Open
dtrawins wants to merge 2 commits into
Pull request overview
This PR aims to change the default behavior and documentation for cache_interval_multiplier (intended to default to 64) to better support long-context workloads typical for linear-attention models.
Changes:
- Updates `LLMCalculatorOptions.cache_interval_multiplier` in the proto to specify a default of `64` and adds a clarifying comment about linear-attention applicability.
- Updates the graph export CLI help text and the CLI option default to indicate a default of `64`.
- Adds/updates a unit test and user documentation to assert/describe the new default.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/test/llm/llmnode_test.cpp` | Adds assertions expecting `cache_interval_multiplier` to be present and equal to 64 by default. |
| `src/llm/llm_calculator.proto` | Sets proto2 default for `cache_interval_multiplier` to 64 and documents linear-attention applicability. |
| `src/graph_export/graph_cli_parser.cpp` | Updates CLI option description and sets a cxxopts default value of 64 for `cache_interval_multiplier`. |
| `docs/parameters.md` | Documents `--cache_interval_multiplier` and states default is 64. |
Comment on lines +4252 to +4253 in `src/test/llm/llmnode_test.cpp`:
```diff
+ASSERT_TRUE(properties->schedulerConfig.cache_interval_multiplier.has_value());
+ASSERT_EQ(properties->schedulerConfig.cache_interval_multiplier.value(), 64);
```
Comment on lines +139 to +140 in `src/llm/llm_calculator.proto`:
```diff
+// Applicable only for models with linear attention.
+optional uint64 cache_interval_multiplier = 25 [default = 64];
```
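Because the field is proto2 with `[default = 64]`, the generated getter returns 64 when the option is absent from the graph, while the `has_cache_interval_multiplier()` presence check still reports `false`. Whether the `has_value()` assertion in the test holds for an unset option therefore depends on how the calculator copies the option into the scheduler config, which this page does not show. The sketch below illustrates the two possibilities with hypothetical stand-ins: `FakeCalculatorOptions` imitates the proto2 accessors, `SchedulerConfigSketch` imitates an `std::optional` config field, and `copyIfPresent` / `copyWithDefault` are illustrative names, not functions from the OVMS sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the accessors protoc generates for a proto2 field
//   optional uint64 cache_interval_multiplier = 25 [default = 64];
// The getter falls back to the declared default; has_...() reports presence only.
class FakeCalculatorOptions {
public:
    bool has_cache_interval_multiplier() const { return value_.has_value(); }
    uint64_t cache_interval_multiplier() const { return value_.value_or(64); }
    void set_cache_interval_multiplier(uint64_t v) { value_ = v; }

private:
    std::optional<uint64_t> value_;
};

// Hypothetical config that stores the setting as std::optional,
// matching the has_value()/value() calls used in the unit test.
struct SchedulerConfigSketch {
    std::optional<std::size_t> cache_interval_multiplier;
};

// Copies only an explicitly set value: the proto default is never applied.
SchedulerConfigSketch copyIfPresent(const FakeCalculatorOptions& opts) {
    SchedulerConfigSketch cfg;
    if (opts.has_cache_interval_multiplier()) {
        cfg.cache_interval_multiplier = opts.cache_interval_multiplier();
    }
    return cfg;
}

// Always copies the getter: an unset field yields the declared default of 64.
SchedulerConfigSketch copyWithDefault(const FakeCalculatorOptions& opts) {
    SchedulerConfigSketch cfg;
    cfg.cache_interval_multiplier = opts.cache_interval_multiplier();
    return cfg;
}

// Usage: with no explicit value, only copyWithDefault produces a populated field.
void checkDefaultPropagation() {
    FakeCalculatorOptions unset;
    assert(!unset.has_cache_interval_multiplier());
    assert(unset.cache_interval_multiplier() == 64);
    assert(!copyIfPresent(unset).cache_interval_multiplier.has_value());
    assert(copyWithDefault(unset).cache_interval_multiplier.value() == 64);
}
```

In this sketch, a presence-gated copy leaves the config empty unless the option is written explicitly, whereas reading the getter unconditionally carries the proto default through.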
Comment on lines 83 to 86 in `src/graph_export/graph_cli_parser.cpp`:
```diff
 ("cache_interval_multiplier",
-    "Multiplier for the KV cache block interval. Controls the granularity of cache allocation. Default: unset.",
-    cxxopts::value<uint64_t>(),
+    "Multiplier for the KV cache block interval. Controls the granularity of cache allocation. Applicable only for models with linear attention. Default: 64.",
+    cxxopts::value<uint64_t>()->default_value("64"),
     "CACHE_INTERVAL_MULTIPLIER");
```
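With `default_value("64")`, omitting the flag on the command line resolves to 64 instead of leaving the option unset. The sketch below reproduces only that resolution rule in plain C++, without cxxopts. `resolveCacheIntervalMultiplier` and `kDefaultCacheIntervalMultiplier` are hypothetical names, and accepting both `--flag value` and `--flag=value` spellings is an assumption made for illustration, not a description of the real parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Mirrors default_value("64") from the diff; the constant name is illustrative.
constexpr uint64_t kDefaultCacheIntervalMultiplier = 64;

// Hypothetical resolver: an explicit "--cache_interval_multiplier N" or
// "--cache_interval_multiplier=N" wins, otherwise the default of 64 applies.
uint64_t resolveCacheIntervalMultiplier(const std::vector<std::string>& args) {
    const std::string flag = "--cache_interval_multiplier";
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg == flag && i + 1 < args.size()) {
            return std::stoull(args[i + 1]);  // "--flag value" form
        }
        if (arg.rfind(flag + "=", 0) == 0) {
            return std::stoull(arg.substr(flag.size() + 1));  // "--flag=value" form
        }
    }
    return kDefaultCacheIntervalMultiplier;
}

// Usage: unrelated flags fall back to 64, explicit values override it.
void checkResolution() {
    assert(resolveCacheIntervalMultiplier({"--some_other_flag", "1"}) == 64);
    assert(resolveCacheIntervalMultiplier({"--cache_interval_multiplier", "128"}) == 128);
    assert(resolveCacheIntervalMultiplier({"--cache_interval_multiplier=16"}) == 16);
}
```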
In `docs/parameters.md`:
```diff
 | `--reasoning_parser` | `string` | Type of parser to use for reasoning content extraction from model output. Currently supported: [qwen3, gptoss, gemma4] |
 | `--tool_parser` | `string` | Type of parser to use for tool calls extraction from model output. Currently supported: [llama3, phi4, hermes3, mistral, qwen3coder, gptoss, devstral, lfm2, gemma4] |
 | `--enable_tool_guided_generation` | `bool` | Enables enforcing tool schema during generation. Requires setting response parser. Default: false. |
+| `--cache_interval_multiplier` | `integer` | Multiplier for the KV cache block interval. Controls the granularity of cache allocation. Applicable only for models with linear attention. Default: 64. |
```
🛠 Summary
CVS-188061 - The new default value `cache_interval_multiplier=64` is better tuned for long contexts, which are expected to be typical for models with linear attention.
🧪 Checklist