change default cache_interval_multiplier [test_doc_files_windows=demos…#4314

Open
dtrawins wants to merge 2 commits into main from CVS-188061-64
Conversation

@dtrawins dtrawins commented Jun 22, 2026

Collaborator

…/continuous_batching/agentic_ai/README.md]

🛠 Summary

CVS-188061 - the new default value, cache_interval_multiplier=64, is better tuned for long contexts, which are expected to be typical for models with linear attention.

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

Copilot AI review requested due to automatic review settings June 22, 2026 21:56

Copilot AI left a comment

Contributor

Pull request overview

This PR aims to change the default behavior and documentation for cache_interval_multiplier (intended to default to 64) to better support long-context workloads typical for linear-attention models.

Changes:

  • Updates LLMCalculatorOptions.cache_interval_multiplier in the proto to specify a default of 64 and adds a clarifying comment about linear-attention applicability.
  • Updates graph export CLI help text and CLI option defaulting to indicate a default of 64.
  • Adds/updates unit test and user documentation to assert/describe the new default.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `src/test/llm/llmnode_test.cpp` | Adds assertions expecting `cache_interval_multiplier` to be present and equal to 64 by default. |
| `src/llm/llm_calculator.proto` | Sets the proto2 default for `cache_interval_multiplier` to 64 and documents linear-attention applicability. |
| `src/graph_export/graph_cli_parser.cpp` | Updates the CLI option description and sets a cxxopts default value of 64 for `cache_interval_multiplier`. |
| `docs/parameters.md` | Documents `--cache_interval_multiplier` and states that the default is 64. |

Comment on lines +4252 to +4253
ASSERT_TRUE(properties->schedulerConfig.cache_interval_multiplier.has_value());
ASSERT_EQ(properties->schedulerConfig.cache_interval_multiplier.value(), 64);
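The two assertions above expect the multiplier to be populated even when the user sets nothing. A minimal sketch of that behavior follows, assuming the configuration stores the value in a `std::optional`. `SchedulerConfigSketch`, `buildConfig`, `checkDefaults`, and `kDefaultCacheIntervalMultiplier` are hypothetical names used only for illustration and are not taken from the repository.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the scheduler configuration; not the real type from the repository.
struct SchedulerConfigSketch {
    std::optional<std::uint64_t> cache_interval_multiplier;
};

// Default value proposed in this PR.
constexpr std::uint64_t kDefaultCacheIntervalMultiplier = 64;

// Uses the caller's value when one is given, otherwise falls back to the default,
// so the optional in the returned config is always populated.
inline SchedulerConfigSketch buildConfig(std::optional<std::uint64_t> userValue) {
    SchedulerConfigSketch cfg;
    cfg.cache_interval_multiplier = userValue.value_or(kDefaultCacheIntervalMultiplier);
    return cfg;
}

// Mirrors the two assertions from the test, plus an explicit override.
inline void checkDefaults() {
    const SchedulerConfigSketch byDefault = buildConfig(std::nullopt);
    assert(byDefault.cache_interval_multiplier.has_value());
    assert(byDefault.cache_interval_multiplier.value() == 64);

    const SchedulerConfigSketch overridden = buildConfig(16);
    assert(overridden.cache_interval_multiplier.value() == 16);
}
```

With this shape, a check on `has_value()` alone cannot tell an explicit 64 apart from the default, which may or may not matter to callers.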
Comment on lines +139 to +140
// Applicable only for models with linear attention.
optional uint64 cache_interval_multiplier = 25 [default = 64];
Comment on lines 83 to 86
  ("cache_interval_multiplier",
- "Multiplier for the KV cache block interval. Controls the granularity of cache allocation. Default: unset.",
- cxxopts::value<uint64_t>(),
+ "Multiplier for the KV cache block interval. Controls the granularity of cache allocation. Applicable only for models with linear attention. Default: 64.",
+ cxxopts::value<uint64_t>()->default_value("64"),
  "CACHE_INTERVAL_MULTIPLIER");
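The effect of giving the option a default can be illustrated without cxxopts. The sketch below is a hand-rolled, standard-library-only stand-in: `parseCacheIntervalMultiplier` and `demoParse` are hypothetical helpers, and the real parser in `graph_cli_parser.cpp` relies on cxxopts rather than anything like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of the repository or of cxxopts.
// Returns the value that follows --cache_interval_multiplier, or the fallback
// (64, the default proposed in this PR) when the flag is absent.
// std::stoull throws std::invalid_argument or std::out_of_range on bad input.
inline std::uint64_t parseCacheIntervalMultiplier(const std::vector<std::string>& args,
                                                  std::uint64_t fallback = 64) {
    const std::string flag = "--cache_interval_multiplier";
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == flag) {
            return std::stoull(args[i + 1]);
        }
    }
    return fallback;
}

// Usage: the flag overrides the default; omitting it yields 64.
inline void demoParse() {
    const std::vector<std::string> noFlag = {"--tool_parser", "hermes3"};
    const std::vector<std::string> withFlag = {"--cache_interval_multiplier", "16"};
    assert(parseCacheIntervalMultiplier(noFlag) == 64);
    assert(parseCacheIntervalMultiplier(withFlag) == 16);
}
```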
Comment thread docs/parameters.md
| `--reasoning_parser` | `string` | Type of parser to use for reasoning content extraction from model output. Currently supported: [qwen3, gptoss, gemma4] |
| `--tool_parser` | `string` | Type of parser to use for tool calls extraction from model output. Currently supported: [llama3, phi4, hermes3, mistral, qwen3coder, gptoss, devstral, lfm2, gemma4] |
| `--enable_tool_guided_generation` | `bool` | Enables enforcing tool schema during generation. Requires setting response parser. Default: false. |
| `--cache_interval_multiplier` | `integer` | Multiplier for the KV cache block interval. Controls the granularity of cache allocation. Applicable only for models with linear attention. Default: 64. |