azure-docs/includes/api-management-semantic-cache-policy-details.md at fe3d7e1aa0f8e37f19985ebfb10de416abdca504 · MicrosoftDocs/azure-docs

author: dlepow
ms.service: azure-api-management
ms.custom: build-2024
ms.topic: include
ms.date: 10/29/2025
ms.author: danlep

Attributes

| Attribute | Description | Required | Default |
| --- | --- | --- | --- |
| score-threshold | Defines how closely an incoming prompt must match a cached prompt for the cached response to be returned. The value ranges from 0.0 to 1.0. Lower values require higher semantic similarity for a match. | Yes | N/A |
| embeddings-backend-id | ID of the backend used for the embeddings API call. | Yes | N/A |
| embeddings-backend-auth | Authentication used for the embeddings API backend. | Yes. Must be set to `system-assigned`. | N/A |
| ignore-system-messages | Boolean. When set to `true` (recommended), removes system messages from a chat completion prompt before assessing cache similarity. | No | false |
| max-message-count | If specified, the number of remaining dialog messages after which caching is skipped. | No | N/A |
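
As an illustration of how these attributes fit together, the following is a minimal sketch rather than a definitive configuration. This section doesn't name the policy element, so `azure-openai-semantic-cache-lookup` is assumed here. The backend ID `embeddings-backend` and the `max-message-count` value are hypothetical placeholders.

```xml
<!-- Sketch only. Assumptions: the element name, the backend ID "embeddings-backend",
     and the max-message-count value are illustrative, not taken from this section. -->
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true"
    max-message-count="10" />
```

The three required attributes are set explicitly. `ignore-system-messages` is set to `true` because the table marks that value as recommended, although the default is `false`.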

Elements

| Name | Description | Required |
| --- | --- | --- |
| vary-by | A custom expression, evaluated at runtime, whose value partitions caching. If multiple `vary-by` elements are added, their values are concatenated to create a unique combination. | No |
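
To show how multiple `vary-by` elements combine, here is a hedged sketch that partitions the cache by subscription and by a request header. The policy element name, the backend ID, and the `x-user-group` header are assumptions made for illustration. The expressions use the policy expression `context` variable.

```xml
<!-- Sketch only. Assumptions: element name, backend ID, and the "x-user-group" header are hypothetical. -->
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned">
    <!-- First partition key: the subscription that made the call. -->
    <vary-by>@(context.Subscription.Id)</vary-by>
    <!-- Second partition key: a caller-supplied group header, empty string if absent. -->
    <vary-by>@(context.Request.Headers.GetValueOrDefault("x-user-group", ""))</vary-by>
</azure-openai-semantic-cache-lookup>
```

With this configuration, a cached response would be eligible for reuse only when both the subscription ID and the header value match those of the request that populated the entry.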

Usage notes

- This policy can only be used once in a policy section.
- Fine-tune the value of `score-threshold` for your application so that cached responses are returned with the right sensitivity. Start with a low value such as 0.05 and adjust to optimize the ratio of cache hits to misses.
- A score threshold above 0.2 may lead to cache mismatches. Consider using a lower value for sensitive use cases.
- Control cross-user access to cache entries by specifying `vary-by` with specific user or user-group identifiers.
- The embeddings model should have enough capacity to handle the prompt volume and a context size large enough to accommodate the prompts.
- Consider adding the `llm-content-safety` policy with prompt shield to protect against prompt attacks.
[!INCLUDE api-management-cache-rate-limit]
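
The notes above can be sketched as an inbound fragment. Treat it as an assumption-laden example, not a recommended configuration. The element name `azure-openai-semantic-cache-lookup` and the backend IDs `embeddings-backend` and `content-safety-backend` are hypothetical. The `shield-prompt` attribute is assumed to be how `llm-content-safety` turns on prompt shield, and `context.User.Id` assumes each request is associated with an API Management user.

```xml
<inbound>
    <base />
    <!-- Assumption: "content-safety-backend" is a hypothetical backend for the content safety service. -->
    <llm-content-safety backend-id="content-safety-backend" shield-prompt="true" />
    <!-- Assumption: the element name and "embeddings-backend" are illustrative. -->
    <azure-openai-semantic-cache-lookup
        score-threshold="0.05"
        embeddings-backend-id="embeddings-backend"
        embeddings-backend-auth="system-assigned"
        ignore-system-messages="true">
        <!-- Partition by user so one user's cached responses aren't returned to another. -->
        <vary-by>@(context.User.Id)</vary-by>
    </azure-openai-semantic-cache-lookup>
</inbound>
```

The threshold starts at 0.05, the low starting value suggested above, and stays well under the 0.2 level at which mismatches become more likely. The cache lookup appears only once in the section, as the first note requires.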