---
title: Azure API Management policy reference - llm-semantic-cache-lookup | Microsoft Docs
description: Reference for the llm-semantic-cache-lookup policy available for use in Azure API Management. Provides policy usage, settings, and examples.
services: api-management
author: dlepow
ms.service: azure-api-management
ms.collection: ce-skilling-ai-copilot
ms.custom:
ms.topic: reference
ms.date: 02/23/2026
ms.update-cycle: 180-days
ms.author: danlep
---
[!INCLUDE api-management-availability-all-tiers]
Use the `llm-semantic-cache-lookup` policy to perform a cache lookup of responses to large language model (LLM) API requests from a configured external cache, based on the vector proximity of the prompt to previous requests and a specified score threshold. Response caching reduces the bandwidth and processing requirements imposed on the backend LLM API and lowers the latency perceived by API consumers.
> [!NOTE]
> - This policy must have a corresponding Cache responses to large language model API requests (`llm-semantic-cache-store`) policy.
> - For prerequisites and steps to enable semantic caching, see Enable semantic caching for LLM APIs in Azure API Management.
> - Because semantic caching returns responses based on similarity (not exact match), it can surface responses that are incorrect, outdated, or unsafe for the current request. Evaluate this feature carefully for your workload and include safeguards.
[!INCLUDE api-management-policy-generic-alert]
[!INCLUDE api-management-llm-models]
```xml
<llm-semantic-cache-lookup
    score-threshold="score threshold to return cached response"
    embeddings-backend-id="backend entity ID for embeddings API"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true | false"
    max-message-count="count">
    <vary-by>"expression to partition caching"</vary-by>
</llm-semantic-cache-lookup>
```

[!INCLUDE api-management-semantic-cache-policy-details]
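As an illustrative sketch, a policy definition pairing this lookup with its corresponding `llm-semantic-cache-store` policy might look like the following. The backend ID `embeddings-backend`, the `0.05` score threshold, the subscription-based `vary-by` expression, and the 60-second cache duration are assumed values chosen for illustration, not defaults:

```xml
<policies>
    <inbound>
        <base />
        <!-- Look up a semantically similar cached response before calling the backend -->
        <llm-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned"
            ignore-system-messages="true"
            max-message-count="10">
            <!-- Partition the cache per subscription so callers don't share entries -->
            <vary-by>@(context.Subscription.Id)</vary-by>
        </llm-semantic-cache-lookup>
    </inbound>
    <outbound>
        <!-- Store the backend response in the cache for 60 seconds -->
        <llm-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```

A lower `score-threshold` requires closer vector proximity before a cached response is returned; tune it against your workload's tolerance for near-match answers, per the caution in the note above.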
[!INCLUDE api-management-llm-semantic-cache-example]
[!INCLUDE api-management-policy-ref-next-steps]