---
title: Azure API Management policy reference - llm-semantic-cache-lookup | Microsoft Docs
description: Reference for the llm-semantic-cache-lookup policy available for use in Azure API Management. Provides policy usage, settings, and examples.
services: api-management
author: dlepow
ms.service: azure-api-management
ms.collection: ce-skilling-ai-copilot
ms.custom:
ms.topic: reference
ms.date: 02/23/2026
ms.update-cycle: 180-days
ms.author: danlep
---
[!INCLUDE api-management-availability-all-tiers]
Use the `llm-semantic-cache-lookup` policy to perform a cache lookup of responses to large language model (LLM) API requests from a configured external cache, based on the vector proximity of the prompt to previous requests and a specified score threshold. Response caching reduces the bandwidth and processing requirements imposed on the backend LLM API and lowers the latency perceived by API consumers.
> [!NOTE]
> - This policy must have a corresponding Cache responses to large language model API requests (`llm-semantic-cache-store`) policy.
> - For prerequisites and steps to enable semantic caching, see Enable semantic caching for LLM APIs in Azure API Management.
> - Because semantic caching returns responses based on similarity (not exact match), it can surface responses that are incorrect, outdated, or unsafe for the current request. Evaluate this feature carefully for your workload and include safeguards.
[!INCLUDE api-management-policy-generic-alert]
[!INCLUDE api-management-llm-models]
```xml
<llm-semantic-cache-lookup
    score-threshold="score threshold to return cached response"
    embeddings-backend-id="backend entity ID for embeddings API"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true | false"
    max-message-count="count">
    <vary-by>"expression to partition caching"</vary-by>
</llm-semantic-cache-lookup>
```

[!INCLUDE api-management-semantic-cache-policy-details]
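As an illustrative sketch, a policy definition pairing this lookup with its corresponding `llm-semantic-cache-store` policy might look like the following. The backend ID `embeddings-backend`, the `0.05` score threshold, the subscription-based `vary-by` expression, and the 60-second cache duration are assumed values chosen for illustration, not defaults:

```xml
<policies>
    <inbound>
        <base />
        <!-- Look up a semantically similar cached response before calling the backend -->
        <llm-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned"
            ignore-system-messages="true"
            max-message-count="10">
            <!-- Partition the cache per subscription so callers don't share entries -->
            <vary-by>@(context.Subscription.Id)</vary-by>
        </llm-semantic-cache-lookup>
    </inbound>
    <outbound>
        <!-- Store the backend response in the cache for 60 seconds -->
        <llm-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```

A lower `score-threshold` requires closer vector proximity before a cached response is returned; tune it against your workload's tolerance for near-match answers, per the caution in the note above.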
[!INCLUDE api-management-llm-semantic-cache-example]
[!INCLUDE api-management-policy-ref-next-steps]