---
title: Azure API Management policy reference - llm-semantic-cache-lookup | Microsoft Docs
description: Reference for the llm-semantic-cache-lookup policy available for use in Azure API Management. Provides policy usage, settings, and examples.
services: api-management
author: dlepow
ms.service: azure-api-management
ms.collection: ce-skilling-ai-copilot
ms.custom:
  - build-2024
ms.topic: reference
ms.date: 02/23/2026
ms.update-cycle: 180-days
ms.author: danlep
---

# Get cached responses of large language model API requests

[!INCLUDE api-management-availability-all-tiers]

Use the `llm-semantic-cache-lookup` policy to perform a cache lookup of responses to large language model (LLM) API requests from a configured external cache, based on the vector proximity of the prompt to previous requests and a specified similarity score threshold. Response caching reduces bandwidth and processing requirements imposed on the backend LLM API and lowers latency perceived by API consumers.
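The lookup can be thought of as a nearest-neighbor search over prompt embeddings. The following Python sketch is not API Management's implementation; the cache layout, the cosine-similarity metric, and the treatment of the score threshold as a vector distance (lower values mean a closer match is required) are illustrative assumptions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def lookup(cache, query_embedding, score_threshold):
    """Return the cached response whose stored prompt embedding is
    closest to the query, if its distance is within the threshold.

    cache: list of (prompt_embedding, cached_response) pairs.
    score_threshold: assumed here to be a distance (1 - similarity),
    so smaller values demand a closer semantic match.
    """
    best_response, best_score = None, -1.0
    for prompt_embedding, response in cache:
        score = cosine_similarity(query_embedding, prompt_embedding)
        if score > best_score:
            best_response, best_score = response, score
    if best_response is not None and (1.0 - best_score) <= score_threshold:
        return best_response
    return None  # cache miss: forward the request to the LLM backend
```

With a tight threshold, only near-duplicate prompts hit the cache; loosening it trades answer fidelity for a higher hit rate.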

> [!NOTE]
> [!INCLUDE api-management-policy-generic-alert]

[!INCLUDE api-management-llm-models]

## Policy statement

```xml
<llm-semantic-cache-lookup
    score-threshold="score threshold to return cached response"
    embeddings-backend-id="backend entity ID for embeddings API"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true | false"
    max-message-count="count">
    <vary-by>"expression to partition caching"</vary-by>
</llm-semantic-cache-lookup>
```

[!INCLUDE api-management-semantic-cache-policy-details]

## Examples

### Example with corresponding llm-semantic-cache-store policy

[!INCLUDE api-management-llm-semantic-cache-example]
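For illustration, a minimal pairing of the lookup and store policies might look like the following sketch. The backend ID, threshold, message count, and cache duration are assumed values for this example, not defaults:

```xml
<policies>
    <inbound>
        <base />
        <!-- Look up a semantically similar cached response before calling
             the LLM backend. "embeddings-backend" is an assumed backend
             entity ID pointing at an embeddings API. -->
        <llm-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned"
            ignore-system-messages="true"
            max-message-count="10">
            <!-- Partition the cache per subscription so callers don't
                 share each other's cached responses -->
            <vary-by>@(context.Subscription.Id)</vary-by>
        </llm-semantic-cache-lookup>
    </inbound>
    <outbound>
        <!-- Store the backend response (assumed 60-second lifetime) so
             later semantically similar prompts can reuse it -->
        <llm-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```

Caching by `vary-by` expressions such as the subscription ID keeps responses scoped to a caller, at the cost of a lower overall hit rate.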

## Related policies

[!INCLUDE api-management-policy-ref-next-steps]