---
title: Azure API Management policy reference - llm-semantic-cache-store
description: Reference for the llm-semantic-cache-store policy available for use in Azure API Management. Provides policy usage, settings, and examples.
services: api-management
author: dlepow
ms.service: azure-api-management
ms.collection: ce-skilling-ai-copilot
ms.custom:
ms.topic: reference
ms.date: 02/23/2026
ms.update-cycle: 180-days
ms.author: danlep
---

# Cache responses to large language model API requests

[!INCLUDE api-management-availability-all-tiers]

The `llm-semantic-cache-store` policy caches responses to chat completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.

> [!NOTE]
> [!INCLUDE api-management-policy-generic-alert]

[!INCLUDE api-management-llm-models]

## Policy statement

```xml
<llm-semantic-cache-store duration="seconds" />
```

## Attributes

| Attribute | Description | Required | Default |
| --------- | ----------- | -------- | ------- |
| duration | Time-to-live of the cached entries, specified in seconds. Policy expressions are allowed. | Yes | N/A |
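Because `duration` accepts policy expressions, the time-to-live can be computed at request time rather than hard-coded. The following sketch is an illustration, not part of the official reference: the variable name `cacheDuration` is hypothetical and assumed to have been set earlier in the policy pipeline.

```xml
<!-- Fixed TTL: cache entries expire after 60 seconds. -->
<llm-semantic-cache-store duration="60" />

<!-- Expression-based TTL (illustrative): read the TTL from a context
     variable named "cacheDuration", falling back to 60 seconds when
     the variable hasn't been set. -->
<llm-semantic-cache-store duration="@(context.Variables.GetValueOrDefault<string>("cacheDuration", "60"))" />
```

Use only one of these forms in a given policy section, since the policy can appear only once per section.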

## Usage

### Usage notes

- This policy can only be used once in a policy section.
- If the cache lookup fails, the API call that uses the cache-related operation doesn't raise an error, and the cache operation completes successfully.
- [!INCLUDE api-management-cache-rate-limit]

## Examples

### Example with corresponding llm-semantic-cache-lookup policy

[!INCLUDE api-management-llm-semantic-cache-example]
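The include above is not expanded here, so the following is a minimal sketch of how the two policies typically pair: `llm-semantic-cache-lookup` runs in the `inbound` section to check the cache, and `llm-semantic-cache-store` runs in the `outbound` section to cache the response. The attribute values (`score-threshold`, the backend ID `embeddings-backend`, and the 60-second TTL) are illustrative assumptions, not prescribed defaults.

```xml
<policies>
    <inbound>
        <base />
        <!-- Check the semantic cache for a sufficiently similar prior prompt.
             Backend ID and threshold below are illustrative. -->
        <llm-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <!-- On a cache miss, store the model's response for 60 seconds. -->
        <llm-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```

Consult the `llm-semantic-cache-lookup` policy reference for the authoritative list of its attributes.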

## Related policies

[!INCLUDE api-management-policy-ref-next-steps]