## articles/api-management/api-management-howto-cache-external.md

9 additions & 6 deletions
@@ -6,7 +6,7 @@ author: dlepow

ms.service: azure-api-management
ms.topic: how-to
ms.date: 10/27/2025
ms.author: danlep
ms.custom: sfi-image-nochange
@@ -44,10 +44,10 @@ To complete this tutorial, you need to:

+ [Create an Azure API Management instance](get-started-create-service-instance.md)
+ Understand [caching in Azure API Management](api-management-howto-cache.md)
+ Have an [Azure Managed Redis](../redis/quickstart-create-managed-redis.md) or another Redis-compatible cache available.

> [!IMPORTANT]
> Azure API Management uses a Redis connection string to connect to the cache. If you use Azure Managed Redis, enable access key authentication in your cache to use a connection string. Currently, you can't use Microsoft Entra authentication to connect Azure API Management to Azure Managed Redis.

### Redis cache for Kubernetes
@@ -57,7 +57,7 @@ For an API Management self-hosted gateway, caching requires an external cache.

Follow the steps below to add an external Redis-compatible cache in Azure API Management. You can limit the cache to a specific gateway in your API Management instance.

### Use from setting
@@ -76,7 +76,7 @@ The **Use from** setting in the configuration specifies the location of your API

> [!NOTE]
> You can configure the same external cache for more than one API Management instance. The API Management instances can be in the same or different regions. When sharing the cache for more than one instance, you must select **Default** in the **Use from** setting.

### Add an Azure Managed Redis instance from the same subscription

1. Browse to your API Management instance in the Azure portal.
1. In the left menu, under **Deployment + infrastructure**, select **External cache**.
@@ -85,14 +85,17 @@ The **Use from** setting in the configuration specifies the location of your API

1. In the [**Use from**](#use-from-setting) dropdown, select **Default** or specify the desired region. The **Connection string** is automatically populated.
1. Select **Save**.

> [!NOTE]
> The default connection string is in the form `<cache-name>:10000,<cache-access-key>,ssl=True,abortConnect=False`. API Management stores the string as a secret named value. If you need to view or edit the string to rotate the access key or troubleshoot connection issues, go to the **Named values** blade.

### Add a Redis-compatible cache hosted outside of the current Azure subscription or Azure in general

1. Browse to your API Management instance in the Azure portal.
1. In the left menu, under **Deployment + infrastructure**, select **External cache**.
1. Select **+ Add**.
1. In the **Cache instance** dropdown, select **Custom**.
1. In the [**Use from**](#use-from-setting) dropdown, select **Default** or specify the desired region.
1. Enter your Azure Managed Redis or Redis-compatible cache connection string in the **Connection string** field.
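
The **Connection string** value uses the StackExchange.Redis configuration format. A hypothetical example for a cache hosted outside Azure (the host name, port, and access key are placeholders, not values from this article):

```
contoso-redis.example.com:6380,password=<access-key>,ssl=True,abortConnect=False
```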
## articles/api-management/api-management-howto-cache.md

1 addition & 1 deletion
@@ -82,7 +82,7 @@ With the caching policies shown in this example, the first request to a test operation

1. Select **Save**.

> [!TIP]
> If you're using an external cache, as described in [Use an external Redis-compatible cache in Azure API Management](api-management-howto-cache-external.md), you might want to specify the `caching-type` attribute of the caching policies. See [API Management caching policies](api-management-policies.md#caching) for more information.
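
As a sketch of that tip, the following policy pair pins response caching to the external cache with the `caching-type` attribute (the `vary-by-*` values and the 60-second duration are illustrative, not from this article):

```xml
<inbound>
    <base />
    <cache-lookup vary-by-developer="false" vary-by-developer-groups="false" caching-type="external" />
</inbound>
<outbound>
    <base />
    <cache-store caching-type="external" duration="60" />
</outbound>
```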
## articles/api-management/api-management-howto-entra-external-id.md

1 addition & 1 deletion
@@ -55,7 +55,7 @@ Create an app registration in your Microsoft Entra ID tenant. The app registration

* In the **Supported account types** section, select **Accounts in this organizational directory only**.
* In **Redirect URI**, select **Single-page application (SPA)** and enter the following URL: `https://{your-api-management-service-name}.developer.azure-api.net/signin`, where `{your-api-management-service-name}` is the name of your API Management instance.
* Select **Register** to create the application.
1. On the app **Overview** page, find the **Application (client) ID** and **Directory (tenant) ID** and copy these values to a safe location. You need them later.
1. In the sidebar menu, under **Manage**, select **Certificates & secrets**.
1. From the **Certificates & secrets** page, on the **Client secrets** tab, select **+ New client secret**.
## articles/api-management/azure-openai-enable-semantic-caching.md

Enable semantic caching of responses to LLM API requests to reduce bandwidth and processing requirements imposed on the backend APIs and lower latency perceived by API consumers. With semantic caching, you can return cached responses for identical prompts and also for prompts that are similar in meaning, even if the text isn't identical. For background, see [Tutorial: Use Azure Managed Redis as a semantic cache](../redis/tutorial-semantic-cache.md).

> [!NOTE]
> The configuration steps in this article show how to enable semantic caching for APIs added to API Management from Azure OpenAI in Azure AI Foundry models. You can apply similar steps to enable semantic caching for corresponding large language model (LLM) APIs available through the [Azure AI Model Inference API](/rest/api/aifoundry/modelinference/) or with OpenAI-compatible models served through third-party inference providers.

## Prerequisites

* Add one or more Azure OpenAI in Azure AI Foundry model deployments as APIs to your API Management instance. For more information, see [Add an Azure OpenAI API to Azure API Management](azure-openai-api-from-specification.md).
* Create deployments for the following APIs:

  * Chat Completion API - Deployment used for API consumer calls
  * Embeddings API - Deployment used for semantic caching

* Configure the API Management instance to use managed identity authentication to the Azure OpenAI APIs. For more information, see [Authenticate and authorize access to Azure OpenAI APIs using Azure API Management](api-management-authenticate-authorize-azure-openai.md#authenticate-with-managed-identity).
* An [Azure Managed Redis](../redis/quickstart-create-managed-redis.md) instance with the **RediSearch** module enabled on the Redis cache.

  > [!NOTE]
  > You can only enable the **RediSearch** module when creating a new Azure Managed Redis cache. You can't add a module to an existing cache. [Learn more](../redis/redis-modules.md)

* Configure the Azure Managed Redis instance as an external cache in the Azure API Management instance. For steps, see [Use an external Redis-compatible cache in Azure API Management](api-management-howto-cache-external.md).

## Test Chat API deployment

First, test the Azure OpenAI deployment to make sure the Chat Completion API or Chat API works as expected. For steps, see [Import an Azure OpenAI API to Azure API Management](azure-openai-api-from-specification.md#test-the-azure-openai-api).

For example, test the Azure OpenAI Chat API by sending a POST request to the API endpoint with a prompt in the request body. The response should include the completion of the prompt. Example request:
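
As a sketch (the deployment name in the request URL and the `api-version` value vary by environment, and the message content here is illustrative), a minimal Chat Completion request body looks like the following:

```json
{
    "messages": [
        {
            "role": "user",
            "content": "What is Azure API Management?"
        }
    ]
}
```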
@@ -55,25 +55,25 @@ When the request succeeds, the response includes a completion for the chat message.

## Create a backend for embeddings API

Create a [backend](backends.md) resource for the embeddings API deployment with the following settings:

* **Name** - A name of your choice, such as *embeddings-backend*. You use this name to reference the backend in policies.
* **Type** - Select **Custom URL**.
* **Runtime URL** - The URL of the embeddings API deployment in Azure OpenAI, similar to: `https://my-aoai.openai.azure.com/openai/deployments/embeddings-deployment/embeddings` (without query parameters).
* **Authorization credentials** - Go to the **Managed Identity** tab.
* **Client identity** - Select *System assigned identity* or enter a user-assigned managed identity client ID.
* **Resource ID** - Enter `https://cognitiveservices.azure.com/` for Azure OpenAI.

### Test embeddings backend

To test the embeddings backend, create an API operation for your Azure OpenAI API:

1. On the **Design** tab of your API, select **+ Add operation**.
1. Enter a **Display name**, such as *Embeddings*, and optionally a **Name** for the operation.
1. In the **Frontend** section, in **URL**, select **POST** and enter the path `/`.
1. On the **Headers** tab, add a required header with the name `Content-Type` and value `application/json`.
1. Select **Save**.

Configure the following policies in the **Inbound processing** section of the API operation. In the [set-backend-service](set-backend-service-policy.md) policy, substitute the name of the backend you created.
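
For example, a minimal sketch of the inbound section, assuming the backend resource is named *embeddings-backend* (substitute the name you chose):

```xml
<inbound>
    <base />
    <set-backend-service backend-id="embeddings-backend" />
</inbound>
```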
@@ -94,7 +94,7 @@ On the **Test** tab, test the operation by adding an `api-version` query parameter

```json
{"input":"Hello"}
```

If the request is successful, the response includes a vector representation of the input text. Example response:

```json
{
    ...
```
@@ -125,8 +125,8 @@ To enable semantic caching for Azure OpenAI APIs in Azure API Management, apply

```xml
<azure-openai-semantic-cache-lookup
    score-threshold="0.15"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true"
    max-message-count="10">
    ...
```
@@ -151,14 +151,16 @@ To enable semantic caching for Azure OpenAI APIs in Azure API Management, apply

## Confirm caching

To confirm that semantic caching works as expected, trace a test Completion or Chat Completion operation by using the test console in the portal. Confirm that the cache is used on subsequent tries by inspecting the trace. [Learn more about tracing API calls in Azure API Management](api-management-howto-api-inspector.md).

Adjust the `score-threshold` attribute in the lookup policy to control how closely an incoming prompt must match a cached prompt to return its stored response. A lower score threshold means that prompts must have higher semantic similarity to return cached responses. Prompts with scores above the threshold don't use the cached response.
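
Cached entries are written by the corresponding `azure-openai-semantic-cache-store` policy in the outbound section. A sketch, with an illustrative 120-second cache duration:

```xml
<outbound>
    <azure-openai-semantic-cache-store duration="120" />
    <base />
</outbound>
```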
For example, if the cache is used, the **Output** section includes entries similar to the following screenshot:

:::image type="content" source="media/azure-openai-enable-semantic-caching/cache-lookup.png" alt-text="Screenshot of request trace in the Azure portal.":::