## articles/api-management/api-management-howto-cache-external.md

9 additions & 6 deletions
@@ -6,7 +6,7 @@ author: dlepow

ms.service: azure-api-management
ms.topic: how-to
ms.date: 10/27/2025
ms.author: danlep
ms.custom: sfi-image-nochange
@@ -44,10 +44,10 @@ To complete this tutorial, you need to:

+ [Create an Azure API Management instance](get-started-create-service-instance.md)
+ Understand [caching in Azure API Management](api-management-howto-cache.md)
+ Have an [Azure Managed Redis](../redis/quickstart-create-managed-redis.md) or another Redis-compatible cache available.

> [!IMPORTANT]
> Azure API Management uses a Redis connection string to connect to the cache. If you use Azure Managed Redis, enable access key authentication in your cache to use a connection string. Currently, you can't use Microsoft Entra authentication to connect Azure API Management to Azure Managed Redis.

### Redis cache for Kubernetes
@@ -57,7 +57,7 @@ For an API Management self-hosted gateway, caching requires an external cache.

Follow the steps below to add an external Redis-compatible cache in Azure API Management. You can limit the cache to a specific gateway in your API Management instance.

### Use from setting
@@ -76,7 +76,7 @@ The **Use from** setting in the configuration specifies the location of your API

> [!NOTE]
> You can configure the same external cache for more than one API Management instance. The API Management instances can be in the same or different regions. When sharing the cache for more than one instance, you must select **Default** in the **Use from** setting.

### Add an Azure Managed Redis instance from the same subscription

1. Browse to your API Management instance in the Azure portal.
1. In the left menu, under **Deployment + infrastructure**, select **External cache**.
@@ -85,14 +85,17 @@ The **Use from** setting in the configuration specifies the location of your API

1. In the [**Use from**](#use-from-setting) dropdown, select **Default** or specify the desired region. The **Connection string** is automatically populated.
1. Select **Save**.

> [!NOTE]
> The default connection string is in the form `<cache-name>:10000,<cache-access-key>,ssl=True,abortConnect=False`. API Management stores the string as a secret named value. If you need to view or edit the string to rotate the access key or troubleshoot connection issues, go to the **Named values** blade.

### Add a Redis-compatible cache hosted outside of the current Azure subscription or Azure in general

1. Browse to your API Management instance in the Azure portal.
1. In the left menu, under **Deployment + infrastructure**, select **External cache**.
1. Select **+ Add**.
1. In the **Cache instance** dropdown, select **Custom**.
1. In the [**Use from**](#use-from-setting) dropdown, select **Default** or specify the desired region.
1. Enter your Azure Managed Redis or Redis-compatible cache connection string in the **Connection string** field.
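
The **Connection string** value uses the StackExchange.Redis configuration format. A hypothetical example for a cache hosted outside Azure (the host name, port, and access key are placeholders, not values from this article):

```
contoso-redis.example.com:6380,password=<access-key>,ssl=True,abortConnect=False
```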
## articles/api-management/api-management-howto-cache.md

1 addition & 1 deletion
@@ -82,7 +82,7 @@ With the caching policies shown in this example, the first request to a test operation

1. Select **Save**.

> [!TIP]
> If you're using an external cache, as described in [Use an external Redis-compatible cache in Azure API Management](api-management-howto-cache-external.md), you might want to specify the `caching-type` attribute of the caching policies. See [API Management caching policies](api-management-policies.md#caching) for more information.
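
As a sketch of that tip, the following policy pair pins response caching to the external cache with the `caching-type` attribute (the `vary-by-*` values and the 60-second duration are illustrative, not from this article):

```xml
<inbound>
    <base />
    <cache-lookup vary-by-developer="false" vary-by-developer-groups="false" caching-type="external" />
</inbound>
<outbound>
    <base />
    <cache-store caching-type="external" duration="60" />
</outbound>
```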
## articles/api-management/api-management-howto-entra-external-id.md

1 addition & 1 deletion
@@ -55,7 +55,7 @@ Create an app registration in your Microsoft Entra ID tenant. The app registration

* In the **Supported account types** section, select **Accounts in this organizational directory only**.
* In **Redirect URI**, select **Single-page application (SPA)** and enter the following URL: `https://{your-api-management-service-name}.developer.azure-api.net/signin`, where `{your-api-management-service-name}` is the name of your API Management instance.
* Select **Register** to create the application.
1. On the app **Overview** page, find the **Application (client) ID** and **Directory (tenant) ID** and copy these values to a safe location. You need them later.
1. In the sidebar menu, under **Manage**, select **Certificates & secrets**.
1. From the **Certificates & secrets** page, on the **Client secrets** tab, select **+ New client secret**.
## articles/api-management/azure-openai-enable-semantic-caching.md

Enable semantic caching of responses to LLM API requests to reduce bandwidth and processing requirements imposed on the backend APIs and lower latency perceived by API consumers. With semantic caching, you can return cached responses for identical prompts and also for prompts that are similar in meaning, even if the text isn't identical. For background, see [Tutorial: Use Azure Managed Redis as a semantic cache](../redis/tutorial-semantic-cache.md).

> [!NOTE]
> The configuration steps in this article show how to enable semantic caching for APIs added to API Management from Azure OpenAI in Azure AI Foundry models. You can apply similar steps to enable semantic caching for corresponding large language model (LLM) APIs available through the [Azure AI Model Inference API](/rest/api/aifoundry/modelinference/) or with OpenAI-compatible models served through third-party inference providers.

## Prerequisites

* Add one or more Azure OpenAI in Azure AI Foundry model deployments as APIs to your API Management instance. For more information, see [Add an Azure OpenAI API to Azure API Management](azure-openai-api-from-specification.md).
* Create deployments for the following APIs:

  * Chat Completion API - Deployment used for API consumer calls
  * Embeddings API - Deployment used for semantic caching

* Configure the API Management instance to use managed identity authentication to the Azure OpenAI APIs. For more information, see [Authenticate and authorize access to Azure OpenAI APIs using Azure API Management](api-management-authenticate-authorize-azure-openai.md#authenticate-with-managed-identity).
* An [Azure Managed Redis](../redis/quickstart-create-managed-redis.md) instance with the **RediSearch** module enabled on the Redis cache.

  > [!NOTE]
  > You can only enable the **RediSearch** module when creating a new Azure Managed Redis cache. You can't add a module to an existing cache. [Learn more](../redis/redis-modules.md)

* Configure the Azure Managed Redis instance as an external cache in the Azure API Management instance. For steps, see [Use an external Redis-compatible cache in Azure API Management](api-management-howto-cache-external.md).

## Test Chat API deployment

First, test the Azure OpenAI deployment to make sure the Chat Completion API or Chat API works as expected. For steps, see [Import an Azure OpenAI API to Azure API Management](azure-openai-api-from-specification.md#test-the-azure-openai-api).

For example, test the Azure OpenAI Chat API by sending a POST request to the API endpoint with a prompt in the request body. The response should include the completion of the prompt. Example request:
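
As a sketch (the deployment name in the request URL and the `api-version` value vary by environment, and the message content here is illustrative), a minimal Chat Completion request body looks like the following:

```json
{
    "messages": [
        {
            "role": "user",
            "content": "What is Azure API Management?"
        }
    ]
}
```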
@@ -55,25 +55,25 @@ When the request succeeds, the response includes a completion for the chat message.

## Create a backend for embeddings API

Create a [backend](backends.md) resource for the embeddings API deployment with the following settings:

* **Name** - A name of your choice, such as *embeddings-backend*. You use this name to reference the backend in policies.
* **Type** - Select **Custom URL**.
* **Runtime URL** - The URL of the embeddings API deployment in Azure OpenAI, similar to: `https://my-aoai.openai.azure.com/openai/deployments/embeddings-deployment/embeddings` (without query parameters).
* **Authorization credentials** - Go to the **Managed Identity** tab.
* **Client identity** - Select *System assigned identity* or enter a user-assigned managed identity client ID.
* **Resource ID** - Enter `https://cognitiveservices.azure.com/` for Azure OpenAI.

### Test embeddings backend

To test the embeddings backend, create an API operation for your Azure OpenAI API:

1. On the **Design** tab of your API, select **+ Add operation**.
1. Enter a **Display name**, such as *Embeddings*, and optionally a **Name** for the operation.
1. In the **Frontend** section, in **URL**, select **POST** and enter the path `/`.
1. On the **Headers** tab, add a required header with the name `Content-Type` and value `application/json`.
1. Select **Save**.

Configure the following policies in the **Inbound processing** section of the API operation. In the [set-backend-service](set-backend-service-policy.md) policy, substitute the name of the backend you created.
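
For example, a minimal sketch of the inbound section, assuming the backend resource is named *embeddings-backend* (substitute the name you chose):

```xml
<inbound>
    <base />
    <set-backend-service backend-id="embeddings-backend" />
</inbound>
```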
@@ -94,7 +94,7 @@ On the **Test** tab, test the operation by adding an `api-version` query parameter

```json
{"input":"Hello"}
```

If the request is successful, the response includes a vector representation of the input text. Example response:

```json
{
    ...
```
@@ -125,8 +125,8 @@ To enable semantic caching for Azure OpenAI APIs in Azure API Management, apply

```xml
<azure-openai-semantic-cache-lookup
    score-threshold="0.15"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true"
    max-message-count="10">
    ...
```
@@ -151,14 +151,16 @@ To enable semantic caching for Azure OpenAI APIs in Azure API Management, apply

## Confirm caching

To confirm that semantic caching works as expected, trace a test Completion or Chat Completion operation by using the test console in the portal. Confirm that the cache is used on subsequent tries by inspecting the trace. [Learn more about tracing API calls in Azure API Management](api-management-howto-api-inspector.md).

Adjust the `score-threshold` attribute in the lookup policy to control how closely an incoming prompt must match a cached prompt to return its stored response. A lower score threshold means that prompts must have higher semantic similarity to return cached responses. Prompts with scores above the threshold don't use the cached response.
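
Cached entries are written by the corresponding `azure-openai-semantic-cache-store` policy in the outbound section. A sketch, with an illustrative 120-second cache duration:

```xml
<outbound>
    <azure-openai-semantic-cache-store duration="120" />
    <base />
</outbound>
```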
For example, if the cache is used, the **Output** section includes entries similar to the following screenshot:

:::image type="content" source="media/azure-openai-enable-semantic-caching/cache-lookup.png" alt-text="Screenshot of request trace in the Azure portal.":::