
Commit 271dcef

gitName committed
edit
1 parent 3ac4e1f commit 271dcef

2 files changed

Lines changed: 2 additions & 2 deletions


articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ ms.author: danlep
 
 The `azure-openai-token-limit` policy prevents Azure OpenAI in Foundry Models API usage spikes on a per-key basis by limiting consumption of language model tokens to a specified rate (number per minute), a quota over a specified period, or both. When a specified token rate limit is exceeded, the caller receives a `429 Too Many Requests` response status code. When a specified quota is exceeded, the caller receives a `403 Forbidden` response status code.
 
-By relying on token usage metrics returned from the Azure OpenAI endpoint, the policy monitors and enforces limits based on actual token consumption. The policy also enables estimation of prompt tokens in advance by API Management, minimizing unnecessary requests to the Azure OpenAI backend if the limit is already exceeded. However, because the actual number of tokens consumed depends on both the prompt size and the completion size (which varies by request), the policy can't predict total token consumption in advance. This affects how limits are enforced when multiple requests are processed concurrently.
+By relying on token usage metrics returned from the Azure OpenAI endpoint, the policy monitors and enforces limits based on actual token consumption. The policy also enables estimation of prompt tokens in advance by API Management, minimizing unnecessary requests to the Azure OpenAI backend if the limit is already exceeded. However, because the actual number of tokens consumed depends on both the prompt size and the completion size (which varies by request), the policy can't predict total token consumption in advance. This design could allow token limits to be exceeded temporarily when multiple requests are processed concurrently.
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
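
The rate-limit behavior described in this hunk is configured through the policy's XML. A minimal sketch of a per-subscription rate limit, assuming the documented `counter-key`, `tokens-per-minute`, and `estimate-prompt-tokens` attributes (the key expression and limit value are illustrative, not part of this commit):

```xml
<policies>
    <inbound>
        <base />
        <!-- Count tokens per subscription key; return 429 once 5,000 tokens/minute is exceeded.
             The limit value is illustrative. -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="5000"
            estimate-prompt-tokens="true"
            remaining-tokens-variable-name="remainingTokens" />
    </inbound>
</policies>
```

With `estimate-prompt-tokens="true"`, API Management estimates the prompt's token count before forwarding the call, which is the pre-check the changed paragraph describes; the completion's token count is only known after the backend responds.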

articles/api-management/llm-token-limit-policy.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ ms.author: danlep
 
 The `llm-token-limit` policy prevents large language model (LLM) API usage spikes on a per-key basis by limiting consumption of language model tokens to a specified rate (number per minute), a quota over a specified period, or both. When a specified token rate limit is exceeded, the caller receives a `429 Too Many Requests` response status code. When a specified quota is exceeded, the caller receives a `403 Forbidden` response status code.
 
-By relying on token usage metrics returned from the LLM endpoint, the policy monitors and enforces limits based on actual token consumption. The policy also enables estimation of prompt tokens in advance by API Management, minimizing unnecessary requests to the LLM backend if the limit is already exceeded. However, because the actual number of tokens consumed depends on both the prompt size and the completion size (which varies by request), the policy can't predict total token consumption in advance. This affects how limits are enforced when multiple requests are processed concurrently.
+By relying on token usage metrics returned from the LLM endpoint, the policy monitors and enforces limits based on actual token consumption. The policy also enables estimation of prompt tokens in advance by API Management, minimizing unnecessary requests to the LLM backend if the limit is already exceeded. However, because the actual number of tokens consumed depends on both the prompt size and the completion size (which varies by request), the policy can't predict total token consumption in advance. This design could allow token limits to be exceeded temporarily when multiple requests are processed concurrently.
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
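
The quota path in this paragraph (the `403 Forbidden` case) is driven by separate attributes. A minimal sketch, assuming the documented `token-quota` and `token-quota-period` attributes; the counter key and values are illustrative, not part of this commit:

```xml
<policies>
    <inbound>
        <base />
        <!-- Per caller IP: 5,000 tokens/minute rate limit (429 when exceeded)
             plus a 100,000-token quota that resets hourly (403 when exhausted).
             Values are illustrative. -->
        <llm-token-limit
            counter-key="@(context.Request.IpAddress)"
            tokens-per-minute="5000"
            token-quota="100000"
            token-quota-period="Hourly"
            estimate-prompt-tokens="false" />
    </inbound>
</policies>
```

With `estimate-prompt-tokens="false"`, counters are updated only from the usage metrics the LLM endpoint returns, so concurrent requests can briefly overshoot the limit, which is the behavior this commit's rewording calls out.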

0 commit comments
