
Commit 271dcef

gitName committed
edit
1 parent 3ac4e1f commit 271dcef

2 files changed

Lines changed: 2 additions & 2 deletions


articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ ms.author: danlep
 
 The `azure-openai-token-limit` policy prevents Azure OpenAI in Foundry Models API usage spikes on a per-key basis by limiting consumption of language model tokens to a specified rate (number per minute), a quota over a specified period, or both. When a specified token rate limit is exceeded, the caller receives a `429 Too Many Requests` response status code. When a specified quota is exceeded, the caller receives a `403 Forbidden` response status code.
 
-By relying on token usage metrics returned from the Azure OpenAI endpoint, the policy monitors and enforces limits based on actual token consumption. The policy also enables estimation of prompt tokens in advance by API Management, minimizing unnecessary requests to the Azure OpenAI backend if the limit is already exceeded. However, because the actual number of tokens consumed depends on both the prompt size and the completion size (which varies by request), the policy can't predict total token consumption in advance. This affects how limits are enforced when multiple requests are processed concurrently.
+By relying on token usage metrics returned from the Azure OpenAI endpoint, the policy monitors and enforces limits based on actual token consumption. The policy also enables estimation of prompt tokens in advance by API Management, minimizing unnecessary requests to the Azure OpenAI backend if the limit is already exceeded. However, because the actual number of tokens consumed depends on both the prompt size and the completion size (which varies by request), the policy can't predict total token consumption in advance. This design could allow token limits to be exceeded temporarily when multiple requests are processed concurrently.
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
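
The rate-limit behavior described in this hunk is configured through the policy's XML. A minimal sketch of a per-subscription rate limit, assuming the documented `counter-key`, `tokens-per-minute`, and `estimate-prompt-tokens` attributes (the key expression and limit value are illustrative, not part of this commit):

```xml
<policies>
    <inbound>
        <base />
        <!-- Count tokens per subscription key; return 429 once 5,000 tokens/minute is exceeded.
             The limit value is illustrative. -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="5000"
            estimate-prompt-tokens="true"
            remaining-tokens-variable-name="remainingTokens" />
    </inbound>
</policies>
```

With `estimate-prompt-tokens="true"`, API Management estimates the prompt's token count before forwarding the call, which is the pre-check the changed paragraph describes; the completion's token count is only known after the backend responds.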

articles/api-management/llm-token-limit-policy.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ ms.author: danlep
 
 The `llm-token-limit` policy prevents large language model (LLM) API usage spikes on a per-key basis by limiting consumption of language model tokens to a specified rate (number per minute), a quota over a specified period, or both. When a specified token rate limit is exceeded, the caller receives a `429 Too Many Requests` response status code. When a specified quota is exceeded, the caller receives a `403 Forbidden` response status code.
 
-By relying on token usage metrics returned from the LLM endpoint, the policy monitors and enforces limits based on actual token consumption. The policy also enables estimation of prompt tokens in advance by API Management, minimizing unnecessary requests to the LLM backend if the limit is already exceeded. However, because the actual number of tokens consumed depends on both the prompt size and the completion size (which varies by request), the policy can't predict total token consumption in advance. This affects how limits are enforced when multiple requests are processed concurrently.
+By relying on token usage metrics returned from the LLM endpoint, the policy monitors and enforces limits based on actual token consumption. The policy also enables estimation of prompt tokens in advance by API Management, minimizing unnecessary requests to the LLM backend if the limit is already exceeded. However, because the actual number of tokens consumed depends on both the prompt size and the completion size (which varies by request), the policy can't predict total token consumption in advance. This design could allow token limits to be exceeded temporarily when multiple requests are processed concurrently.
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
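
The quota path in this paragraph (the `403 Forbidden` case) is driven by separate attributes. A minimal sketch, assuming the documented `token-quota` and `token-quota-period` attributes; the counter key and values are illustrative, not part of this commit:

```xml
<policies>
    <inbound>
        <base />
        <!-- Per caller IP: 5,000 tokens/minute rate limit (429 when exceeded)
             plus a 100,000-token quota that resets hourly (403 when exhausted).
             Values are illustrative. -->
        <llm-token-limit
            counter-key="@(context.Request.IpAddress)"
            tokens-per-minute="5000"
            token-quota="100000"
            token-quota-period="Hourly"
            estimate-prompt-tokens="false" />
    </inbound>
</policies>
```

With `estimate-prompt-tokens="false"`, counters are updated only from the usage metrics the LLM endpoint returns, so concurrent requests can briefly overshoot the limit, which is the behavior this commit's rewording calls out.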

0 commit comments
