Commit e7bb8a9 (parent: 271dcef)

Commit message: notes

2 files changed
Lines changed: 2 additions & 2 deletions

articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ By relying on token usage metrics returned from the Azure OpenAI endpoint, the p
 * This policy can be used multiple times per policy definition.
 * This policy can optionally be configured when adding an Azure OpenAI API using the portal.
 * Where available when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the Azure OpenAI API are used to determine token usage.
-* When multiple requests are sent concurrently or with slight delays, the policy can allow more token consumption than the configured limit before enforcement occurs. This happens because the policy can't determine the exact number of tokens consumed until responses are received from the backend. Once responses are processed and token limits are exceeded, subsequent requests are blocked until the limit resets.
+* When multiple requests are sent concurrently or with slight delays, the policy can allow token consumption that exceeds the configured limit. This happens because the policy can't determine the exact number of tokens consumed until responses are received from the backend. Once responses are processed and token limits are exceeded, subsequent requests are blocked until the limit resets.
 * Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute. Completion tokens are also estimated when responses are streamed.
 * The value of `remaining-quota-tokens-variable-name` or `remaining-quota-tokens-header-name` is an estimate for informational purposes but could be larger than expected based on actual token consumption. The value is more accurate as the quota is approached.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
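To make the notes above concrete, here is a minimal sketch of how the `azure-openai-token-limit` policy might sit in an inbound policy section. The attribute names `estimate-prompt-tokens` and `remaining-quota-tokens-header-name` come directly from the article text; `counter-key`, `tokens-per-minute`, `token-quota`, and `token-quota-period` are assumed from the policy reference, and every value shown is an illustrative placeholder rather than a recommended configuration:

```xml
<policies>
    <inbound>
        <base />
        <!-- Illustrative sketch: limit tokens per subscription per minute,
             plus a daily quota; numbers and header name are placeholders -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="5000"
            token-quota="100000"
            token-quota-period="Daily"
            estimate-prompt-tokens="false"
            remaining-quota-tokens-header-name="remaining-quota-tokens" />
    </inbound>
</policies>
```

With `estimate-prompt-tokens="false"`, the counter is updated from the usage section of each backend response, which is why concurrent requests can briefly overshoot the limit before enforcement, as the changed bullet above describes.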

articles/api-management/llm-token-limit-policy.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -71,7 +71,7 @@ By relying on token usage metrics returned from the LLM endpoint, the policy mon
 * This policy can be used multiple times per policy definition.
 * This policy can optionally be configured when adding an LLM API using the portal.
 * Where available when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the LLM API are used to determine token usage.
-* When multiple requests are sent concurrently or with slight delays, the policy can allow more token consumption than the configured limit before enforcement occurs. This happens because the policy can't determine the exact number of tokens consumed until responses are received from the backend. Once responses are processed and token limits are exceeded, subsequent requests are blocked until the limit resets.
+* When multiple requests are sent concurrently or with slight delays, the policy can allow token consumption that exceeds the configured limit. This happens because the policy can't determine the exact number of tokens consumed until responses are received from the backend. Once responses are processed and token limits are exceeded, subsequent requests are blocked until the limit resets.
 * Certain LLM endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
 * The value of `remaining-quota-tokens-variable-name` or `remaining-quota-tokens-header-name` is an estimate for informational purposes but could be larger than expected based on actual token consumption. The value is more accurate as the quota is approached.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
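A matching sketch for the generic `llm-token-limit` policy, which takes the same shape as the Azure OpenAI variant. Here `remaining-quota-tokens-variable-name` is the attribute cited in the notes; the counter key, quota attributes, and values are again assumed placeholders:

```xml
<policies>
    <inbound>
        <base />
        <!-- Illustrative sketch for a generic LLM API; with estimation
             enabled, prompt tokens are counted in the gateway -->
        <llm-token-limit
            counter-key="@(context.Request.IpAddress)"
            tokens-per-minute="5000"
            token-quota="500000"
            token-quota-period="Weekly"
            estimate-prompt-tokens="true"
            remaining-quota-tokens-variable-name="remainingQuotaTokens" />
    </inbound>
</policies>
```

Storing the remaining quota in a policy expression variable rather than a response header keeps the estimate available to later policy stages without exposing it to API consumers.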
