Commit ad65264

gitName committed: review comments

1 parent e2e5c0f · commit ad65264

2 files changed: 2 additions & 2 deletions

articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
 * This policy can optionally be configured when adding an API from Azure OpenAI using the portal.
 * Where available, when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the Azure OpenAI API are used to determine token usage.
 * Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute. Completion tokens are also estimated when responses are streamed.
-* The value of `remaining-quota-tokens-variable-name` or `remaining-quota-tokens-header-name` is estimated for informational purposes but can vary from the actual remaining quota tokens.
+* The value of `remaining-quota-tokens-variable-name` or `remaining-quota-tokens-header-name` is an estimate for informational purposes but could be larger than expected based on actual token consumption. The value becomes more accurate as the quota is approached.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image at the maximum of 1,200 tokens.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
 * [!INCLUDE [api-management-token-limit-gateway-counts](../../includes/api-management-token-limit-gateway-counts.md)]
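
For context, the attributes this change documents appear in policy configurations along the following lines. This is a minimal sketch, not an example from the article: the `counter-key` expression, the numeric values, and the `token-quota`/`token-quota-period` pairing are illustrative assumptions; the diff itself only names `estimate-prompt-tokens`, `remaining-quota-tokens-variable-name`, and `remaining-quota-tokens-header-name`.

```xml
<policies>
    <inbound>
        <base />
        <!-- Sketch only: enforce a token quota keyed per subscription and surface the
             estimated remaining quota to callers. Attribute values are illustrative
             assumptions, not values taken from the article. -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            token-quota="100000"
            token-quota-period="Hourly"
            estimate-prompt-tokens="false"
            remaining-quota-tokens-variable-name="remainingQuotaTokens"
            remaining-quota-tokens-header-name="x-remaining-quota-tokens" />
    </inbound>
</policies>
```

Per the amended note, the value surfaced in `x-remaining-quota-tokens` (a hypothetical header name) is informational: early in the quota period it can overstate what actual consumption would leave available, and it becomes more accurate as the quota is approached.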

articles/api-management/llm-token-limit-policy.md

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ By relying on token usage metrics returned from the LLM endpoint, the policy can
 * This policy can be used multiple times per policy definition.
 * Where available, when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the LLM API are used to determine token usage.
 * Certain LLM endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
-* The value of `remaining-quota-tokens-variable-name` or `remaining-quota-tokens-header-name` is estimated for informational purposes but can vary from the actual remaining quota tokens.
+* The value of `remaining-quota-tokens-variable-name` or `remaining-quota-tokens-header-name` is an estimate for informational purposes but could be larger than expected based on actual token consumption. The value becomes more accurate as the quota is approached.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image at the maximum of 1,200 tokens.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
 * [!INCLUDE [api-management-token-limit-gateway-counts](../../includes/api-management-token-limit-gateway-counts.md)]
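
The same note applies to the LLM variant of the policy. Below is a comparable sketch, again with assumed attribute values; the outbound section shows one hypothetical way to read the context variable that `remaining-quota-tokens-variable-name` populates.

```xml
<policies>
    <inbound>
        <base />
        <!-- Sketch only: same shape as azure-openai-token-limit, using the
             llm-token-limit element. Values are illustrative assumptions. -->
        <llm-token-limit
            counter-key="@(context.Request.IpAddress)"
            token-quota="50000"
            token-quota-period="Daily"
            estimate-prompt-tokens="false"
            remaining-quota-tokens-variable-name="remainingQuotaTokens" />
    </inbound>
    <outbound>
        <base />
        <!-- Hypothetical: echo the estimated remaining quota in a custom response
             header. Assumes the inbound policy above has set the variable. -->
        <set-header name="x-quota-remaining-estimate" exists-action="override">
            <value>@(context.Variables["remainingQuotaTokens"].ToString())</value>
        </set-header>
    </outbound>
</policies>
```

Keep in mind that, per the amended note, this value is an estimate and can exceed what actual token consumption would leave available, especially early in the quota period.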
