Commit ad65264

gitName committed: review comments

1 parent e2e5c0f · commit ad65264

2 files changed: 2 additions & 2 deletions

articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
 * This policy can optionally be configured when adding an API from Azure OpenAI using the portal.
 * Where available, when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the Azure OpenAI API are used to determine token usage.
 * Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute. Completion tokens are also estimated when responses are streamed.
-* The value of `remaining-quota-tokens-variable-name` or `remaining-quota-tokens-header-name` is estimated for informational purposes but can vary from the actual remaining quota tokens.
+* The value of `remaining-quota-tokens-variable-name` or `remaining-quota-tokens-header-name` is an estimate for informational purposes but could be larger than expected based on actual token consumption. The value becomes more accurate as the quota is approached.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image at the maximum of 1,200 tokens.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
 * [!INCLUDE [api-management-token-limit-gateway-counts](../../includes/api-management-token-limit-gateway-counts.md)]
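
For context, the attributes this change documents appear in policy configurations along the following lines. This is a minimal sketch, not an example from the article: the `counter-key` expression, the numeric values, and the `token-quota`/`token-quota-period` pairing are illustrative assumptions; the diff itself only names `estimate-prompt-tokens`, `remaining-quota-tokens-variable-name`, and `remaining-quota-tokens-header-name`.

```xml
<policies>
    <inbound>
        <base />
        <!-- Sketch only: enforce a token quota keyed per subscription and surface the
             estimated remaining quota to callers. Attribute values are illustrative
             assumptions, not values taken from the article. -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            token-quota="100000"
            token-quota-period="Hourly"
            estimate-prompt-tokens="false"
            remaining-quota-tokens-variable-name="remainingQuotaTokens"
            remaining-quota-tokens-header-name="x-remaining-quota-tokens" />
    </inbound>
</policies>
```

Per the amended note, the value surfaced in `x-remaining-quota-tokens` (a hypothetical header name) is informational: early in the quota period it can overstate what actual consumption would leave available, and it becomes more accurate as the quota is approached.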

articles/api-management/llm-token-limit-policy.md

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ By relying on token usage metrics returned from the LLM endpoint, the policy can
 * This policy can be used multiple times per policy definition.
 * Where available, when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the LLM API are used to determine token usage.
 * Certain LLM endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
-* The value of `remaining-quota-tokens-variable-name` or `remaining-quota-tokens-header-name` is estimated for informational purposes but can vary from the actual remaining quota tokens.
+* The value of `remaining-quota-tokens-variable-name` or `remaining-quota-tokens-header-name` is an estimate for informational purposes but could be larger than expected based on actual token consumption. The value becomes more accurate as the quota is approached.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image at the maximum of 1,200 tokens.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
 * [!INCLUDE [api-management-token-limit-gateway-counts](../../includes/api-management-token-limit-gateway-counts.md)]
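
The same note applies to the LLM variant of the policy. Below is a comparable sketch, again with assumed attribute values; the outbound section shows one hypothetical way to read the context variable that `remaining-quota-tokens-variable-name` populates.

```xml
<policies>
    <inbound>
        <base />
        <!-- Sketch only: same shape as azure-openai-token-limit, using the
             llm-token-limit element. Values are illustrative assumptions. -->
        <llm-token-limit
            counter-key="@(context.Request.IpAddress)"
            token-quota="50000"
            token-quota-period="Daily"
            estimate-prompt-tokens="false"
            remaining-quota-tokens-variable-name="remainingQuotaTokens" />
    </inbound>
    <outbound>
        <base />
        <!-- Hypothetical: echo the estimated remaining quota in a custom response
             header. Assumes the inbound policy above has set the variable. -->
        <set-header name="x-quota-remaining-estimate" exists-action="override">
            <value>@(context.Variables["remainingQuotaTokens"].ToString())</value>
        </set-header>
    </outbound>
</policies>
```

Keep in mind that, per the amended note, this value is an estimate and can exceed what actual token consumption would leave available, especially early in the quota period.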
