
Commit 249a50d

Author: Sherry Yang
Commit message: Update.
1 parent 8a4b32e · commit 249a50d

1 file changed

Lines changed: 1 addition & 1 deletion

File tree

learn-pr/wwl-data-ai/get-started-with-generative-ai-and-agents/includes/2-generative-ai-models.md

@@ -103,7 +103,7 @@ Deployment parameters that you can customize in Foundry include:
 > [!NOTE]
 > A **token** is the smallest unit of text or data that a generative AI model can process. Models break input into tokens—such as words, subwords, characters, or punctuation—so they can understand and generate language efficiently.
 
-When you deploy a model, you can assign it a *Tokens Per Minute* (TPM) allocation. TPM determines the speed and scale the model can process inputs and the rate‑limit boundaries such as requests per minute (RPM).
+When you deploy a model, you can assign it a *Tokens Per Minute* (TPM) allocation. TPM determines the speed and scale at which the model can process inputs and the rate‑limit boundaries such as requests per minute (RPM). When you assign a higher TPM allocation to a model deployment, you're increasing its capacity to handle token traffic per minute. A lower TPM reduces how fast your deployment is allowed to consume tokens across requests.
 
 Limits differ by model family, for example:
 - High‑end reasoning models (for example: DeepSeek R1, Grok, large Llama versions) may have high TPM ceilings.
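The note in the diff above defines a token as the smallest unit of text a model processes. As an illustrative sketch only (the module doesn't prescribe a tokenizer; the open-source `tiktoken` library and the `cl100k_base` encoding are assumptions here), you can see how a sentence splits into tokens and what a request "costs" in token terms:

```python
import tiktoken  # assumption: not named in the module; a commonly used open-source tokenizer

# cl100k_base is one common encoding; the actual tokenizer depends on the model.
encoding = tiktoken.get_encoding("cl100k_base")

text = "Generative AI models break input into tokens."
token_ids = encoding.encode(text)

print(len(token_ids))                                 # how many tokens this sentence costs
print([encoding.decode([tid]) for tid in token_ids])  # the individual token strings
```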

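The sentence added in this commit says that a higher TPM allocation increases the token traffic a deployment can handle per minute, while a lower TPM slows how fast tokens can be consumed across requests. As a rough conceptual sketch (a fixed-window budget written purely for illustration, not how Foundry enforces its limits, and with a hypothetical `TpmBudget` class name), the effect of a TPM allocation looks like this:

```python
import time

class TpmBudget:
    """Toy per-minute token budget, illustrating how a TPM allocation
    caps how many tokens a deployment may consume across requests."""

    def __init__(self, tokens_per_minute: int):
        self.tpm = tokens_per_minute
        self.window_start = time.monotonic()
        self.used = 0

    def try_consume(self, tokens: int) -> bool:
        now = time.monotonic()
        # Reset the budget at the start of each one-minute window.
        if now - self.window_start >= 60:
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.tpm:
            return False  # request would exceed the TPM allocation in this window
        self.used += tokens
        return True

# A deployment with a 10,000 TPM allocation absorbs more token traffic per
# minute than one with 1,000 TPM before requests start being turned away.
budget = TpmBudget(tokens_per_minute=10_000)
print(budget.try_consume(4_000))  # True
print(budget.try_consume(7_000))  # False: 4,000 + 7,000 exceeds 10,000 this window
```

With a larger `tokens_per_minute` value, more (or larger) requests fit into each window before calls start being rejected, which is the behavior the added sentence describes.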