Commit 2912e5d

Merge pull request #54009 from sherzyang/main
Add changes.
2 parents 8a4b32e + 058a879 commit 2912e5d

1 file changed

Lines changed: 5 additions & 2 deletions

File tree

learn-pr/wwl-data-ai/get-started-with-generative-ai-and-agents/includes/2-generative-ai-models.md

@@ -88,7 +88,10 @@ A common way you can evaluate is to start in Foundry's model catalog, choose a m
There are various ways to score a model in Foundry portal, including *Natural Language Processing (NLP) metrics* and *AI‑assisted quality metrics*. Examples of classic *NLP quality metrics* are accuracy, precision, recall, and F1. Examples of *AI‑assisted metrics* include groundedness, relevance, coherence, fluency, and GPT similarity. Choose AI-assisted metrics for qualitative scoring beyond traditional metrics.

-Safety evaluators can be used to help ensure responsible AI output. They scan for harmful or unsafe content, bias and unfairness, violence, self‑harm, or protected‑class harms. Foundry's Evaluator Library offers reusable evaluators for quality scoring, safety scanning, and more.
+In Foundry, **evaluators** are components used to measure the quality, safety, and effectiveness of AI model or agent outputs. For example, safety evaluators can be used to help ensure responsible AI output. They scan for harmful or unsafe content, bias and unfairness, violence, self‑harm, or protected‑class harms. Foundry's Evaluator Library offers reusable evaluators for quality scoring, safety scanning, and more.
+
+> [!NOTE]
+> On their own, Foundry's evaluators detect, scan, and score issues but do not actively resolve them.

## Deploy models in Foundry
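The classic NLP quality metrics named above (precision, recall, F1) can be computed directly from predicted and reference labels. The helper below is a minimal illustrative sketch for a binary labeling task, not part of Foundry or its SDK; the function name is hypothetical.

```python
def precision_recall_f1(predicted, actual):
    """Return (precision, recall, f1) for two equal-length 0/1 label lists.

    Illustrative only: shows how the classic NLP metrics relate to
    true/false positives and false negatives.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# One false positive out of three positive predictions, no misses:
print(precision_recall_f1([1, 1, 0, 1], [1, 0, 0, 1]))  # (0.666..., 1.0, 0.8)
```

AI‑assisted metrics such as groundedness or coherence have no closed-form formula like this; they are scored by a judge model, which is why they complement rather than replace the traditional metrics.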

@@ -103,7 +106,7 @@ Deployment parameters that you can customize in Foundry include:
> [!NOTE]
> A **token** is the smallest unit of text or data that a generative AI model can process. Models break input into tokens—such as words, subwords, characters, or punctuation—so they can understand and generate language efficiently.

-When you deploy a model, you can assign it a *Tokens Per Minute* (TPM) allocation. TPM determines the speed and scale the model can process inputs and the rate‑limit boundaries such as requests per minute (RPM).
+When you deploy a model, you can assign it a *Tokens Per Minute* (TPM) allocation. TPM determines the speed and scale at which the model can process inputs, and sets rate‑limit boundaries such as requests per minute (RPM). When you assign a higher TPM allocation to a model deployment, you're increasing its capacity to handle token traffic per minute. A lower TPM reduces how fast your deployment is allowed to consume tokens across requests.

Limits differ by model family, for example:
- High‑end reasoning models (for example: DeepSeek R1, Grok, large Llama versions) may have high TPM ceilings.
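The effect of a TPM allocation can be pictured as a per-minute token budget that requests draw down. The class below is a hypothetical client-side sketch of that idea—it is not a Foundry API, and real service-side enforcement is more sophisticated (e.g., sliding windows and RPM coupling).

```python
import time

class TPMBudget:
    """Hypothetical sketch of a Tokens-Per-Minute budget (not a Foundry API)."""

    def __init__(self, tpm_limit):
        self.tpm_limit = tpm_limit
        self.window_start = time.monotonic()
        self.tokens_used = 0

    def try_consume(self, tokens):
        """Return True if this request fits in the current minute's budget."""
        now = time.monotonic()
        if now - self.window_start >= 60:  # a new one-minute window begins
            self.window_start = now
            self.tokens_used = 0
        if self.tokens_used + tokens > self.tpm_limit:
            return False  # over budget: the caller should wait or retry later
        self.tokens_used += tokens
        return True

budget = TPMBudget(tpm_limit=1000)
print(budget.try_consume(800))  # True: 800 of 1000 tokens used this minute
print(budget.try_consume(300))  # False: would exceed the 1000-token budget
```

Raising `tpm_limit` in this sketch mirrors assigning a higher TPM allocation to a deployment: more token traffic fits in each minute before requests are throttled.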
