Commit f37f9c4

author
Sherry Yang
committed
Update for acrolinx.
1 parent 4da7112 commit f37f9c4

3 files changed

Lines changed: 29 additions & 30 deletions

File tree

learn-pr/wwl-data-ai/get-started-with-generative-ai-and-agents/includes/2-generative-ai-models.md

Lines changed: 17 additions & 18 deletions
@@ -3,21 +3,20 @@ Generative AI and agentic solutions are based on language models. Large language
33
**Microsoft Foundry** provides an integrated environment for discovering, evaluating, deploying, and operating generative AI models. It brings together a rich model catalog, flexible deployment options, and built‑in governance capabilities so teams can build copilots, agents, and AI-powered applications with enterprise confidence.
44

55
> [!NOTE]
6-
> In order to use Microsoft Foundry, you need an Azure subscription. To utilize Foundry's capabilities, you will need to start by creating a project in Foundry. For more information, review [Get started in Microsoft Foundry](/training/modules/get-started-ai-in-foundry/).
6+
> To use Microsoft Foundry, you need an Azure subscription. To use Foundry's capabilities, start by creating a project in Foundry. For more information, review [Get started in Microsoft Foundry](/training/modules/get-started-ai-in-foundry/).
77
88
## Discover models in Foundry's model catalog
99

1010
**Foundry's model catalog** is a central hub for discovering and using a wide selection of generative AI models from an extensive range of providers. In Foundry, you can filter models by source, capabilities, inference tasks, and more. Foundry enables you to understand and compare model capabilities, as well as test and build scalable, secure, responsible AI solutions.
1111

1212
> [!NOTE]
13-
> The Foundry portal has a *classic* user interface (UI) and a *new* UI. Images of the Foundry portal reflect the *new* UI where it is relevant.
13+
> The Foundry portal has a *classic* user interface (UI) and a *new* UI. Images of the Foundry portal reflect the *new* UI where it's relevant.
1414
1515
![Screenshot of Foundry's model catalog with the new UI.](../media/model-catalog-1.png)
1616

1717
The model catalog offers a broad selection of models, including models sold directly by Azure alongside models from partners and open-source communities.
1818

19-
- **Models Sold Directly by Azure**: These models are hosted and supported by Microsoft under Microsoft Product Terms.
20-
They offer high levels of integration with Azure, enterprise-grade SLAs, pre‑configured security and compliance alignment. Examples often include Microsoft-hosted OpenAI and other partner models made available with provisioning controls.
19+
- **Models Sold Directly by Azure**: These models are hosted by Microsoft under Microsoft Product Terms. They offer high levels of integration with Azure, enterprise-grade service level agreements (SLAs), preconfigured security, and compliance alignment.
2120

2221
- **Models from Partners and the Community**: Includes open-source or vendor-hosted models integrated through the catalog. These models support broader experimentation and rapid innovation and are often suitable for specialized or domain‑specific tasks.
2322

@@ -31,18 +30,18 @@ Each model entry typically includes:
3130

3231
#### Commonly used model families
3332

34-
Among the thousand-plus models available in Foundry, there are many grouped by **model family**. a model family refers to a group of related models that share the same underlying architecture or lineage, but differ in size, capability, specialization, or version.
33+
Among the thousand-plus models available in Foundry, there are many grouped by **model family**. A model family refers to a group of related models that share the same underlying architecture or lineage, but differ in size, capability, specialization, or version.
3534

3635
Commonly used model families include:
3736

3837
- **GPT‑5.x**: Optimized for multi‑step reasoning, structured logic, planning, and agentic workflows. It does well in scenarios needing high‑accuracy reasoning and long‑context understanding—such as generating technical reports, code analysis, or orchestrating multi‑tool agents. It supports adjustable "thinking levels", letting developers trade speed for accuracy when needed.
3938

4039
- **Claude Opus 4.5** (Anthropic): Choose this model when you need a frontier‑level model for sophisticated agents, complex code reasoning, or multi‑step computer‑use tasks. Opus 4.5 is described as Anthropic’s most intelligent model, with strong performance across coding, agents, and computer use, and large context/output windows, which are useful for long specifications, multi-file diffs, or extended research notes.
4140

42-
- **Mistral Large 3** (Mistral AI): is a state‑of‑the‑art, general‑purpose model ideal for where you want strong quality with efficient throughput—e.g., multilingual drafting, structured business report generation, or mid‑latency agent tasks that balance cost and performance. Mistral Large 3 is a state‑of‑the‑art general model and part of the curated Foundry catalog, making it a practical alternative to flagship models when you want high capability with flexible cost/latency trade‑offs.
41+
- **Mistral Large 3** (Mistral AI): A state‑of‑the‑art, general‑purpose model that's ideal when you want strong quality with efficient throughput. The model does well with multilingual drafting, structured business report generation, or mid‑latency agent tasks that balance cost and performance. As part of the curated Foundry catalog, it's a practical alternative to flagship models when you want high capability with flexible cost/latency trade‑offs.
4342

4443
>[!NOTE]
45-
> Registration is currently required for the GPT-5 model family, restricting its availability. All Foundry users can use **GPT‑4.1**, which is ideal for real‑time chat, customer support, and interactive applications that must respond quickly and at scale. It is optimized for speed, efficiency, and low‑latency inference, making it better than reasoning‑heavy models for high‑volume production workloads.
44+
> Registration is currently required for the GPT-5 model family, restricting its availability. All Foundry users can use **GPT‑4.1**, which is ideal for real‑time chat, customer support, and interactive applications that must respond quickly and at scale. It's optimized for speed, efficiency, and low‑latency inference, making it better than reasoning‑heavy models for high‑volume production workloads.
4645
4746
In Foundry, **foundation models** are large, pretrained models—such as GPT, Claude, Mistral, and others—that provide general language, reasoning, or multimodal capabilities out of the box. These models can be deployed immediately or customized through fine‑tuning, and serve as the base layer for building AI applications.
4847

@@ -52,11 +51,11 @@ Choosing the right model in Foundry starts with understanding **your workload, t
5251

5352
#### Select a model by task type
5453

55-
- **Chat and text generation:** GPT‑5.x, Claude, DeepSeek V3.1, small models like Phi‑4 or Llama SLMs.
56-
- **Reasoning-intensive tasks:** GPT‑5.x “thinking modes,” Claude‑Opus class models (optimized for step-by-step reasoning).
54+
- **Chat and text generation:** GPT‑5.x, Claude, DeepSeek V3.1, small language models (SLMs) like Phi‑4 or Llama.
55+
- **Reasoning-intensive tasks:** GPT‑5.x, Claude‑Opus class models (optimized for step-by-step reasoning).
5756
- **Coding:** GPT‑5.1‑codex, Claude‑Sonnet for complex agent flows.
5857
- **Embeddings / retrieval:** Specialized embedding models (OpenAI, Microsoft, Cohere).
59-
- **Multimodal (image+text):** GPT‑4o, DeepSeek‑V3.1 multimodal, diffusion models like Black Forest Labs Flux for image generation.
58+
- **Multimodal (image+text):** GPT‑4o, DeepSeek‑V3.1 multimodal, diffusion models like Black Forest Labs' Flux for image generation.
6059
- **Industry or domain-specific tasks:** Domain-tuned models in the catalog (finance, healthcare, legal).
6160

6261
|**Task**|**Recommended model types**|**When to choose**|
@@ -74,43 +73,43 @@ Choosing the right model in Foundry starts with understanding **your workload, t
7473
Foundry's model catalog includes benchmarking results that show how models perform on standard datasets. Benchmark scores simplify model selection by using consistent evaluation criteria.
7574

7675
Through the Foundry portal, you can also view:
77-
- **Model leaderboards**: leaderboards rank models based on attributes like quality, safety, and throughput. This helps identify the best model for a task (e.g., reasoning, summarization, code generation).
76+
- **Model leaderboards**: Leaderboards rank models based on attributes like quality, safety, and throughput. This helps identify the best model for a task. Examples of tasks include reasoning, summarization, and code generation.
7877
- **Comparisons and filters**: Side‑by‑side model comparison by quality and accuracy, cost, security and compliance, and performance metrics. You can filter by industry, use case, model type, licensing, and more.
7978

8079
![Screenshot of Foundry's model leaderboard and side-by-side comparisons.](../media/model-leaderboard.png)
8180

8281
A common way to evaluate is to start in Foundry's model catalog, choose a model, and then select *Benchmarks → Try with your own data*. You can try out prompts and see whether the responses are as expected.
8382

84-
There are a variety of ways to score a model in Foundry portal, including *Natural Language Processing (NLP) metrics* and *AI‑assisted quality metrics*. Examples of classic *NLP quality metrics* are: accuracy, precision, recall, and F1. Examples of *AI‑assisted metrics* include groundedness, relevance, coherence and fluency, and GPT similarity. Choose AI-assisted metrics for qualitative scoring beyond traditional metrics.
83+
There are various ways to score a model in the Foundry portal, including *Natural Language Processing (NLP) metrics* and *AI‑assisted quality metrics*. Examples of classic *NLP quality metrics* are accuracy, precision, recall, and F1. Examples of *AI‑assisted metrics* include groundedness, relevance, coherence, fluency, and GPT similarity. Choose AI-assisted metrics for qualitative scoring beyond traditional metrics.
8584

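The classic NLP quality metrics mentioned above can be computed directly from a confusion matrix. This is a minimal, illustrative Python sketch (not Foundry code; the function name is hypothetical):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts:
    true positives (tp), false positives (fp), false negatives (fn), true negatives (tn)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # of items predicted positive, how many were right
    recall = tp / (tp + fn)             # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Example: 8 true positives, 2 false positives, 1 false negative, 9 true negatives
m = classification_metrics(tp=8, fp=2, fn=1, tn=9)
print(m)
```

AI-assisted metrics such as groundedness and coherence are instead scored by an evaluator model, so they have no closed-form formula like the one above.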
8685
Safety evaluators can be used to help ensure responsible AI output. They scan for harmful or unsafe content, bias and unfairness, violence, self‑harm, or protected‑class harms. Foundry's Evaluator Library offers reusable evaluators for quality scoring, safety scanning, and more.
8786

8887
## Deploy models in Foundry
8988

90-
Once you select a model, Foundry provides flexible deployment mechanisms that let you tailor performance, cost, and governance. To **deploy a model** means taking an AI model and making it available for use in production through a stable, scalable, and secure endpoint. Deployment of a configured model turns the model into a service that applications can call—usually through an API. Deploying a configured model helps ensure consistent performance and reliability. It also allows developers to prevent unauthorized or unsafe use.
89+
Once you select a model, Foundry provides flexible deployment mechanisms that let you tailor performance, cost, and governance. **Deploying a model** takes an AI model and makes it available for use in production through a stable, scalable, and secure endpoint. Deployment of a configured model turns the model into a service that applications can call—usually through an API. Deploying a configured model helps ensure consistent performance and reliability. It also allows developers to prevent unauthorized or unsafe use.
9190

9291
Deployment parameters that you can customize in Foundry include:
9392

94-
- **Deployment type**: such as standard, global batch, and regional provisioned throughput, determine where and how inference is processed in Foundry. These are tied to throughput and data‑processing requirements.
93+
- **Deployment type**: Options such as standard, global batch, and regional provisioned throughput determine where and how inference is processed in Foundry. Deployment types are tied to throughput and data‑processing requirements.
9594
- **Model version**
9695
- **Tokens per minute (TPM)** rate limit
9796

9897
> [!NOTE]
9998
> A **token** is the smallest unit of text or data that a generative AI model can process. Models break input into tokens—such as words, subwords, characters, or punctuation—so they can understand and generate language efficiently.
10099
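A common rule of thumb is that English text averages roughly four characters per token. The sketch below uses that heuristic for quick budgeting only (an assumption, not the model's real tokenizer; accurate counts require the tokenizer itself, such as the `tiktoken` library):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text, assuming ~4 characters per token.
    Use the model's actual tokenizer (e.g., tiktoken) for accurate counts."""
    return max(1, len(text) // 4)

prompt = "Summarize the key points from our release notes in 3 bullets."
print(estimate_tokens(prompt))  # rough estimate of prompt tokens
```

Estimates like this are useful when sizing a deployment's TPM allocation before you have real traffic data.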
101-
When you deploy a model you can assign it a *Tokens Per Minute* (TPM) allocation. TPM determines the speed and scale the model can process inputs and the rate‑limit boundaries such as requests per minute (RPM).
100+
When you deploy a model, you can assign it a *Tokens Per Minute* (TPM) allocation. TPM determines the speed and scale at which the model can process inputs, and it sets rate‑limit boundaries such as requests per minute (RPM).
102101

103102
Limits differ by model family, for example:
104-
- High‑end reasoning models (e.g., DeepSeek R1, Grok, large Llama versions) may have high TPM ceilings.
103+
- High‑end reasoning models (for example: DeepSeek R1, Grok, large Llama versions) may have high TPM ceilings.
105104
- Specialized or image models often operate under capacity units instead of TPM.
106105

107-
*Throttling*, in a compute context, means intentionally slowing down or limiting how much compute work can happen at one time. Its a protective mechanism used when a system is close to hitting its processing limits. Instead of letting workloads overwhelm the system (which could cause failures or severe slowdowns), throttling temporarily restricts resource usage so the system can remain stable and responsive.
106+
*Throttling*, in a compute context, means intentionally slowing down or limiting how much compute work can happen at one time. It's a protective mechanism used when a system is close to hitting its processing limits. Throttling temporarily restricts resource usage so the system can remain stable and responsive.
108107

109108
Deployment‑level quotas define how many tokens or requests can be processed before throttling occurs. Larger prompts and higher max output token settings consume more TPM, leading to rate-limit errors if quotas are exceeded. If you see throttling, lower **max tokens** or reduce concurrent requests in code.
110109

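When throttling does occur, a common client-side mitigation is to retry with exponential backoff. This is a minimal, library-agnostic sketch; the `RateLimitError` class here is a placeholder for whatever rate-limit exception your SDK actually raises:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for the rate-limit error your SDK raises (e.g., on HTTP 429)."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff plus jitter when rate-limited."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Wait base_delay * 2^attempt, plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

You would wrap your inference call, for example `with_backoff(lambda: client.responses.create(...))`, so transient throttling is absorbed instead of surfacing as errors.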
111110
When you deploy a model in Foundry, several things occur:
112111
- Compute resources are allocated: Foundry assigns the hardware needed to run the model—CPUs, GPUs, memory, networking, and scaling rules.
113-
- An API endpoint is created: You are able to securely invoke the model through the OpenAI Responses API, validated through management API checks.
112+
- An API endpoint is created: You're able to securely invoke the model through the OpenAI Responses API, validated through management API checks.
114113
- Configuration (such as model version, response style, and safety settings) is locked in
115114
- Monitoring and logging become active: usage metrics, performance, latency, errors, and costs are tracked
116115

learn-pr/wwl-data-ai/get-started-with-generative-ai-and-agents/includes/3-using-generative-ai-models.md

Lines changed: 6 additions & 6 deletions
@@ -4,13 +4,13 @@ The easiest way to interact with a deployed model is to use the model playground
44

55
## Key configuration parameters
66

7-
Several *model arguments* or *parameters* influence runtime behavior, performance, and cost. In the playground, you can configure settings such as what deployed model you are testing, **temperature**, **max output tokens**, and **system instructions**. Here, you can experiment with giving the model instructions on how to respond, which we sometimes call system prompts, and you can submit prompts in the chat interface and see the responses generated by the model.
7+
Several *model arguments* or *parameters* influence runtime behavior, performance, and cost. In the playground settings, you can configure parameters such as **temperature**, **max output tokens**, and **system instructions**. In the playground chat interface, you can submit prompts and see the responses generated by the model.
88

99
- **Temperature**: controls creativity vs. determinism.
1010
- **Max output tokens**: caps response length; affects token consumption and throttling behavior.
1111
- **System instructions**: sets behavior and role of the model.
1212

13-
Unlike the user prompt, which is the end-user request or question (example: Where should I travel?), a **System prompt** sets behavior, tone, tools, and guardrails for the assistant. An example of a system prompt is: "You are a helpful, step‑by‑step tutor. Cite sources. Decline medical advice".
13+
Unlike the user prompt, which is the end-user request or question (example: "Where should I travel?"), a **system prompt** sets behavior, tone, tools, and guardrails for the assistant. An example of a system prompt is: "You are a helpful, step‑by‑step tutor. Cite sources. Decline medical advice."
1414

1515
The playground is a useful bridge between Foundry and code. After you test representative prompts, you can use the same system and user prompts and parameter values in your code. The playground provides code that can call your Foundry deployment via the OpenAI‑compatible *Responses* API. The code is essentially what is running when you use the chat interface to configure settings and send user prompts.
1616

@@ -31,10 +31,10 @@ For Foundry, a lightweight chat client is often a **single Python file** that co
3131

3232
#### Build a Python chat client
3333

34-
After you have created a **Foundry project** and **deployed a chat model** (for example, `gpt-4.1`), you can use the Foundry SDK. In the example below, the client application uses authentication to connect to the endpoint for the model, submit a prompt, and display the response.
34+
After you've created a **Foundry project** and **deployed a chat model** (for example, `gpt-4.1`), you can use the Foundry SDK. In the example, the client application authenticates to connect to the model's endpoint, submits a prompt, and displays the response.
3535

3636
>[!NOTE]
37-
>In order to use the SDK, you will need your `azure-ai-projects` is the core Azure AI Projects (Foundry) SDK used to connect to your Foundry project and obtain an OpenAI-compatible client.
37+
>To use the SDK, install the `azure-ai-projects` package. The package is the core Azure AI Projects (Foundry) SDK used to connect to your Foundry project and obtain an OpenAI-compatible client.
3838
3939
```python
4040
# pip install openai>=1.3.0
@@ -50,7 +50,7 @@ client = OpenAI(
5050

5151
response = client.responses.create(
5252
model=os.environ["DEPLOYMENT_NAME"], # e.g., "gpt-4o-mini"
53-
input=[{"role": "system", "content": "You are a helpful assistant."},
53+
input=[{"role": "system", "content": "You're a helpful assistant."},
5454
{"role": "user", "content": "Summarize the key points from our release notes in 3 bullets."}],
5555
max_output_tokens=300,
5656
temperature=0.7
@@ -67,7 +67,7 @@ In Microsoft Foundry, **generative AI models** and **agents** are related but se
6767
- **Agents = packaged, task‑oriented workers built on top of that intelligence**
6868

6969
When you use a generative AI model on its own:
70-
- You want pure inference: Take this prompt and generate output.
70+
- You want pure inference: "Take this prompt and generate output."
7171
- You’re experimenting in the Playground
7272
- You call the model via the **OpenAI Responses API**
7373
