learn-pr/wwl-data-ai/get-started-with-generative-ai-and-agents/includes/2-generative-ai-models.md
Generative AI and agentic solutions are based on language models.
**Microsoft Foundry** provides an integrated environment for discovering, evaluating, deploying, and operating generative AI models. It brings together a rich model catalog, flexible deployment options, and built‑in governance capabilities so teams can build copilots, agents, and AI-powered applications with enterprise confidence.
> [!NOTE]
> To use Microsoft Foundry, you need an Azure subscription. Start by creating a project in Foundry. For more information, review [Get started in Microsoft Foundry](/training/modules/get-started-ai-in-foundry/).
## Discover models in Foundry's model catalog
**Foundry's model catalog** is a central hub for discovering and using a wide selection of generative AI models from an extensive range of providers. In Foundry, you can filter models by source, capabilities, inference tasks, and more. Foundry enables you to understand and compare model capabilities, as well as test and build scalable, secure, responsible AI solutions.
> [!NOTE]
> The Foundry portal has a *classic* user interface (UI) and a *new* UI. Images of the Foundry portal reflect the *new* UI where it's relevant.

The model catalog offers a broad selection, including models sold directly by Azure alongside models from partners and open-source communities.
- **Models Sold Directly by Azure**: These models are hosted by Microsoft under Microsoft Product Terms. They offer high levels of integration with Azure, enterprise-grade service level agreements (SLAs), preconfigured security, and compliance alignment.
- **Models from Partners and the Community**: Includes open-source or vendor-hosted models integrated through the catalog. These models support broader experimentation and rapid innovation and are often suitable for specialized or domain‑specific tasks.
Each model entry typically includes:
#### Commonly used model families
Among the thousand-plus models available in Foundry, there are many grouped by **model family**. A model family refers to a group of related models that share the same underlying architecture or lineage, but differ in size, capability, specialization, or version.
Commonly used model families include:
- **GPT‑5.x**: Optimized for multi‑step reasoning, structured logic, planning, and agentic workflows. It does well in scenarios needing high‑accuracy reasoning and long‑context understanding, such as generating technical reports, analyzing code, or orchestrating multi‑tool agents. It supports adjustable "thinking levels", letting developers trade speed for accuracy when needed.
- **Claude Opus 4.5** (Anthropic): Choose this model when you need a frontier‑level model for sophisticated agents, complex code reasoning, or multi‑step computer‑use tasks. Opus 4.5 is described as Anthropic's most intelligent model, with strong performance across coding, agents, and computer use, and large context and output windows, which are useful for long specifications, multi-file diffs, or extended research notes.
- **Mistral Large 3** (Mistral AI): A state‑of‑the‑art, general‑purpose model that's ideal when you want strong quality with efficient throughput. The model does well with multilingual drafting, structured business report generation, or mid‑latency agent tasks that balance cost and performance. As part of the curated Foundry catalog, it's a practical alternative to flagship models when you want high capability with flexible cost/latency trade‑offs.
> [!NOTE]
> Registration is currently required for the GPT-5 model family, restricting its availability. All Foundry users can use **GPT‑4.1**, which is ideal for real‑time chat, customer support, and interactive applications that must respond quickly and at scale. It's optimized for speed, efficiency, and low‑latency inference, making it better than reasoning‑heavy models for high‑volume production workloads.
In Foundry, **foundation models** are large, pretrained models—such as GPT, Claude, Mistral, and others—that provide general language, reasoning, or multimodal capabilities out of the box. These models can be deployed immediately or customized through fine‑tuning, and serve as the base layer for building AI applications.
Choosing the right model in Foundry starts with understanding **your workload**.
#### Select a model by task type
- **Chat and text generation:** GPT‑5.x, Claude, DeepSeek V3.1, small language models (SLMs) like Phi‑4 or Llama.
- **Reasoning-intensive tasks:** GPT‑5.x, Claude‑Opus class models (optimized for step-by-step reasoning).
- **Coding:** GPT‑5.1‑codex, Claude‑Sonnet for complex agent flows.
- **Multimodal (image+text):** GPT‑4o, DeepSeek‑V3.1 multimodal, diffusion models like Black Forest Labs' Flux for image generation.
- **Industry or domain-specific tasks:** Domain-tuned models in the catalog (finance, healthcare, legal).
|**Task**|**Recommended model types**|**When to choose**|
|---|---|---|
Foundry's model catalog includes benchmarking results that show how models perform on standard datasets. Benchmark scores simplify model selection by using consistent evaluation criteria.
Through the Foundry portal, you can also view:
- **Model leaderboards**: Leaderboards rank models based on attributes like quality, safety, and throughput. This helps identify the best model for a task, such as reasoning, summarization, or code generation.
- **Comparisons and filters**: Side‑by‑side model comparison by quality and accuracy, cost, security and compliance, and performance metrics. You can filter by industry, use case, model type, licensing, and more.

A common way to evaluate a model is to start in Foundry's model catalog, choose a model, and then select *Benchmarks → Try with your own data*. You can try out prompts and see whether the responses are as expected.
There are various ways to score a model in Foundry portal, including *Natural Language Processing (NLP) metrics* and *AI‑assisted quality metrics*. Examples of classic *NLP quality metrics* are: accuracy, precision, recall, and F1. Examples of *AI‑assisted metrics* include groundedness, relevance, coherence and fluency, and GPT similarity. Choose AI-assisted metrics for qualitative scoring beyond traditional metrics.
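As a refresher on the classic NLP metrics, the sketch below computes precision, recall, and F1 from made-up prediction counts (the counts are illustrative, not from any real evaluation run):

```python
def f1_metrics(tp: int, fp: int, fn: int):
    """Compute classic NLP quality metrics from prediction counts."""
    precision = tp / (tp + fp)   # correct positives / predicted positives
    recall = tp / (tp + fn)      # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Illustrative counts: 8 true positives, 2 false positives, 4 false negatives
precision, recall, f1 = f1_metrics(tp=8, fp=2, fn=4)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.667 0.727
```

F1 balances precision and recall, which is why it's often reported alongside plain accuracy.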
Safety evaluators can be used to help ensure responsible AI output. They scan for harmful or unsafe content, bias and unfairness, violence, self‑harm, or protected‑class harms. Foundry's Evaluator Library offers reusable evaluators for quality scoring, safety scanning, and more.
## Deploy models in Foundry
Once you select a model, Foundry provides flexible deployment mechanisms that let you tailor performance, cost, and governance. **Deploying a model** makes it available for use in production through a stable, scalable, and secure endpoint. Deployment of a configured model turns the model into a service that applications can call, usually through an API. Deploying a configured model helps ensure consistent performance and reliability. It also allows developers to prevent unauthorized or unsafe use.
Deployment parameters that you can customize in Foundry include:
- **Deployment type**: Options such as standard, global batch, and regional provisioned throughput determine where and how inference is processed in Foundry. Deployment types are tied to throughput and data‑processing requirements.
- **Model version**
- **Tokens per minute (TPM)** rate limit
> [!NOTE]
> A **token** is the smallest unit of text or data that a generative AI model can process. Models break input into tokens—such as words, subwords, characters, or punctuation—so they can understand and generate language efficiently.
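As a rough illustration of tokenization (real model tokenizers use subword schemes such as byte-pair encoding, so actual counts differ), a naive word-and-punctuation split shows how text becomes countable units:

```python
import re

def rough_token_count(text: str) -> int:
    """Naive token estimate: count words and punctuation marks.

    Real tokenizers split text into subwords, so this is only an
    approximation for illustration, not a substitute for the model's
    own tokenizer.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))

print(rough_token_count("Models break input into tokens!"))  # 6 pieces
```

For billing or quota planning, use the tokenizer that matches your deployed model rather than an approximation like this.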
When you deploy a model, you can assign it a *Tokens Per Minute* (TPM) allocation. TPM determines the speed and scale at which the model can process inputs, as well as rate‑limit boundaries such as requests per minute (RPM).
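To make the TPM arithmetic concrete, here's a small sketch; the numbers are made up for illustration and aren't real Foundry limits:

```python
def fits_tpm_budget(requests_per_min: int, avg_prompt_tokens: int,
                    max_output_tokens: int, tpm_limit: int) -> bool:
    """Estimate whether a workload stays under a deployment's TPM allocation.

    Each request is assumed to consume roughly its prompt tokens plus its
    maximum output tokens from the per-minute budget.
    """
    tokens_per_min = requests_per_min * (avg_prompt_tokens + max_output_tokens)
    return tokens_per_min <= tpm_limit

# Illustrative: 30 RPM, ~500-token prompts, 1,000 max output, 50K TPM allocation
print(fits_tpm_budget(30, 500, 1000, 50_000))  # True: 45,000 tokens/min fits
print(fits_tpm_budget(40, 500, 1000, 50_000))  # False: 60,000 tokens/min exceeds
```

This kind of back-of-the-envelope check helps you size a TPM allocation before you see throttling in production.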
Limits differ by model family, for example:
- High‑end reasoning models (for example: DeepSeek R1, Grok, large Llama versions) may have high TPM ceilings.
- Specialized or image models often operate under capacity units instead of TPM.
*Throttling*, in a compute context, means intentionally slowing down or limiting how much compute work can happen at one time. It's a protective mechanism used when a system is close to hitting its processing limits. Throttling temporarily restricts resource usage so the system can remain stable and responsive.
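Client-side, throttling is often modeled as a token bucket that refills at a fixed rate and admits work only while tokens remain. This is a generic sketch of the idea, not Foundry's internal mechanism:

```python
import time

class TokenBucket:
    """Simple client-side throttle: allow `capacity` units per `period` seconds."""

    def __init__(self, capacity: int, period: float):
        self.capacity = capacity
        self.period = period
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def try_acquire(self, cost: int = 1) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last call
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.capacity / self.period)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=2, period=60)
print([bucket.try_acquire() for _ in range(3)])  # [True, True, False]
```

The third call is rejected because the bucket is empty and the refill over a few microseconds is negligible; a real client would wait or back off at that point.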
Deployment‑level quotas define how many tokens or requests can be processed before throttling occurs. Larger prompts and higher max output token settings consume more TPM, leading to rate-limit errors if the quota is exceeded. If you see throttling, lower **max tokens** or reduce concurrent requests in code.
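A common client-side response to throttling is retry with exponential backoff. The sketch below simulates a rate-limited call with a hypothetical `RateLimitError`; real SDKs raise their own 429-style exceptions, which you would catch instead:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the 429-style error an SDK raises when a quota is exceeded."""

def call_with_backoff(fn, max_retries: int = 4, base_delay: float = 0.01):
    """Retry fn with exponentially growing delays when the service throttles."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))

# Simulated endpoint that throttles twice before succeeding
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

print(call_with_backoff(flaky_call))  # "ok" after two throttled attempts
```

In production you would also cap the total delay and add jitter so many clients don't retry in lockstep.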
When you deploy a model in Foundry, several things occur:
- Compute resources are allocated: Foundry assigns the hardware needed to run the model—CPUs, GPUs, memory, networking, and scaling rules.
- An API endpoint is created: You're able to securely invoke the model through the OpenAI Responses API, validated through management API checks.
- Configuration (such as model version, response style, and safety settings) is locked in.
- Monitoring and logging become active: usage metrics, performance, latency, errors, and costs are tracked.
learn-pr/wwl-data-ai/get-started-with-generative-ai-and-agents/includes/3-using-generative-ai-models.md
The easiest way to interact with a deployed model is to use the model playground.
## Key configuration parameters
Several *model arguments* or *parameters* influence runtime behavior, performance, and cost. In the playground settings, you can configure parameters such as **temperature**, **max output tokens**, and **system instructions**. In the playground chat interface, you can submit prompts and see the responses generated by the model.
- **Temperature**: Controls creativity vs. determinism.
- **System instructions**: Sets the behavior and role of the model.
Unlike the user prompt, which is the end-user request or question (example: Where should I travel?), a **System prompt** sets behavior, tone, tools, and guardrails for the assistant. An example of a system prompt is: "You are a helpful, step‑by‑step tutor. Cite sources. Decline medical advice."
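To illustrate how the two prompt types travel together, here's a hypothetical request body in the style of an OpenAI-compatible API. The field names are assumptions for illustration, not an exact Foundry payload:

```python
# Hypothetical request body: the system prompt sets guardrails,
# while the user prompt carries the end-user's actual request.
request = {
    "model": "gpt-4.1",  # placeholder deployment name
    "input": [
        {"role": "system",
         "content": ("You are a helpful, step-by-step tutor. "
                     "Cite sources. Decline medical advice.")},
        {"role": "user", "content": "Where should I travel?"},
    ],
    "temperature": 0.7,       # creativity vs. determinism
    "max_output_tokens": 500, # caps response length (and TPM usage)
}

roles = [message["role"] for message in request["input"]]
print(roles)  # ['system', 'user']
```

Keeping the system prompt separate from user input also makes it easier to audit and version the guardrails independently of user traffic.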
The playground is a useful bridge between Foundry and code. After you test representative prompts, you can use the same system and user prompts and parameter values in your code. The playground provides code that can call your Foundry deployment via the OpenAI‑compatible *Responses* API. The code is essentially what is running when you use the chat interface to configure settings and send user prompts.
For Foundry, a lightweight chat client is often a **single Python file**.
#### Build a Python chat client
After you've created a **Foundry project** and **deployed a chat model** (for example, `gpt-4.1`), you can use the Foundry SDK. In the example, the client application uses authentication to connect to the model's endpoint, submit a prompt, and display the response.
> [!NOTE]
> To use the SDK, you need to install the `azure-ai-projects` package. The package is the core Azure AI Projects (Foundry) SDK used to connect to your Foundry project and obtain an OpenAI-compatible client.
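Since the referenced example isn't shown in this excerpt, here's a hedged sketch of such a client. The endpoint and deployment name are placeholders, and the `get_openai_client` accessor follows recent `azure-ai-projects` releases but may differ in your SDK version:

```python
def chat_once(project_endpoint: str, model_deployment: str, prompt: str) -> str:
    """Connect to a Foundry project, send one prompt, and return the reply.

    Sketch only: the endpoint and deployment name are placeholders, and the
    client accessor name may vary by azure-ai-projects SDK version.
    """
    # Imports deferred so the sketch can be read without Azure packages installed.
    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    project = AIProjectClient(
        endpoint=project_endpoint,            # your Foundry project endpoint URL
        credential=DefaultAzureCredential(),  # authenticates with your Azure sign-in
    )
    client = project.get_openai_client()      # OpenAI-compatible client
    response = client.responses.create(model=model_deployment, input=prompt)
    return response.output_text

# Example usage (requires an Azure sign-in and a deployed model):
# print(chat_once("https://<your-project-endpoint>", "gpt-4.1", "What is a token?"))
```

The same system prompts and parameter values you tested in the playground can be passed through this call once you're happy with them.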