
Commit 6ca3781

Merge pull request #53608 from ivorb/new-agents
optimize module
2 parents a07408d + 221e49c commit 6ca3781

23 files changed

Lines changed: 675 additions & 0 deletions
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.introduction
title: Introduction
metadata:
  title: Introduction
  description: Introduction to optimizing generative AI model performance with Microsoft Foundry.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 2
content: |
  [!include[](includes/1-introduction.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.prompt-engineering
title: Optimize model output with prompt engineering
metadata:
  title: Optimize model output with prompt engineering
  description: Learn how to use prompt engineering techniques including system messages, prompt patterns, and model parameters to optimize language model output.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 9
content: |
  [!include[](includes/2-prompt-engineering.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.retrieval-augmented-generation
title: Ground your model with Retrieval Augmented Generation
metadata:
  title: Ground your model with Retrieval Augmented Generation
  description: Learn when and how to use Retrieval Augmented Generation (RAG) to ground a language model with domain-specific data for more accurate responses.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 9
content: |
  [!include[](includes/3-retrieval-augmented-generation.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.fine-tune-model
title: Fine-tune a model for consistent behavior
metadata:
  title: Fine-tune a model for consistent behavior
  description: Learn when and how to fine-tune a language model to maximize behavioral consistency, including supervised fine-tuning, reinforcement fine-tuning, and Direct Preference Optimization.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 9
content: |
  [!include[](includes/4-fine-tune-model.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.compare-combine-strategies
title: Compare and combine optimization strategies
metadata:
  title: Compare and combine optimization strategies
  description: Compare prompt engineering, RAG, and fine-tuning strategies and learn when and how to combine them for optimal generative AI model performance.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 7
content: |
  [!include[](includes/5-compare-combine-strategies.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.exercise
title: Exercise - Optimize generative AI model performance
metadata:
  title: Exercise - Optimize generative AI model performance
  description: Exercise - Optimize generative AI model performance using Microsoft Foundry.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 60
content: |
  [!include[](includes/6-exercise.md)]
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.knowledge-check
title: Module assessment
metadata:
  title: Module assessment
  description: Knowledge check about optimizing generative AI model performance.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
  module_assessment: true
durationInMinutes: 3
content: |
quiz:
  questions:
  - content: "What is the primary purpose of a system message in a prompt?"
    choices:
    - content: "To define the model's role, behavior, and output constraints."
      isCorrect: true
      explanation: "Correct. A system message sets instructions that guide the model's responses, including its role, tone, format, and boundaries."
    - content: "To provide training data that permanently changes the model."
      isCorrect: false
      explanation: "Incorrect. System messages provide instructions for the current conversation but don't permanently change the model. Fine-tuning changes the model's weights."
    - content: "To retrieve data from an external data source."
      isCorrect: false
      explanation: "Incorrect. Retrieving data from external sources is the role of RAG, not system messages."
  - content: "When should you use Retrieval Augmented Generation (RAG) instead of relying on prompt engineering alone?"
    choices:
    - content: "When you want the model to respond in a consistent style and format."
      isCorrect: false
      explanation: "Incorrect. For consistent style and format, fine-tuning is more appropriate. RAG is used to improve factual accuracy with external data."
    - content: "When the model needs access to domain-specific or current data that it wasn't trained on."
      isCorrect: true
      explanation: "Correct. RAG retrieves relevant data from external sources at query time, enabling the model to generate accurate responses based on specific, current, or private data."
    - content: "When you want to reduce the length of prompts sent to the model."
      isCorrect: false
      explanation: "Incorrect. RAG actually adds retrieved context to the prompt. Fine-tuning is a strategy that can reduce prompt length."
  - content: "What does the temperature parameter control in a language model?"
    choices:
    - content: "The maximum number of tokens the model can generate."
      isCorrect: false
      explanation: "Incorrect. The maximum number of tokens is controlled by the max tokens parameter, not temperature."
    - content: "The randomness and creativity of the model's responses."
      isCorrect: true
      explanation: "Correct. A higher temperature produces more random and creative responses, while a lower temperature produces more focused and deterministic output."
    - content: "The speed at which the model processes requests."
      isCorrect: false
      explanation: "Incorrect. Temperature doesn't affect processing speed. It controls the randomness of token selection during response generation."
  - content: "What does fine-tuning optimize in a language model?"
    choices:
    - content: "The factual accuracy of responses by connecting to external data."
      isCorrect: false
      explanation: "Incorrect. Connecting to external data for factual accuracy is achieved through RAG. Fine-tuning optimizes the model's behavioral consistency."
    - content: "The consistency of the model's behavior, style, and output format."
      isCorrect: true
      explanation: "Correct. Fine-tuning trains the model on examples that demonstrate the desired style, tone, and format, maximizing the consistency of the model's behavior."
    - content: "The number of tokens the model can process in a single request."
      isCorrect: false
      explanation: "Incorrect. Fine-tuning doesn't change the model's token processing capacity. It adjusts the model's weights to produce more consistent responses."
  - content: "You're building a chat application that needs to answer questions using your company's product catalog while maintaining a specific brand voice. Which combination of strategies is most appropriate?"
    choices:
    - content: "Prompt engineering only, with detailed system messages."
      isCorrect: false
      explanation: "Incorrect. While prompt engineering is a good starting point, it can't give the model access to your product catalog data or guarantee consistent brand voice across all interactions."
    - content: "RAG for the product catalog data, fine-tuning for the brand voice, and prompt engineering for conversation-specific instructions."
      isCorrect: true
      explanation: "Correct. RAG grounds the model in your actual product data, fine-tuning ensures consistent brand voice, and prompt engineering adds per-conversation guidance. These strategies are complementary."
    - content: "Fine-tuning only, with the product catalog included in the training data."
      isCorrect: false
      explanation: "Incorrect. Fine-tuning alone isn't ideal for frequently changing data like product catalogs. RAG is better suited for dynamic, domain-specific data because it retrieves current information at query time."
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.summary
title: Summary
metadata:
  title: Summary
  description: Summary of optimizing generative AI model performance with Microsoft Foundry.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 2
content: |
  [!include[](includes/8-summary.md)]
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
Language models are powerful tools for building generative AI applications, but a base model on its own might not meet all of your requirements. The quality, accuracy, and consistency of the responses a model generates depend on how you configure and augment it.

Imagine you're a developer working for a travel agency. You're building a chat application to help customers with their travel-related questions. The base model gives decent responses, but your team has specific needs: the responses should follow the company's tone of voice, include accurate information about your hotel catalog, and maintain a consistent format across interactions. How do you get the model to perform at this level?

There are several complementary strategies you can use to optimize a generative AI model's performance. These strategies range from quick, low-cost adjustments to more involved techniques that require additional time and resources.

:::image type="content" source="../media/model-optimization.png" alt-text="Diagram showing the various strategies to optimize the model's performance, from prompt engineering to RAG and fine-tuning.":::

Throughout this module, you explore each of these strategies and learn when and how to apply them individually or in combination.

In this module, you learn how to:

- Apply prompt engineering techniques including system messages, few-shot learning, and model parameters to optimize model output.
- Understand when and how to ground a language model using Retrieval Augmented Generation (RAG).
- Identify when fine-tuning a model improves behavioral consistency.
- Compare optimization strategies and determine when to combine them.

## Prerequisites

- Familiarity with fundamental AI concepts and services in Azure.
- A basic understanding of generative AI models and how they generate responses.
Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
The most accessible way to optimize a model's performance is through **prompt engineering**. Prompt engineering is the process of designing and refining prompts to improve the quality, accuracy, and relevance of the responses a language model generates. It requires no additional infrastructure or training data, and you can start experimenting immediately.

## Understand prompt components

When you interact with a language model, the quality of your prompt directly influences the quality of the response. A well-constructed prompt helps the model understand what you need and generate a more useful answer.

Prompts for chat completion models typically include the following components:

- **System message**: Instructions that define the model's behavior, role, and constraints.
- **User message**: The question or input from the user.
- **Assistant message**: Previous model responses, used in multi-turn conversations.
- **Examples**: Sample input/output pairs that demonstrate the expected response format.

How you structure and combine these components determines how effectively the model responds.

## Design effective system messages

A **system message** is a set of instructions you provide to the model to guide its responses. System messages typically appear first in the conversation and act as the highest-level set of instructions. You use them to:

- Define the assistant's role and boundaries.
- Set the tone and communication style.
- Specify output formats, such as JSON or bullet points.
- Add safety and quality constraints for your scenario.

A system message can be as simple as:

```text
You are a helpful AI assistant.
```

Or it can include detailed rules and formatting requirements. For example, the travel agency's chat application could use:

```text
You are a friendly travel advisor for Margie's Travel.
Answer only questions related to travel, hotels, and trip planning.
Use a warm, conversational tone.
If you don't have enough information to answer, ask a clarifying question.
Format hotel recommendations as a bulleted list with the hotel name, location, and price range.
```

> [!IMPORTANT]
> A system message influences the model but doesn't guarantee compliance. You should test and iterate on your system messages, and layer them with other mitigations like content filtering and evaluation.

When designing a system message, follow this checklist:

1. **Start with the assistant's role**: State the role and the expected outcome for a typical request.
1. **Define boundaries**: List the topics, actions, and content types the assistant should avoid.
1. **Specify the output format**: If you need a specific format, state it plainly and keep it consistent.
1. **Add a "when unsure" policy**: Tell the model what to do when the user's request is ambiguous, out of scope, or when the model lacks information.
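In code, these prompt components map to a list of role-tagged messages sent with each request. The following sketch shows how the travel advisor's system message, prior conversation turns, and a new user message might be assembled into a chat request payload. The model name, the `build_chat_request` helper, and the payload shape are illustrative assumptions following the common chat-completions convention, not an API defined in this module:

```python
# Sketch: assembling a chat request payload for the travel advisor.
# The model name and payload shape are illustrative; adapt them to
# whichever chat-completions SDK you use.

SYSTEM_MESSAGE = (
    "You are a friendly travel advisor for Margie's Travel.\n"
    "Answer only questions related to travel, hotels, and trip planning.\n"
    "Use a warm, conversational tone.\n"
    "If you don't have enough information to answer, ask a clarifying question."
)

def build_chat_request(user_input, history=None):
    """Build a chat request: system message first, then prior turns, then the new user message."""
    messages = [{"role": "system", "content": SYSTEM_MESSAGE}]
    messages.extend(history or [])  # earlier user/assistant turns, if any
    messages.append({"role": "user", "content": user_input})
    return {"model": "gpt-4o-mini", "messages": messages, "temperature": 0.2}

request = build_chat_request("Which hotels do you recommend in Rome?")
print(request["messages"][0]["role"])  # system
print(len(request["messages"]))        # 2
```

Keeping the system message in one place like this also makes it easy to iterate on the instructions without touching the rest of the application.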
## Apply prompt patterns
52+
53+
Effective prompts use patterns that help the model produce better responses. Here are some common patterns you can use:
54+
55+
### Persona pattern
56+
57+
Instruct the model to take on a specific perspective or role. For example, asking the model to respond as a seasoned marketing professional produces different results than using no persona at all.
58+
59+
| | No persona | With persona |
60+
|---|---|---|
61+
| **System message** | *None* | You're a seasoned marketing professional writing for technical customers. |
62+
| **User prompt** | Write a one-sentence description of a CRM product. | Write a one-sentence description of a CRM product. |
63+
| **Response** | A CRM product is a software tool designed to manage a company's interactions with customers. | Experience seamless customer relationship management with our CRM, designed to streamline operations and drive sales growth with robust analytics. |
64+
65+
### Format template pattern
66+
67+
Provide a template or structure in your prompt to get output in a specific format. For example, if you need a structured response about a hotel:
68+
69+
```text
70+
Format the result to show:
71+
- Hotel name
72+
- Location
73+
- Star rating
74+
- Price range per night
75+
```
76+
77+
This pattern ensures consistent, organized responses that are easy to parse in your application.
78+
79+
### Chain-of-thought pattern
80+
81+
Ask the model to explain its reasoning step by step. This technique, called **chain of thought**, reduces the chance of inaccurate results and makes it easier to verify the model's logic.
82+
83+
For example, instead of asking "Which hotel is best for a family of four?", you can prompt:
84+
85+
```text
86+
Which hotel is best for a family of four? Take a step-by-step approach:
87+
consider room size, amenities for children, location, and price.
88+
```
89+
90+
A related technique is to **break the task down** into explicit sub-steps *before* the model responds, rather than asking it to reason through everything at once. For example, you might first ask the model to extract key facts from a passage, and then in a follow-up prompt ask it to answer a question based on those facts. Decomposing the work this way reduces errors on complex, multi-part tasks.
91+
92+
> [!NOTE]
93+
> Chain-of-thought prompting is a technique for non-reasoning models. Reasoning models like o-series models handle step-by-step logic internally.
94+
95+
### Few-shot learning pattern
96+
97+
Provide one or more examples of the desired input and output to help the model identify the pattern you want. This technique is called **few-shot learning** (or **one-shot** for a single example). When no examples are provided, it's called **zero-shot** learning.
98+
99+
For example, to classify customer inquiries:
100+
101+
```text
102+
Classify the following customer messages:
103+
104+
Message: "I need to change my flight to Rome"
105+
Category: Booking change
106+
107+
Message: "What's the weather like in Bali in March?"
108+
Category: Travel information
109+
110+
Message: "Can I get a refund for my cancelled tour?"
111+
Category:
112+
```
113+
114+
The model learns the classification pattern from the examples and correctly completes the last entry.
115+
116+
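A few-shot prompt like this can be assembled programmatically from a list of labeled examples, so you can grow or swap the examples without rewriting the prompt. A minimal sketch (the example data mirrors the prompt above; the helper name is illustrative):

```python
# Sketch: building a few-shot classification prompt from labeled examples.

EXAMPLES = [
    ("I need to change my flight to Rome", "Booking change"),
    ("What's the weather like in Bali in March?", "Travel information"),
]

def build_few_shot_prompt(new_message):
    lines = ["Classify the following customer messages:", ""]
    for text, label in EXAMPLES:
        # Each example shows the input and its expected label.
        lines += [f'Message: "{text}"', f"Category: {label}", ""]
    # Leave the final category blank so the model completes it.
    lines += [f'Message: "{new_message}"', "Category:"]
    return "\n".join(lines)

print(build_few_shot_prompt("Can I get a refund for my cancelled tour?"))
```

This reproduces the prompt shown above, with the examples kept as data rather than hard-coded text.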
### Use clear syntax and delimiters

When your prompt includes multiple sections, such as instructions, source text, and examples, use delimiters like `---`, Markdown headings, or XML tags to separate them. Clear boundaries help the model distinguish instructions from content and reduce the chance of misinterpretation.

> [!TIP]
> Models can be susceptible to **recency bias**, meaning text near the end of a prompt can have more influence than text at the beginning. If the model isn't following your instructions consistently, try repeating the key instruction at the end of the prompt.

## Configure model parameters

Beyond the text of your prompts, you can adjust model parameters that control how the model generates responses:

- **Temperature**: Controls the randomness of the output. A higher value (for example, 0.7) produces more creative and varied responses, while a lower value (for example, 0.2) produces more focused and deterministic responses. Use lower values for factual tasks and higher values for creative ones.
- **Top_p**: Also controls randomness, but in a different way: it restricts sampling to the most probable next tokens. For example, a `top_p` of 0.9 means the model samples only from the smallest set of tokens whose cumulative probability reaches 90%.

> [!TIP]
> The general recommendation is to adjust either temperature or top_p, not both at the same time.

For the travel agency scenario, you might use a low temperature (0.2) when answering factual questions about hotel amenities, but a higher temperature (0.7) when generating creative travel itinerary suggestions.
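One way to apply this in the travel agency app is to pick the temperature per task type before sending each request. A minimal sketch (the task categories, values, and helper name are illustrative assumptions):

```python
# Sketch: selecting request options per task type.
# Adjust temperature OR top_p, not both at once.

TEMPERATURE_BY_TASK = {
    "factual": 0.2,   # focused answers, e.g. hotel amenity questions
    "creative": 0.7,  # varied output, e.g. itinerary suggestions
}

def request_options(task_type):
    # Fall back to a middle value for task types we haven't classified.
    temperature = TEMPERATURE_BY_TASK.get(task_type, 0.5)
    return {"temperature": temperature, "max_tokens": 500}

print(request_options("factual"))   # {'temperature': 0.2, 'max_tokens': 500}
print(request_options("creative"))  # {'temperature': 0.7, 'max_tokens': 500}
```

Centralizing the mapping makes the temperature choice explicit and easy to tune as you evaluate responses.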
## When prompt engineering is enough

Prompt engineering is the right starting point for any model optimization effort. It's effective when you need to:

- Guide the model's tone, format, and behavior.
- Provide specific instructions for a task.
- Quickly iterate on results without infrastructure changes.
- Keep costs low, as no additional training or data storage is required.

However, prompt engineering has limits. If the model doesn't have access to the information it needs (like your company's hotel catalog), or if it consistently fails to maintain a specific behavior despite detailed instructions, you need to consider additional strategies.
