
Commit 6ca3781

Merge pull request #53608 from ivorb/new-agents
optimize module
2 parents a07408d + 221e49c commit 6ca3781

23 files changed

Lines changed: 675 additions & 0 deletions
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.introduction
title: Introduction
metadata:
  title: Introduction
  description: Introduction to optimizing generative AI model performance with Microsoft Foundry.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 2
content: |
  [!include[](includes/1-introduction.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.prompt-engineering
title: Optimize model output with prompt engineering
metadata:
  title: Optimize model output with prompt engineering
  description: Learn how to use prompt engineering techniques including system messages, prompt patterns, and model parameters to optimize language model output.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 9
content: |
  [!include[](includes/2-prompt-engineering.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.retrieval-augmented-generation
title: Ground your model with Retrieval Augmented Generation
metadata:
  title: Ground your model with Retrieval Augmented Generation
  description: Learn when and how to use Retrieval Augmented Generation (RAG) to ground a language model with domain-specific data for more accurate responses.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 9
content: |
  [!include[](includes/3-retrieval-augmented-generation.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.fine-tune-model
title: Fine-tune a model for consistent behavior
metadata:
  title: Fine-tune a model for consistent behavior
  description: Learn when and how to fine-tune a language model to maximize behavioral consistency, including supervised fine-tuning, reinforcement fine-tuning, and Direct Preference Optimization.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 9
content: |
  [!include[](includes/4-fine-tune-model.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.compare-combine-strategies
title: Compare and combine optimization strategies
metadata:
  title: Compare and combine optimization strategies
  description: Compare prompt engineering, RAG, and fine-tuning strategies and learn when and how to combine them for optimal generative AI model performance.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 7
content: |
  [!include[](includes/5-compare-combine-strategies.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.exercise
title: Exercise - Optimize generative AI model performance
metadata:
  title: Exercise - Optimize generative AI model performance
  description: Exercise - Optimize generative AI model performance using Microsoft Foundry.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 60
content: |
  [!include[](includes/6-exercise.md)]
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.knowledge-check
title: Module assessment
metadata:
  title: Module assessment
  description: Knowledge check about optimizing generative AI model performance.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
  module_assessment: true
durationInMinutes: 3
content: |
quiz:
  questions:
  - content: "What is the primary purpose of a system message in a prompt?"
    choices:
    - content: "To define the model's role, behavior, and output constraints."
      isCorrect: true
      explanation: "Correct. A system message sets instructions that guide the model's responses, including its role, tone, format, and boundaries."
    - content: "To provide training data that permanently changes the model."
      isCorrect: false
      explanation: "Incorrect. System messages provide instructions for the current conversation but don't permanently change the model. Fine-tuning changes the model's weights."
    - content: "To retrieve data from an external data source."
      isCorrect: false
      explanation: "Incorrect. Retrieving data from external sources is the role of RAG, not system messages."
  - content: "When should you use Retrieval Augmented Generation (RAG) instead of relying on prompt engineering alone?"
    choices:
    - content: "When you want the model to respond in a consistent style and format."
      isCorrect: false
      explanation: "Incorrect. For consistent style and format, fine-tuning is more appropriate. RAG is used to improve factual accuracy with external data."
    - content: "When the model needs access to domain-specific or current data that it wasn't trained on."
      isCorrect: true
      explanation: "Correct. RAG retrieves relevant data from external sources at query time, enabling the model to generate accurate responses based on specific, current, or private data."
    - content: "When you want to reduce the length of prompts sent to the model."
      isCorrect: false
      explanation: "Incorrect. RAG actually adds retrieved context to the prompt. Fine-tuning is a strategy that can reduce prompt length."
  - content: "What does the temperature parameter control in a language model?"
    choices:
    - content: "The maximum number of tokens the model can generate."
      isCorrect: false
      explanation: "Incorrect. The maximum number of tokens is controlled by the max tokens parameter, not temperature."
    - content: "The randomness and creativity of the model's responses."
      isCorrect: true
      explanation: "Correct. A higher temperature produces more random and creative responses, while a lower temperature produces more focused and deterministic output."
    - content: "The speed at which the model processes requests."
      isCorrect: false
      explanation: "Incorrect. Temperature doesn't affect processing speed. It controls the randomness of token selection during response generation."
  - content: "What does fine-tuning optimize in a language model?"
    choices:
    - content: "The factual accuracy of responses by connecting to external data."
      isCorrect: false
      explanation: "Incorrect. Connecting to external data for factual accuracy is achieved through RAG. Fine-tuning optimizes the model's behavioral consistency."
    - content: "The consistency of the model's behavior, style, and output format."
      isCorrect: true
      explanation: "Correct. Fine-tuning trains the model on examples that demonstrate the desired style, tone, and format, maximizing the consistency of the model's behavior."
    - content: "The number of tokens the model can process in a single request."
      isCorrect: false
      explanation: "Incorrect. Fine-tuning doesn't change the model's token processing capacity. It adjusts the model's weights to produce more consistent responses."
  - content: "You're building a chat application that needs to answer questions using your company's product catalog while maintaining a specific brand voice. Which combination of strategies is most appropriate?"
    choices:
    - content: "Prompt engineering only, with detailed system messages."
      isCorrect: false
      explanation: "Incorrect. While prompt engineering is a good starting point, it can't give the model access to your product catalog data or guarantee consistent brand voice across all interactions."
    - content: "RAG for the product catalog data, fine-tuning for the brand voice, and prompt engineering for conversation-specific instructions."
      isCorrect: true
      explanation: "Correct. RAG grounds the model in your actual product data, fine-tuning ensures consistent brand voice, and prompt engineering adds per-conversation guidance. These strategies are complementary."
    - content: "Fine-tuning only, with the product catalog included in the training data."
      isCorrect: false
      explanation: "Incorrect. Fine-tuning alone isn't ideal for frequently changing data like product catalogs. RAG is better suited for dynamic, domain-specific data because it retrieves current information at query time."
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.optimize-generative-ai-model-performance.summary
title: Summary
metadata:
  title: Summary
  description: Summary of optimizing generative AI model performance with Microsoft Foundry.
  author: ivorb
  ms.author: berryivor
  ms.date: 02/24/2026
  ms.topic: unit
  ai-usage: ai-assisted
durationInMinutes: 2
content: |
  [!include[](includes/8-summary.md)]
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
Language models are powerful tools for building generative AI applications, but a base model on its own might not meet all of your requirements. The quality, accuracy, and consistency of the responses a model generates depend on how you configure and augment it.

Imagine you're a developer working for a travel agency. You're building a chat application to help customers with their travel-related questions. The base model gives decent responses, but your team has specific needs: the responses should follow the company's tone of voice, include accurate information about your hotel catalog, and maintain a consistent format across interactions. How do you get the model to perform at this level?

There are several complementary strategies you can use to optimize a generative AI model's performance. These strategies range from quick, low-cost adjustments to more involved techniques that require additional time and resources.

:::image type="content" source="../media/model-optimization.png" alt-text="Diagram showing the various strategies to optimize the model's performance, from prompt engineering to RAG and fine-tuning.":::

Throughout this module, you explore each of these strategies and learn when and how to apply them individually or in combination.

In this module, you learn how to:

- Apply prompt engineering techniques including system messages, few-shot learning, and model parameters to optimize model output.
- Understand when and how to ground a language model using Retrieval Augmented Generation (RAG).
- Identify when fine-tuning a model improves behavioral consistency.
- Compare optimization strategies and determine when to combine them.

## Prerequisites

- Familiarity with fundamental AI concepts and services in Azure.
- A basic understanding of generative AI models and how they generate responses.
Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
The most accessible way to optimize a model's performance is through **prompt engineering**. Prompt engineering is the process of designing and refining prompts to improve the quality, accuracy, and relevance of the responses a language model generates. It requires no additional infrastructure or training data, and you can start experimenting immediately.

## Understand prompt components

When you interact with a language model, the quality of your prompt directly influences the quality of the response. A well-constructed prompt helps the model understand what you need and generate a more useful answer.

Prompts for chat completion models typically include the following components:

- **System message**: Instructions that define the model's behavior, role, and constraints.
- **User message**: The question or input from the user.
- **Assistant message**: Previous model responses, used in multi-turn conversations.
- **Examples**: Sample input/output pairs that demonstrate the expected response format.

How you structure and combine these components determines how effectively the model responds.

## Design effective system messages

A **system message** is a set of instructions you provide to the model to guide its responses. System messages typically appear first in the conversation and act as the highest-level set of instructions. You use them to:

- Define the assistant's role and boundaries.
- Set the tone and communication style.
- Specify output formats, such as JSON or bullet points.
- Add safety and quality constraints for your scenario.

A system message can be as simple as:

```text
You are a helpful AI assistant.
```

Or it can include detailed rules and formatting requirements. For example, the travel agency's chat application could use:

```text
You are a friendly travel advisor for Margie's Travel.
Answer only questions related to travel, hotels, and trip planning.
Use a warm, conversational tone.
If you don't have enough information to answer, ask a clarifying question.
Format hotel recommendations as a bulleted list with the hotel name, location, and price range.
```

> [!IMPORTANT]
> A system message influences the model but doesn't guarantee compliance. You should test and iterate on your system messages, and layer them with other mitigations like content filtering and evaluation.

When designing a system message, follow this checklist:

1. **Start with the assistant's role**: State the role and the expected outcome for a typical request.
1. **Define boundaries**: List the topics, actions, and content types the assistant should avoid.
1. **Specify the output format**: If you need a specific format, state it plainly and keep it consistent.
1. **Add a "when unsure" policy**: Tell the model what to do when the user's request is ambiguous, out of scope, or when the model lacks information.
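In code, these prompt components map to a list of role-tagged messages sent with each request. The following sketch shows how the travel advisor's system message, prior conversation turns, and a new user message might be assembled into a chat request payload. The model name, the `build_chat_request` helper, and the payload shape are illustrative assumptions following the common chat-completions convention, not an API defined in this module:

```python
# Sketch: assembling a chat request payload for the travel advisor.
# The model name and payload shape are illustrative; adapt them to
# whichever chat-completions SDK you use.

SYSTEM_MESSAGE = (
    "You are a friendly travel advisor for Margie's Travel.\n"
    "Answer only questions related to travel, hotels, and trip planning.\n"
    "Use a warm, conversational tone.\n"
    "If you don't have enough information to answer, ask a clarifying question."
)

def build_chat_request(user_input, history=None):
    """Build a chat request: system message first, then prior turns, then the new user message."""
    messages = [{"role": "system", "content": SYSTEM_MESSAGE}]
    messages.extend(history or [])  # earlier user/assistant turns, if any
    messages.append({"role": "user", "content": user_input})
    return {"model": "gpt-4o-mini", "messages": messages, "temperature": 0.2}

request = build_chat_request("Which hotels do you recommend in Rome?")
print(request["messages"][0]["role"])  # system
print(len(request["messages"]))        # 2
```

Keeping the system message in one place like this also makes it easy to iterate on the instructions without touching the rest of the application.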
## Apply prompt patterns
52+
53+
Effective prompts use patterns that help the model produce better responses. Here are some common patterns you can use:
54+
55+
### Persona pattern
56+
57+
Instruct the model to take on a specific perspective or role. For example, asking the model to respond as a seasoned marketing professional produces different results than using no persona at all.
58+
59+
| | No persona | With persona |
60+
|---|---|---|
61+
| **System message** | *None* | You're a seasoned marketing professional writing for technical customers. |
62+
| **User prompt** | Write a one-sentence description of a CRM product. | Write a one-sentence description of a CRM product. |
63+
| **Response** | A CRM product is a software tool designed to manage a company's interactions with customers. | Experience seamless customer relationship management with our CRM, designed to streamline operations and drive sales growth with robust analytics. |
64+
65+
### Format template pattern
66+
67+
Provide a template or structure in your prompt to get output in a specific format. For example, if you need a structured response about a hotel:
68+
69+
```text
70+
Format the result to show:
71+
- Hotel name
72+
- Location
73+
- Star rating
74+
- Price range per night
75+
```
76+
77+
This pattern ensures consistent, organized responses that are easy to parse in your application.
78+
79+
### Chain-of-thought pattern
80+
81+
Ask the model to explain its reasoning step by step. This technique, called **chain of thought**, reduces the chance of inaccurate results and makes it easier to verify the model's logic.
82+
83+
For example, instead of asking "Which hotel is best for a family of four?", you can prompt:
84+
85+
```text
86+
Which hotel is best for a family of four? Take a step-by-step approach:
87+
consider room size, amenities for children, location, and price.
88+
```
89+
90+
A related technique is to **break the task down** into explicit sub-steps *before* the model responds, rather than asking it to reason through everything at once. For example, you might first ask the model to extract key facts from a passage, and then in a follow-up prompt ask it to answer a question based on those facts. Decomposing the work this way reduces errors on complex, multi-part tasks.
91+
92+
> [!NOTE]
93+
> Chain-of-thought prompting is a technique for non-reasoning models. Reasoning models like o-series models handle step-by-step logic internally.
94+
95+
### Few-shot learning pattern
96+
97+
Provide one or more examples of the desired input and output to help the model identify the pattern you want. This technique is called **few-shot learning** (or **one-shot** for a single example). When no examples are provided, it's called **zero-shot** learning.
98+
99+
For example, to classify customer inquiries:
100+
101+
```text
102+
Classify the following customer messages:
103+
104+
Message: "I need to change my flight to Rome"
105+
Category: Booking change
106+
107+
Message: "What's the weather like in Bali in March?"
108+
Category: Travel information
109+
110+
Message: "Can I get a refund for my cancelled tour?"
111+
Category:
112+
```
113+
114+
The model learns the classification pattern from the examples and correctly completes the last entry.
115+
116+
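A few-shot prompt like this can be assembled programmatically from a list of labeled examples, so you can grow or swap the examples without rewriting the prompt. A minimal sketch (the example data mirrors the prompt above; the helper name is illustrative):

```python
# Sketch: building a few-shot classification prompt from labeled examples.

EXAMPLES = [
    ("I need to change my flight to Rome", "Booking change"),
    ("What's the weather like in Bali in March?", "Travel information"),
]

def build_few_shot_prompt(new_message):
    lines = ["Classify the following customer messages:", ""]
    for text, label in EXAMPLES:
        # Each example shows the input and its expected label.
        lines += [f'Message: "{text}"', f"Category: {label}", ""]
    # Leave the final category blank so the model completes it.
    lines += [f'Message: "{new_message}"', "Category:"]
    return "\n".join(lines)

print(build_few_shot_prompt("Can I get a refund for my cancelled tour?"))
```

This reproduces the prompt shown above, with the examples kept as data rather than hard-coded text.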
### Use clear syntax and delimiters

When your prompt includes multiple sections, such as instructions, source text, and examples, use delimiters like `---`, Markdown headings, or XML tags to separate them. Clear boundaries help the model distinguish instructions from content and reduce the chance of misinterpretation.

> [!TIP]
> Models can be susceptible to **recency bias**, meaning text near the end of a prompt can have more influence than text at the beginning. If the model isn't following your instructions consistently, try repeating the key instruction at the end of the prompt.

## Configure model parameters

Beyond the text of your prompts, you can adjust model parameters that control how the model generates responses:

- **Temperature**: Controls the randomness of the output. A higher value (for example, 0.7) produces more creative and varied responses, while a lower value (for example, 0.2) produces more focused and deterministic responses. Use lower values for factual tasks and higher values for creative ones.
- **Top_p**: Also controls randomness, but in a different way: it restricts sampling to the most probable next tokens. For example, a `top_p` of 0.9 means the model samples only from the smallest set of tokens whose cumulative probability reaches 90%.

> [!TIP]
> The general recommendation is to adjust either temperature or top_p, not both at the same time.

For the travel agency scenario, you might use a low temperature (0.2) when answering factual questions about hotel amenities, but a higher temperature (0.7) when generating creative travel itinerary suggestions.
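One way to apply this in the travel agency app is to pick the temperature per task type before sending each request. A minimal sketch (the task categories, values, and helper name are illustrative assumptions):

```python
# Sketch: selecting request options per task type.
# Adjust temperature OR top_p, not both at once.

TEMPERATURE_BY_TASK = {
    "factual": 0.2,   # focused answers, e.g. hotel amenity questions
    "creative": 0.7,  # varied output, e.g. itinerary suggestions
}

def request_options(task_type):
    # Fall back to a middle value for task types we haven't classified.
    temperature = TEMPERATURE_BY_TASK.get(task_type, 0.5)
    return {"temperature": temperature, "max_tokens": 500}

print(request_options("factual"))   # {'temperature': 0.2, 'max_tokens': 500}
print(request_options("creative"))  # {'temperature': 0.7, 'max_tokens': 500}
```

Centralizing the mapping makes the temperature choice explicit and easy to tune as you evaluate responses.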
## When prompt engineering is enough

Prompt engineering is the right starting point for any model optimization effort. It's effective when you need to:

- Guide the model's tone, format, and behavior.
- Provide specific instructions for a task.
- Quickly iterate on results without infrastructure changes.
- Keep costs low, as no additional training or data storage is required.

However, prompt engineering has limits. If the model doesn't have access to the information it needs (like your company's hotel catalog), or if it consistently fails to maintain a specific behavior despite detailed instructions, you need to consider additional strategies.
