
Commit 221e49c

fix placeholder link and add todo diagrams
1 parent 6d9340d commit 221e49c

8 files changed

Lines changed: 14 additions & 19 deletions


learn-pr/wwl-data-ai/optimize-generative-ai-model-performance/includes/2-prompt-engineering.md

Lines changed: 0 additions & 11 deletions
```diff
@@ -15,8 +15,6 @@ How you structure and combine these components determines how effectively the mo
 
 ## Design effective system messages
 
-<!-- TODO: Screenshot of the chat playground in the Microsoft Foundry portal showing the system message pane on the left, a user prompt in the chat area, and a model response. Highlight the system message field to show where developers configure it. -->
-
 A **system message** is a set of instructions you provide to the model to guide its responses. System messages typically appear first in the conversation and act as the highest-level set of instructions. You use them to:
 
 - Define the assistant's role and boundaries.
```
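The system-message guidance above can be pictured as a chat request payload. This is a minimal sketch, not taken from the module: the agency wording and the user question shape are invented examples of defining a role, boundaries, and style.

```python
# Minimal sketch (not from the module): a chat request whose first message
# is the system message. The agency wording is an invented example of
# defining a role, boundaries, and output style.
messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant for a travel agency. "               # role
            "Only answer questions about destinations, hotels, and bookings. "  # boundaries
            "Keep responses friendly and concise."                            # style
        ),
    },
    {"role": "user", "content": "Which hotels do you offer in Paris?"},
]

# The system message appears first, so it acts as the highest-level
# instruction for every turn that follows.
print(messages[0]["role"])
```

Because every later turn is interpreted against that first message, changing it is usually the cheapest way to steer the assistant's behavior.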
```diff
@@ -124,8 +122,6 @@ When your prompt includes multiple sections — such as instructions, source tex
 
 ## Configure model parameters
 
-<!-- TODO: Screenshot of the model parameter configuration panel in the Microsoft Foundry portal chat playground, showing the Temperature and Top_p sliders with their current values. -->
-
 Beyond the text of your prompts, you can adjust model parameters that control how the model generates responses:
 
 - **Temperature**: Controls the randomness of the output. A higher value (for example, 0.7) produces more creative and varied responses, while a lower value (for example, 0.2) produces more focused and deterministic responses. Use lower values for factual tasks and higher values for creative ones.
```
```diff
@@ -134,13 +130,6 @@ Beyond the text of your prompts, you can adjust model parameters that control ho
 > [!TIP]
 > The general recommendation is to adjust either temperature or top_p, not both at the same time.
 
-Other useful parameters include:
-
-- **Max tokens**: Sets the maximum length of the response.
-- **Stop sequences**: Specifies text patterns where the model should stop generating.
-- **Frequency penalty**: Reduces the likelihood of the model repeating the same phrases.
-- **Presence penalty**: Encourages the model to introduce new topics.
-
 For the travel agency scenario, you might use a low temperature (0.2) when answering factual questions about hotel amenities, but a higher temperature (0.7) when generating creative travel itinerary suggestions.
 
 ## When prompt engineering is enough
```
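The parameters discussed in this file can be pictured as fields of a request body. This is a hedged sketch: the field names assume the common OpenAI-style chat API, and all values are illustrative rather than recommended settings.

```python
# Sketch (assuming OpenAI-style parameter names) of two request
# configurations for the travel agency scenario. Values are examples only.
factual_request = {
    "temperature": 0.2,        # low temperature: focused, deterministic answers
    "max_tokens": 300,         # cap the length of the response
    "stop": ["\n\n"],          # stop generating at the first blank line
    "frequency_penalty": 0.5,  # discourage repeating the same phrases
    "presence_penalty": 0.0,   # no extra push toward new topics
}

# Same settings, but a higher temperature for creative itinerary suggestions.
creative_request = {**factual_request, "temperature": 0.7}

# Following the general recommendation, only temperature is adjusted here;
# neither request sets top_p.
```

Keeping the two configurations as small dictionaries makes it easy to switch between factual and creative modes per request instead of hard-coding one global setting.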

learn-pr/wwl-data-ai/optimize-generative-ai-model-performance/includes/3-retrieval-augmented-generation.md

Lines changed: 13 additions & 7 deletions
```diff
@@ -6,21 +6,25 @@ To address this challenge, you can **ground** the model by providing it with rel
 
 When you use a language model without grounding, the only information it has comes from its training data. The result might be grammatically correct and logically structured, but it can be inaccurate or include fabricated details. For example, asking "Which hotels do you offer in Paris?" without grounding data might return fictional hotel names.
 
+:::image type="content" source="../media/ungrounded.png" alt-text="Diagram showing an ungrounded model returning an uncontextualized response based only on training data.":::
+
 When you **ground** a prompt, you provide relevant data from a trusted source along with the user's question. The model then generates a response based on that data, producing more accurate and contextually relevant answers.
 
 Consider the difference:
 
 - **Ungrounded**: The model relies only on its training data and might invent hotel names or details.
 - **Grounded**: The model receives your actual hotel catalog data as context and responds with real hotel names, prices, and availability.
 
+:::image type="content" source="../media/grounded.png" alt-text="Diagram comparing an ungrounded model returning generic responses versus a grounded model returning data-backed responses.":::
+
 Grounding improves the factual accuracy of responses by connecting the model to information that is specific, current, and relevant to the user's needs.
 
 ## How RAG works
 
-<!-- TODO: Diagram illustrating the RAG pattern flow: (1) User submits a question, (2) the question is used to search a data source/index, (3) retrieved documents are added to the prompt, (4) the augmented prompt is sent to the language model, (5) a grounded response is returned. Similar to the rag-pattern.png in the build-copilot module. -->
-
 RAG is a pattern that retrieves relevant information from a data source and includes it in the prompt before the model generates a response. The process follows three steps:
 
+:::image type="content" source="../media/rag-pattern.png" alt-text="Diagram showing the three-step RAG pattern: retrieve grounding data, augment the prompt with that data, and generate a grounded response.":::
+
 1. **Retrieve**: Search a data source for information that is relevant to the user's question.
 1. **Augment**: Add the retrieved information to the prompt as context.
 1. **Generate**: Send the augmented prompt to the language model to generate a grounded response.
```
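The three steps above can be sketched in a few lines. This toy version uses naive keyword matching where a real solution would query a search index such as Azure AI Search, and the catalog entries are invented for illustration.

```python
import string

# Toy sketch of the retrieve-augment-generate pattern. A real solution
# would query an index (for example, Azure AI Search) instead of this
# naive keyword match. The catalog entries are invented.
catalog = [
    "Hotel Lumiere: boutique hotel in central Paris, near the Louvre.",
    "Casa del Sol: beachfront resort in Barcelona with sea-view rooms.",
]

STOPWORDS = {"which", "do", "you", "offer", "in", "the", "a", "of"}

def tokenize(text: str) -> set[str]:
    return {w.strip(string.punctuation) for w in text.lower().split()}

def retrieve(question: str, docs: list[str]) -> list[str]:
    # 1. Retrieve: keep documents that share a meaningful word with the question.
    keywords = tokenize(question) - STOPWORDS
    return [d for d in docs if keywords & tokenize(d)]

def augment(question: str, context: list[str]) -> str:
    # 2. Augment: add the retrieved information to the prompt as context.
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question

question = "Which hotels do you offer in Paris?"
prompt = augment(question, retrieve(question, catalog))
# 3. Generate: send the augmented prompt to the language model (omitted here).
```

Only the retrieval step changes as the solution matures; swapping the keyword match for vector search over an index leaves the augment and generate steps untouched.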
```diff
@@ -31,7 +35,7 @@ By retrieving context from a specified data source, you ensure that the model us
 
 A critical component of RAG is the ability to efficiently find the most relevant information in your data source. This is where **embeddings** and **vector search** come in.
 
-An **embedding** is a mathematical representation of text as a vector—a list of floating-point numbers that captures the meaning of words, sentences, or documents. You create embeddings by sending your content to an embedding model, such as an Azure OpenAI embedding model available in Microsoft Foundry.
+An **embedding** is a mathematical representation of text as a vector, a list of floating-point numbers that captures the meaning of words, sentences, or documents. You create embeddings by sending your content to an embedding model, such as an Azure OpenAI embedding model available in Microsoft Foundry.
 
 For example, imagine two documents:
 
```
```diff
@@ -40,14 +44,16 @@ For example, imagine two documents:
 
 These sentences use different words but have similar meanings. When you create embeddings for each, their vectors are close together in multidimensional space, reflecting their semantic similarity.
 
+:::image type="content" source="../media/vector-embeddings.jpg" alt-text="Diagram showing text keywords plotted as vectors in multidimensional space, with the distance between vectors representing semantic similarity.":::
+
 **Cosine similarity** measures how close two vectors are by calculating the angle between them. A value near 1 means the vectors are very similar. This mathematical approach enables you to find relevant documents even when the exact words don't match.
 
 ## Use Azure AI Search for retrieval
 
-<!-- TODO: Screenshot of the Add Data dialog in the Microsoft Foundry portal, showing the available data source options (Azure Blob Storage, Azure Data Lake Storage Gen2, Microsoft OneLake, file upload). -->
-
 **Azure AI Search** provides the retrieval component for RAG solutions in Microsoft Foundry. It allows you to bring your own data, create a searchable index, and query it to retrieve relevant information.
 
+:::image type="content" source="../media/index.png" alt-text="Diagram showing an Azure AI Search index being queried to retrieve grounding data for a user question.":::
+
 To use Azure AI Search with RAG, you:
 
 1. **Add your data** to Microsoft Foundry from sources like Azure Blob Storage, Azure Data Lake Storage Gen2, or Microsoft OneLake. You can also upload files directly.
```
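The cosine-similarity idea in this hunk can be made concrete with a small calculation. The 3-D vectors below are toy values standing in for real embeddings, which typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-D "embeddings": the first two stand in for sentences with similar
# meanings, the third for an unrelated topic.
doc_a = [0.9, 0.1, 0.2]
doc_b = [0.8, 0.2, 0.3]
doc_c = [0.1, 0.9, 0.0]

print(round(cosine_similarity(doc_a, doc_b), 3))  # close to 1: similar meaning
print(round(cosine_similarity(doc_a, doc_c), 3))  # much lower: different meaning
```

Because the measure depends only on the angle between vectors, two documents of very different lengths can still score near 1 if they point in the same semantic direction.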
```diff
@@ -101,7 +107,7 @@ RAG is most effective when:
 - **Factual accuracy is critical**: You need responses grounded in real data rather than the model's general knowledge.
 - **The base model's training data has a cutoff**: Events or information that occurred after the model's training cutoff date need to be accessible.
 
-For the travel agency scenario, RAG allows customers to ask questions about specific hotels, destinations, and booking policies—all grounded in the agency's actual catalog data.
+For the travel agency scenario, RAG allows customers to ask questions about specific hotels, destinations, and booking policies, all grounded in the agency's actual catalog data.
 
 > [!TIP]
-> If you're building agents that need grounded knowledge without managing your own search infrastructure, consider **Foundry IQ** — a managed knowledge store that simplifies grounding for AI agents. To learn more, see [YOURPLACEHOLDER]().
+> If you're building agents that need grounded knowledge without managing your own search infrastructure, consider **Foundry IQ** — a managed knowledge store that simplifies grounding for AI agents. To learn more, see [Build knowledge-enhanced AI agents with Foundry IQ](/training/modules/introduction-foundry-iq/?azure-portal=true).
```

learn-pr/wwl-data-ai/optimize-generative-ai-model-performance/includes/5-compare-combine-strategies.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-Now that you've explored prompt engineering, RAG, and fine-tuning individually, let's look at how they relate to each other. These strategies aren't mutually exclusive—they're complementary methods that you can combine to meet different optimization goals.
+Now that you've explored prompt engineering, RAG, and fine-tuning individually, let's look at how they relate to each other. These strategies aren't mutually exclusive; they're complementary methods that you can combine to meet different optimization goals.
 
 ## Understand the optimization spectrum
 
```
5 binary files added: 17.5 KB, 23.6 KB, 21.6 KB, 13.4 KB, 312 KB
