`learn-pr/wwl-data-ai/optimize-generative-ai-model-performance/includes/2-prompt-engineering.md` (0 additions, 11 deletions)
```diff
@@ -15,8 +15,6 @@ How you structure and combine these components determines how effectively the mo
 
 ## Design effective system messages
 
-<!-- TODO: Screenshot of the chat playground in the Microsoft Foundry portal showing the system message pane on the left, a user prompt in the chat area, and a model response. Highlight the system message field to show where developers configure it. -->
-
 A **system message** is a set of instructions you provide to the model to guide its responses. System messages typically appear first in the conversation and act as the highest-level set of instructions. You use them to:
 
 - Define the assistant's role and boundaries.
```
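The system-message guidance in this hunk maps directly onto the first entry of a chat request. The sketch below is illustrative only: the travel-agency wording and the message layout are assumptions, not content from the module.

```python
# Minimal sketch: the system message is the first, highest-level instruction.
# The travel-agency wording here is invented for illustration.
messages = [
    {
        "role": "system",
        "content": (
            "You are a travel assistant. Answer only questions about the "
            "agency's hotels and bookings, and say so when you don't know."
        ),
    },
    {"role": "user", "content": "Which hotels do you offer in Paris?"},
]

# Because it appears first, the system message frames every later turn.
assert messages[0]["role"] == "system"
```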
```diff
@@ -124,8 +122,6 @@ When your prompt includes multiple sections — such as instructions, source tex
 
 ## Configure model parameters
 
-<!-- TODO: Screenshot of the model parameter configuration panel in the Microsoft Foundry portal chat playground, showing the Temperature and Top_p sliders with their current values. -->
-
 Beyond the text of your prompts, you can adjust model parameters that control how the model generates responses:
 
 - **Temperature**: Controls the randomness of the output. A higher value (for example, 0.7) produces more creative and varied responses, while a lower value (for example, 0.2) produces more focused and deterministic responses. Use lower values for factual tasks and higher values for creative ones.
```
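The temperature guidance above can be captured in a small helper. This is a sketch using the module's suggested values (0.2 for factual tasks, 0.7 for creative ones); the function name and task labels are invented.

```python
def sampling_params(task: str) -> dict:
    """Pick sampling settings: low temperature for factual tasks,
    higher for creative ones. Only temperature is set, because the
    usual advice is to adjust temperature or top_p, not both."""
    return {"temperature": 0.2 if task == "factual" else 0.7}

print(sampling_params("factual"))  # {'temperature': 0.2}
```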
```diff
@@ -134,13 +130,6 @@ Beyond the text of your prompts, you can adjust model parameters that control ho
 > [!TIP]
 > The general recommendation is to adjust either temperature or top_p, not both at the same time.
 
-Other useful parameters include:
-
-- **Max tokens**: Sets the maximum length of the response.
-- **Stop sequences**: Specifies text patterns where the model should stop generating.
-- **Frequency penalty**: Reduces the likelihood of the model repeating the same phrases.
-- **Presence penalty**: Encourages the model to introduce new topics.
-
 For the travel agency scenario, you might use a low temperature (0.2) when answering factual questions about hotel amenities, but a higher temperature (0.7) when generating creative travel itinerary suggestions.
```
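The parameters described in the bullets above (max tokens, stop sequences, frequency and presence penalties) correspond to common request fields. The sketch below shows one plausible configuration for the travel scenario; the field names follow the widespread chat-completions convention and every value is invented, so treat both as assumptions rather than the module's definitive settings.

```python
# Hypothetical settings for factual hotel Q&A; all values are illustrative.
factual_request = {
    "temperature": 0.2,        # focused, deterministic answers
    "max_tokens": 300,         # cap the response length
    "stop": ["\nUser:"],       # stop before inventing the next user turn
    "frequency_penalty": 0.2,  # discourage repeated phrases
    "presence_penalty": 0.0,   # no push toward new topics
}

# Creative itinerary generation reuses the same fields with looser values.
creative_request = {**factual_request, "temperature": 0.7, "presence_penalty": 0.6}
assert creative_request["temperature"] == 0.7
```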
`learn-pr/wwl-data-ai/optimize-generative-ai-model-performance/includes/3-retrieval-augmented-generation.md` (13 additions, 7 deletions)
```diff
@@ -6,21 +6,25 @@ To address this challenge, you can **ground** the model by providing it with rel
 
 When you use a language model without grounding, the only information it has comes from its training data. The result might be grammatically correct and logically structured, but it can be inaccurate or include fabricated details. For example, asking "Which hotels do you offer in Paris?" without grounding data might return fictional hotel names.
 
+:::image type="content" source="../media/ungrounded.png" alt-text="Diagram showing an ungrounded model returning an uncontextualized response based only on training data.":::
+
 When you **ground** a prompt, you provide relevant data from a trusted source along with the user's question. The model then generates a response based on that data, producing more accurate and contextually relevant answers.
 
 Consider the difference:
 
 - **Ungrounded**: The model relies only on its training data and might invent hotel names or details.
 - **Grounded**: The model receives your actual hotel catalog data as context and responds with real hotel names, prices, and availability.
 
+:::image type="content" source="../media/grounded.png" alt-text="Diagram comparing an ungrounded model returning generic responses versus a grounded model returning data-backed responses.":::
+
 Grounding improves the factual accuracy of responses by connecting the model to information that is specific, current, and relevant to the user's needs.
 
 ## How RAG works
 
-<!-- TODO: Diagram illustrating the RAG pattern flow: (1) User submits a question, (2) the question is used to search a data source/index, (3) retrieved documents are added to the prompt, (4) the augmented prompt is sent to the language model, (5) a grounded response is returned. Similar to the rag-pattern.png in the build-copilot module. -->
-
 RAG is a pattern that retrieves relevant information from a data source and includes it in the prompt before the model generates a response. The process follows three steps:
 
+:::image type="content" source="../media/rag-pattern.png" alt-text="Diagram showing the three-step RAG pattern: retrieve grounding data, augment the prompt with that data, and generate a grounded response.":::
+
 1. **Retrieve**: Search a data source for information that is relevant to the user's question.
 1. **Augment**: Add the retrieved information to the prompt as context.
 1. **Generate**: Send the augmented prompt to the language model to generate a grounded response.
```
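The three steps above can be sketched end to end with a toy keyword retriever standing in for a real search index. The hotel entries are invented, and a real system would query an embedding-based index and then call a language model in step 3.

```python
import re

# Invented catalog entries standing in for the agency's real data source.
CATALOG = [
    "Hotel Lumiere, Paris: 180 EUR per night, breakfast included.",
    "Casa Verde, Barcelona: 140 EUR per night, rooftop pool.",
]

def retrieve(question: str) -> list[str]:
    # 1. Retrieve: naive keyword overlap instead of a real search index.
    words = set(re.findall(r"\w+", question.lower()))
    return [d for d in CATALOG if words & set(re.findall(r"\w+", d.lower()))]

def augment(question: str, docs: list[str]) -> str:
    # 2. Augment: prepend the retrieved text to the prompt as context.
    return "Context:\n" + "\n".join(docs) + "\n\nQuestion: " + question

question = "Which hotels do you offer in Paris?"
prompt = augment(question, retrieve(question))
# 3. Generate: `prompt` would now be sent to the language model.
assert "Hotel Lumiere" in prompt and "Casa Verde" not in prompt
```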
```diff
@@ -31,7 +35,7 @@ By retrieving context from a specified data source, you ensure that the model us
 
 A critical component of RAG is the ability to efficiently find the most relevant information in your data source. This is where **embeddings** and **vector search** come in.
 
-An **embedding** is a mathematical representation of text as a vector—a list of floating-point numbers that captures the meaning of words, sentences, or documents. You create embeddings by sending your content to an embedding model, such as an Azure OpenAI embedding model available in Microsoft Foundry.
+An **embedding** is a mathematical representation of text as a vector — a list of floating-point numbers that captures the meaning of words, sentences, or documents. You create embeddings by sending your content to an embedding model, such as an Azure OpenAI embedding model available in Microsoft Foundry.
 
 For example, imagine two documents:
```
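The idea that similar meanings sit close together in vector space can be sketched with cosine similarity, the standard closeness measure for embeddings (defined later in this file). The three-dimensional vectors below are invented; real embedding models return hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|): near 1 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Invented 3-dimensional "embeddings" for illustration.
sentence_a = [0.9, 0.1, 0.3]   # first document
sentence_b = [0.8, 0.2, 0.35]  # different words, similar meaning
unrelated = [0.1, 0.9, 0.0]    # unrelated text

assert cosine_similarity(sentence_a, sentence_b) > 0.9  # semantically close
assert cosine_similarity(sentence_a, unrelated) < 0.5   # far apart
```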
```diff
@@ -40,14 +44,16 @@ For example, imagine two documents:
 
 These sentences use different words but have similar meanings. When you create embeddings for each, their vectors are close together in multidimensional space, reflecting their semantic similarity.
 
+:::image type="content" source="../media/vector-embeddings.jpg" alt-text="Diagram showing text keywords plotted as vectors in multidimensional space, with the distance between vectors representing semantic similarity.":::
+
 **Cosine similarity** measures how close two vectors are by calculating the angle between them. A value near 1 means the vectors are very similar. This mathematical approach enables you to find relevant documents even when the exact words don't match.
 
 ## Use Azure AI Search for retrieval
 
-<!-- TODO: Screenshot of the Add Data dialog in the Microsoft Foundry portal, showing the available data source options (Azure Blob Storage, Azure Data Lake Storage Gen2, Microsoft OneLake, file upload). -->
-
 **Azure AI Search** provides the retrieval component for RAG solutions in Microsoft Foundry. It allows you to bring your own data, create a searchable index, and query it to retrieve relevant information.
 
+:::image type="content" source="../media/index.png" alt-text="Diagram showing an Azure AI Search index being queried to retrieve grounding data for a user question.":::
+
 To use Azure AI Search with RAG, you:
 
 1. **Add your data** to Microsoft Foundry from sources like Azure Blob Storage, Azure Data Lake Storage Gen2, or Microsoft OneLake. You can also upload files directly.
```
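When the index is queried at chat time, the search connection is commonly passed alongside the messages. The shape below follows the Azure OpenAI "On Your Data" `data_sources` convention as a sketch; verify the exact field names against the current API reference before relying on them, and note that the endpoint, index name, and key are placeholders.

```python
# Hypothetical request body attaching an Azure AI Search index for grounding.
# Field names follow the "On Your Data" data_sources shape; verify against
# the current API reference. Endpoint, index, and key are placeholders.
request_body = {
    "messages": [
        {"role": "user", "content": "Which hotels do you offer in Paris?"}
    ],
    "data_sources": [
        {
            "type": "azure_search",
            "parameters": {
                "endpoint": "https://<your-search>.search.windows.net",
                "index_name": "<your-index>",
                "authentication": {"type": "api_key", "key": "<your-key>"},
            },
        }
    ],
}

# Retrieval then happens service-side against the index before generation.
assert request_body["data_sources"][0]["type"] == "azure_search"
```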
```diff
@@ -101,7 +107,7 @@ RAG is most effective when:
 - **Factual accuracy is critical**: You need responses grounded in real data rather than the model's general knowledge.
 - **The base model's training data has a cutoff**: Events or information that occurred after the model's training cutoff date need to be accessible.
 
-For the travel agency scenario, RAG allows customers to ask questions about specific hotels, destinations, and booking policies—all grounded in the agency's actual catalog data.
+For the travel agency scenario, RAG allows customers to ask questions about specific hotels, destinations, and booking policies, all grounded in the agency's actual catalog data.
 
 > [!TIP]
-> If you're building agents that need grounded knowledge without managing your own search infrastructure, consider **Foundry IQ** — a managed knowledge store that simplifies grounding for AI agents. To learn more, see [YOURPLACEHOLDER]().
+> If you're building agents that need grounded knowledge without managing your own search infrastructure, consider **Foundry IQ** — a managed knowledge store that simplifies grounding for AI agents. To learn more, see [Build knowledge-enhanced AI agents with Foundry IQ](/training/modules/introduction-foundry-iq/?azure-portal=true).
```
`learn-pr/wwl-data-ai/optimize-generative-ai-model-performance/includes/5-compare-combine-strategies.md` (1 addition, 1 deletion)
```diff
@@ -1,4 +1,4 @@
-Now that you've explored prompt engineering, RAG, and fine-tuning individually, let's look at how they relate to each other. These strategies aren't mutually exclusive—they're complementary methods that you can combine to meet different optimization goals.
+Now that you've explored prompt engineering, RAG, and fine-tuning individually, let's look at how they relate to each other. These strategies aren't mutually exclusive; they're complementary methods that you can combine to meet different optimization goals.
```