You've identified which fine-tuning method addresses your agent quality challenge. Now prepare training data that actually works. Quality training data determines whether fine-tuning succeeds or wastes resources. Data preparation involves three key stages: validating that your data follows the correct format for your chosen method, verifying that the content quality meets training standards, and creating your dataset using the right tools and workflow.
## Validate data format
DPO requires preference pairs in JSONL format with three top-level fields: `input` (containing the system message and initial user message), `preferred_output` (the better response), and `non_preferred_output` (the worse response).
:::image type="content" source="../media/preference-format.png" alt-text="Diagram showing direct preference optimization data format with input messages, preferred output, and nonpreferred output fields." lightbox="../media/preference-format.png":::
Each training example separates the prompt (`input`) from two alternative responses (`preferred_output` and `non_preferred_output`):
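The following minimal example illustrates the shape of one preference pair; the scenario and wording are hypothetical, not taken from the module:

```json
{"input": {"messages": [{"role": "system", "content": "You are a safety-conscious gear advisor for Adventure Works."}, {"role": "user", "content": "Can I wear trail runners for a winter summit hike?"}]}, "preferred_output": [{"role": "assistant", "content": "Trail runners aren't safe for a winter summit. Choose insulated, waterproof boots rated for snow, and carry traction devices such as microspikes."}], "non_preferred_output": [{"role": "assistant", "content": "Sure, trail runners are fine if you walk carefully."}]}
```

When you validate DPO data, confirm that: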
- Every example has `input`, `preferred_output`, and `non_preferred_output` fields
- `input` contains a `messages` array with system message (optional) and user message
- `preferred_output` contains at least one assistant message demonstrating preferred behavior
- `non_preferred_output` contains at least one assistant message demonstrating nonpreferred behavior
- Difference between preferred and nonpreferred illustrates the specific quality you're optimizing (tone, safety balance, style)
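A small script can automate this checklist. The sketch below assumes the message-array shapes shown above; the file name is hypothetical:

```python
import json

REQUIRED = {"input", "preferred_output", "non_preferred_output"}

def validate_dpo_line(line_no: int, line: str) -> list[str]:
    """Return any problems found in one JSONL line of DPO training data."""
    try:
        example = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"line {line_no}: invalid JSON ({err})"]
    if not isinstance(example, dict):
        return [f"line {line_no}: expected a JSON object"]

    missing = sorted(REQUIRED - example.keys())
    if missing:
        return [f"line {line_no}: missing fields {missing}"]

    problems = []
    # input must carry a non-empty messages array
    if not isinstance(example["input"], dict) or not example["input"].get("messages"):
        problems.append(f"line {line_no}: 'input' has no messages array")
    # both outputs must contain at least one assistant message
    for field in ("preferred_output", "non_preferred_output"):
        messages = example[field] if isinstance(example[field], list) else []
        if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
            problems.append(f"line {line_no}: '{field}' has no assistant message")
    return problems

with open("training-data.jsonl", encoding="utf-8") as f:
    for number, raw in enumerate(f, start=1):
        if raw.strip():
            for problem in validate_dpo_line(number, raw):
                print(problem)
```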
# [Reinforcement Fine-Tuning (RFT)](#tab/rft)
## Verify content quality

High-quality training data shares predictable characteristics regardless of fine-tuning method:
**Accuracy**: Examples contain factually correct information and appropriate recommendations. One inaccurate training example (like recommending summer gear for winter conditions) can corrupt model behavior across related scenarios. Verify domain correctness before including examples.
**Diversity**: Training data covers the full range of query variations, edge cases, and response scenarios your model can encounter. Adventure Works ensures their safety dataset includes varied experience levels, different seasons, multiple activity types, and diverse geographic contexts (not just summer hiking in one region).
**Clarity**: Each example unambiguously demonstrates one desired behavior. Avoid examples where the "correct" response requires subjective interpretation or where multiple valid approaches exist, unless you're using preference pairs (DPO) that explicitly show which approach you prefer.
**Representativeness**: Training data distribution matches real-world usage patterns. If 40% of Adventure Works queries ask about waterproof ratings, but waterproof examples represent only 5% of training data, the model underperforms on a frequent use case.
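One lightweight way to check representativeness is to compare topic distributions between a sample of real queries and your training set. A minimal sketch with hypothetical topic labels:

```python
from collections import Counter

# Hypothetical topic tags; in practice, label a sample of production
# queries and your training examples with the same taxonomy.
production_queries = ["waterproof", "waterproof", "sizing", "waterproof", "returns"]
training_examples = ["sizing", "returns", "waterproof", "sizing"]

def shares(labels):
    """Map each topic to its fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

prod, train = shares(production_queries), shares(training_examples)
for topic in sorted(set(prod) | set(train)):
    print(f"{topic}: {prod.get(topic, 0):.0%} of real queries vs "
          f"{train.get(topic, 0):.0%} of training data")
```

A large gap on a frequent topic (like the waterproof example above) signals where to add training examples.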
> [!TIP]
> **Quality over quantity**: 100 high-quality diverse examples outperform 500 mediocre examples. Start with your best 50-100 examples, fine-tune, evaluate, then decide whether to expand volume or improve existing examples based on failure analysis.
## Create your dataset
Follow this systematic workflow to create training data that meets format requirements and quality standards.
:::image type="content" source="../media/workflow-data.png" alt-text="Diagram showing the five-step workflow for creating training datasets: choose strategy, prepare materials, generate examples, validate, and monitor performance." lightbox="../media/workflow-data.png":::
1. **Choose your data acquisition strategy**: Real data works when you have documented interactions. Synthetic generation works when examples are scarce or contain sensitive information. A hybrid approach combines both; for example, use real data for common scenarios and synthetic data for edge cases.
2. **Prepare your source materials**: For synthetic data, use clean PDFs or markdown for Q&A generation, or OpenAPI specs for tool-use generation. For real data, gather chat logs, support tickets, or documented interactions. Remove unnecessary formatting and marketing content.
3. **Generate or curate training examples**: Use Foundry's synthetic data generators (Simple Q&A or Tool use) to create JSONL examples, or manually structure real interactions into JSONL format.
4. **Validate format and quality**: Verify format compliance and the quality principles described earlier. Review synthetic examples for incorrect information. Start with 50-100 examples, validate thoroughly, then scale.
5. **Monitor and audit model performance**: Evaluate model benchmarks and metrics on validation data after fine-tuning (a minimal train/validation split sketch follows this list). If performance falls short, analyze failures to determine whether you need more, better, or different examples.
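To support step 5, hold out a validation set before you fine-tune. A minimal 80/20 split sketch; the file names are hypothetical:

```python
import json
import random

random.seed(42)  # reproducible shuffle

# Load every non-empty JSONL example from the prepared dataset.
with open("examples.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f if line.strip()]

random.shuffle(examples)
cut = int(len(examples) * 0.8)  # 80% train, 20% validation

for name, subset in [("train.jsonl", examples[:cut]),
                     ("validation.jsonl", examples[cut:])]:
    with open(name, "w", encoding="utf-8") as out:
        for example in subset:
            out.write(json.dumps(example) + "\n")
```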
> [!TIP]
> Learn more about how to [generate synthetic data for fine-tuning with Microsoft Foundry](/azure/ai-foundry/fine-tuning/data-generation?view=foundry#generate-synthetic-data-for-fine-tuning).
With properly formatted, high-quality training data in place, you're ready to learn how to optimize the fine-tuning process.