
Commit 971e035

update

1 parent 27d6609 commit 971e035

3 files changed

learn-pr/wwl-data-ai/optimize-finetune-agents/includes/4-prepare-data.md

Lines changed: 27 additions & 6 deletions
@@ -1,4 +1,4 @@
-You've identified which fine-tuning method addresses your agent quality challenge. Now prepare training data that actually works. Quality training data determines whether fine-tuning succeeds or wastes resources. Data preparation has two critical validation stages: ensuring your data follows the correct format for your chosen method, and verifying the content quality meets standards for effective training.
+You've identified which fine-tuning method addresses your agent quality challenge. Now prepare training data that actually works. Quality training data determines whether fine-tuning succeeds or wastes resources. Data preparation involves three key stages: validating that your data follows the correct format for your chosen method, verifying that the content quality meets training standards, and creating your dataset using the right tools and workflow.

## Validate data format

@@ -34,7 +34,7 @@ For multi-turn conversations where you want to optimize only specific responses,

DPO requires preference pairs in JSONL format with three top-level fields: `input` (containing the system message and initial user message), `preferred_output` (the better response), and `non_preferred_output` (the worse response).

-:::image type="content" source="../media/preference-format.png" alt-text="Diagram showing direct preference optimization data format with input messages, preferred output, and non-preferred output fields." lightbox="../media/preference-format.png":::
+:::image type="content" source="../media/preference-format.png" alt-text="Diagram showing direct preference optimization data format with input messages, preferred output, and nonpreferred output fields." lightbox="../media/preference-format.png":::

Each training example separates the prompt (`input`) from two alternative responses (`preferred_output` and `non_preferred_output`):
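The sample record itself is elided from this diff. As a rough illustration only, a preference pair with the shape described above could be built and serialized like this (the Adventure Works scenario text is invented for the sketch):

```python
import json

# One DPO preference pair: the prompt plus a better and a worse response.
# The scenario content is illustrative, not taken from the training file.
example = {
    "input": {
        "messages": [
            {"role": "system", "content": "You are an outdoor gear assistant."},
            {"role": "user", "content": "What should I pack for a winter day hike?"},
        ]
    },
    "preferred_output": [
        {"role": "assistant", "content": "Pack insulated layers, a waterproof shell, traction devices, and extra food and water. Check the avalanche forecast before you go."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "A light jacket and sneakers should be fine."}
    ],
}

# JSONL: one JSON object per line.
print(json.dumps(example))
```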

@@ -53,8 +53,8 @@ Each training example separates the prompt (`input`) from two alternative respon
- Every example has `input`, `preferred_output`, and `non_preferred_output` fields
- `input` contains a `messages` array with system message (optional) and user message
- `preferred_output` contains at least one assistant message demonstrating preferred behavior
-- `non_preferred_output` contains at least one assistant message demonstrating non-preferred behavior
-- Difference between preferred and non-preferred illustrates the specific quality you're optimizing (tone, safety balance, style)
+- `non_preferred_output` contains at least one assistant message demonstrating nonpreferred behavior
+- Difference between preferred and nonpreferred illustrates the specific quality you're optimizing (tone, safety balance, style)

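As a quick sanity check against this list, a small script can flag records that break the structural requirements; a minimal sketch, assuming a training file named `dpo-train.jsonl` (hypothetical name):

```python
import json

# Check each DPO record for the required top-level fields and
# non-empty message arrays described in the checklist above.
required = ("input", "preferred_output", "non_preferred_output")

with open("dpo-train.jsonl", encoding="utf-8") as f:  # hypothetical file name
    for line_no, line in enumerate(f, start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        missing = [field for field in required if field not in record]
        if missing:
            print(f"line {line_no}: missing {missing}")
            continue
        if not record["input"].get("messages"):
            print(f"line {line_no}: input has no messages")
        if not record["preferred_output"] or not record["non_preferred_output"]:
            print(f"line {line_no}: empty preferred or non_preferred output")
```
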
# [Reinforcement Fine-Tuning (RFT)](#tab/rft)

@@ -118,11 +118,32 @@ High-quality training data shares predictable characteristics regardless of fine

**Accuracy**: Examples contain factually correct information and appropriate recommendations. One inaccurate training example (like recommending summer gear for winter conditions) can corrupt model behavior across related scenarios. Verify domain correctness before including examples.

-**Diversity**: Training data covers the full range of query variations, edge cases, and response scenarios your model will encounter. Adventure Works ensures their safety dataset includes varied experience levels, different seasons, multiple activity types, and diverse geographic contexts (not just summer hiking in one region).
+**Diversity**: Training data covers the full range of query variations, edge cases, and response scenarios your model can encounter. Adventure Works ensures their safety dataset includes varied experience levels, different seasons, multiple activity types, and diverse geographic contexts (not just summer hiking in one region).

-**Clarity**: Each example unambiguously demonstrates one desired behavior. Avoid examples where the "correct" response requires subjective interpretation or where multiple valid approaches exist unless you're using preference pairs (DPO) that explicitly show which approach you prefer.
+**Clarity**: Each example unambiguously demonstrates one desired behavior. Avoid examples where the "correct" response requires subjective interpretation or where multiple valid approaches exist, unless you're using preference pairs (DPO) that explicitly show which approach you prefer.

**Representativeness**: Training data distribution matches real-world usage patterns. If 40% of Adventure Works queries ask about waterproof ratings, but waterproof examples represent only 5% of training data, the model underperforms on a frequent use case.
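To put a number on that comparison, you can tally topic frequency in logged queries against the training set; a rough sketch, assuming each record carries an illustrative `topic` label (not part of the required training format) and hypothetical file names:

```python
import json
from collections import Counter

def topic_distribution(path: str) -> Counter:
    """Count illustrative 'topic' labels across a JSONL file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                counts[json.loads(line).get("topic", "unlabeled")] += 1
    return counts

# Compare topic shares between real query logs and the training set.
for name in ("query-log.jsonl", "train.jsonl"):  # hypothetical file names
    counts = topic_distribution(name)
    total = sum(counts.values())
    for topic, n in counts.most_common():
        print(f"{name}: {topic} = {n / total:.0%}")
```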

> [!TIP]
> **Quality over quantity**: 100 high-quality diverse examples outperform 500 mediocre examples. Start with your best 50-100 examples, fine-tune, evaluate, then decide whether to expand volume or improve existing examples based on failure analysis.
+
+## Create your dataset
+
+Follow this systematic workflow to create training data that meets format requirements and quality standards.
+
+:::image type="content" source="../media/workflow-data.png" alt-text="Diagram showing the five-step workflow for creating training datasets: choose strategy, prepare materials, generate examples, validate, and monitor performance." lightbox="../media/workflow-data.png":::
+
+1. **Choose your data acquisition strategy**: Real data works when you have documented interactions. Synthetic generation works when examples are scarce or contain sensitive information. A hybrid approach combines both; for example, use real data for common scenarios and synthetic data for edge cases.
+
+2. **Prepare your source materials**: For synthetic data, use clean PDFs or Markdown files for Q&A generation, or OpenAPI specs for tool use generation. For real data, gather chat logs, support tickets, or documented interactions. Remove unnecessary formatting and marketing content.
+
+3. **Generate or curate training examples**: Use Foundry's synthetic data generators (Simple Q&A or Tool use) to create JSONL examples, or manually structure real interactions into JSONL format (see the sketch after this list).
+
+4. **Validate format and quality**: Verify format compliance and quality principles. Review synthetic examples for incorrect information. Start with 50-100 examples, validate thoroughly, then scale.
+
+5. **Monitor and audit model performance**: Evaluate model benchmarks and metrics on validation data after fine-tuning. If performance falls short, analyze failures to determine whether you need more, better, or different examples.
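For the manual path in step 3, a minimal sketch of turning a logged support exchange into a chat-format JSONL line (the interaction content and file name are invented; the messages structure follows the standard chat format):

```python
import json

# Hypothetical logged interactions; in practice these come from
# chat logs or support tickets gathered in step 2.
logged = [
    {
        "question": "Is the TrailGuard jacket waterproof?",
        "answer": "Yes, it has a 20K waterproof rating and taped seams.",
    },
]

# Convert each interaction into one chat-format JSONL line.
with open("train.jsonl", "w", encoding="utf-8") as f:  # hypothetical output file
    for item in logged:
        record = {
            "messages": [
                {"role": "system", "content": "You are an outdoor gear assistant."},
                {"role": "user", "content": item["question"]},
                {"role": "assistant", "content": item["answer"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```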
+
+> [!TIP]
+> Learn more about how to [generate synthetic data for fine-tuning with Microsoft Foundry](/azure/ai-foundry/fine-tuning/data-generation?view=foundry#generate-synthetic-data-for-fine-tuning).
+
+With properly formatted, high-quality training data created, you're ready to learn how to optimize fine-tuning a model.
Binary files changed: -23.3 KB and 154 KB (image previews not shown).
