
Commit 27d6609

add data quality image and lightbox
1 parent c54ec3d

2 files changed: 5 additions & 3 deletions

File tree

learn-pr/wwl-data-ai/optimize-finetune-agents/includes/4-prepare-data.md
learn-pr/wwl-data-ai/optimize-finetune-agents/media/data-quality.png
@@ -8,7 +8,7 @@ Different fine-tuning methods require different data structures. Before training
 
 SFT requires conversations in JSONL (JSON Lines) format with **system** instructions, **user** prompts, and ideal **assistant** responses.
 
-:::image type="content" source="../media/supervised-format.png" alt-text="Diagram showing supervised fine-tuning data format with messages array containing system, user, and assistant roles.":::
+:::image type="content" source="../media/supervised-format.png" alt-text="Diagram showing supervised fine-tuning data format with messages array containing system, user, and assistant roles." lightbox="../media/supervised-format.png":::
 
 Each training example contains a `messages` array where roles alternate between user and assistant:

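The JSON example that follows in the source file is elided from this diff. For reference, one SFT record in this layout might look like the following sketch: a single JSON object per line, with the Adventure Works content invented for illustration rather than taken from the actual training file:

```json
{"messages": [{"role": "system", "content": "You are a product assistant for Adventure Works, an outdoor gear retailer."}, {"role": "user", "content": "Which tent do you recommend for winter camping?"}, {"role": "assistant", "content": "For winter camping, choose a four-season tent rated for snow load and high winds."}]}
```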
@@ -34,7 +34,7 @@ For multi-turn conversations where you want to optimize only specific responses,
 
 DPO requires preference pairs in JSONL format with three top-level fields: `input` (containing the system message and initial user message), `preferred_output` (the better response), and `non_preferred_output` (the worse response).
 
-:::image type="content" source="../media/preference-format.png" alt-text="Diagram showing direct preference optimization data format with input messages, preferred output, and non-preferred output fields.":::
+:::image type="content" source="../media/preference-format.png" alt-text="Diagram showing direct preference optimization data format with input messages, preferred output, and non-preferred output fields." lightbox="../media/preference-format.png":::
 
 Each training example separates the prompt (`input`) from two alternative responses (`preferred_output` and `non_preferred_output`):

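A sketch of one DPO record with those three top-level fields, again with the question and answers invented for illustration:

```json
{"input": {"messages": [{"role": "system", "content": "You are a product assistant for Adventure Works."}, {"role": "user", "content": "How do I waterproof leather hiking boots?"}]}, "preferred_output": [{"role": "assistant", "content": "Clean the boots, let them dry fully, then apply a wax-based waterproofing treatment and buff it in evenly."}], "non_preferred_output": [{"role": "assistant", "content": "Just spray them with something water resistant."}]}
```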
@@ -60,7 +60,7 @@ Each training example separates the prompt (`input`) from two alternative respon
 
 RFT requires prompts in JSONL format plus a separate grader function that scores model responses. The model learns through reinforcement by generating responses and receiving reward scores from the grader. Training data can include optional fields (like `ideal_itinerary`) that graders access for scoring.
 
-:::image type="content" source="../media/reinforcement-format.png" alt-text="Diagram showing reinforcement fine-tuning data format with messages array and optional fields for grader access.":::
+:::image type="content" source="../media/reinforcement-format.png" alt-text="Diagram showing reinforcement fine-tuning data format with messages array and optional fields for grader access." lightbox="../media/reinforcement-format.png":::
 
 Each training example includes a `messages` array ending with a user message, plus optional fields that the grader uses for scoring:

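A sketch of one RFT record using the optional `ideal_itinerary` field named above; the itinerary content is invented:

```json
{"messages": [{"role": "user", "content": "Plan a two-day beginner hiking itinerary near Mount Rainier."}], "ideal_itinerary": "Day 1: Skyline Trail loop, 5.5 miles. Day 2: Silver Falls loop, 3 miles."}
```

The grader is defined separately from the training data. Assuming the unit uses the standard text-similarity grader schema (an assumption; the actual grader is not shown in this diff), its configuration might resemble:

```json
{"type": "text_similarity", "name": "itinerary_match", "input": "{{sample.output_text}}", "reference": "{{item.ideal_itinerary}}", "evaluation_metric": "fuzzy_match"}
```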
@@ -110,6 +110,8 @@ This text similarity grader compares the model's generated itinerary against the
 
 Correctly formatted data can still fail if the content lacks quality. After confirming format compliance, validate that your queries, responses, and ground truths meet these quality standards.
 
+:::image type="content" source="../media/data-quality.png" alt-text="Diagram showing the five quality principles for training data: consistency, accuracy, diversity, clarity, and representativeness.":::
+
 High-quality training data shares predictable characteristics regardless of fine-tuning method. Evaluate your dataset against these five principles:
 
 **Consistency**: Every example demonstrates the exact behavior you want to reinforce. Adventure Works reviews each gear specification response to ensure it follows their standard format: technical specs, pricing, availability, and complementary suggestions appear in consistent order. Mixed formats in training data produce mixed formats in outputs.
learn-pr/wwl-data-ai/optimize-finetune-agents/media/data-quality.png
132 KB (new binary file)