`learn-pr/wwl-data-ai/optimize-finetune-agents/includes/4-prepare-data.md` (5 additions, 3 deletions)
```diff
@@ -8,7 +8,7 @@ Different fine-tuning methods require different data structures. Before training
 
 SFT requires conversations in JSONL (JSON Lines) format with **system** instructions, **user** prompts, and ideal **assistant** responses.
 
-:::image type="content" source="../media/supervised-format.png" alt-text="Diagram showing supervised fine-tuning data format with messages array containing system, user, and assistant roles.":::
+:::image type="content" source="../media/supervised-format.png" alt-text="Diagram showing supervised fine-tuning data format with messages array containing system, user, and assistant roles." lightbox="../media/supervised-format.png":::
 
 Each training example contains a `messages` array where roles alternate between user and assistant:
 
```
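The record that follows this lead-in sits outside the hunk's context window, so it isn't shown in the diff. As an illustrative sketch of the structure the paragraph describes (invented Adventure Works-style content, not the file's actual example; pretty-printed for readability even though JSONL puts each record on a single line):

```json
{"messages": [
    {"role": "system", "content": "You are a product specialist for Adventure Works outdoor gear."},
    {"role": "user", "content": "Which tent should I buy for winter camping?"},
    {"role": "assistant", "content": "For winter camping, choose a four-season tent such as the Summit 2: 2-person capacity, 3.4 kg, snow-load-rated poles, in stock from $349. It pairs well with a cold-rated sleeping pad."}
]}
```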
```diff
@@ -34,7 +34,7 @@ For multi-turn conversations where you want to optimize only specific responses,
 
 DPO requires preference pairs in JSONL format with three top-level fields: `input` (containing the system message and initial user message), `preferred_output` (the better response), and `non_preferred_output` (the worse response).
 
-:::image type="content" source="../media/preference-format.png" alt-text="Diagram showing direct preference optimization data format with input messages, preferred output, and non-preferred output fields.":::
+:::image type="content" source="../media/preference-format.png" alt-text="Diagram showing direct preference optimization data format with input messages, preferred output, and non-preferred output fields." lightbox="../media/preference-format.png":::
 
 Each training example separates the prompt (`input`) from two alternative responses (`preferred_output` and `non_preferred_output`):
 
```
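Again, the file's own example falls between hunks. A hedged sketch of a preference pair using the three top-level fields named above (message content is invented; `preferred_output` and `non_preferred_output` are shown as assistant-message arrays, matching the common DPO JSONL layout):

```json
{
  "input": {
    "messages": [
      {"role": "system", "content": "You are a product specialist for Adventure Works outdoor gear."},
      {"role": "user", "content": "Which tent should I buy for winter camping?"}
    ]
  },
  "preferred_output": [
    {"role": "assistant", "content": "For winter camping you want a four-season tent. The Summit 2 handles snow load and high wind, and at 3.4 kg it is still packable."}
  ],
  "non_preferred_output": [
    {"role": "assistant", "content": "Any tent works if you bring a warm sleeping bag."}
  ]
}
```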
```diff
@@ -60,7 +60,7 @@ Each training example separates the prompt (`input`) from two alternative respon
 
 RFT requires prompts in JSONL format plus a separate grader function that scores model responses. The model learns through reinforcement by generating responses and receiving reward scores from the grader. Training data can include optional fields (like `ideal_itinerary`) that graders access for scoring.
 
-:::image type="content" source="../media/reinforcement-format.png" alt-text="Diagram showing reinforcement fine-tuning data format with messages array and optional fields for grader access.":::
+:::image type="content" source="../media/reinforcement-format.png" alt-text="Diagram showing reinforcement fine-tuning data format with messages array and optional fields for grader access." lightbox="../media/reinforcement-format.png":::
 
 Each training example includes a `messages` array ending with a user message, plus optional fields that the grader uses for scoring:
 
```
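A sketch of an RFT prompt record using the optional `ideal_itinerary` field mentioned above (the itinerary text is invented): the `messages` array ends with the user turn, and the grader, not the dataset, supplies the training signal:

```json
{
  "messages": [
    {"role": "user", "content": "Plan a two-day beginner hiking itinerary near Mount Rainier."}
  ],
  "ideal_itinerary": "Day 1: Skyline Trail loop with lunch at Panorama Point. Day 2: Silver Falls loop, returning by early afternoon."
}
```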
```diff
@@ -110,6 +110,8 @@ This text similarity grader compares the model's generated itinerary against the
 
 Correctly formatted data can still fail if the content lacks quality. After confirming format compliance, validate that your queries, responses, and ground truths meet these quality standards.
 
+:::image type="content" source="../media/data-quality.png" alt-text="Diagram showing the five quality principles for training data: consistency, accuracy, diversity, clarity, and representativeness.":::
+
 High-quality training data shares predictable characteristics regardless of fine-tuning method. Evaluate your dataset against these five principles:
 
 **Consistency**: Every example demonstrates the exact behavior you want to reinforce. Adventure Works reviews each gear specification response to ensure it follows their standard format: technical specs, pricing, availability, and complementary suggestions appear in consistent order. Mixed formats in training data produce mixed formats in outputs.
```
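The header of this last hunk references the unit's text similarity grader, whose definition also falls outside the diff. As a hedged sketch only (assuming the `text_similarity` grader schema used by Azure OpenAI reinforcement fine-tuning; field names and template syntax can vary by API version), such a grader could compare the sampled response against `ideal_itinerary`:

```json
{
  "name": "itinerary_similarity",
  "type": "text_similarity",
  "input": "{{ sample.output_text }}",
  "reference": "{{ item.ideal_itinerary }}",
  "evaluation_metric": "fuzzy_match"
}
```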