-description: Introduction to fine-tuning a foundation model from the model catalog in Microsoft Foundry.
+description: Introduction to advanced fine-tuning decision-making for AI agents, covering method selection, synthetic data strategies, performance assessment, and model lifecycle management.
-title: Prepare your data to fine-tune a chat completion model
+title: Prepare training data for fine-tuning
 metadata:
-  title: Prepare your data to fine-tune a chat completion model
-  description: Learn how to prepare training data from real conversations or generate synthetic data using Microsoft Foundry's data generation tools for fine-tuning chat completion models.
+  title: Prepare training data for fine-tuning
+  description: Learn how to validate data formats for SFT, DPO, and RFT fine-tuning methods, apply data quality principles, and follow a systematic workflow to create or generate high-quality training datasets.
-title: Explore fine-tuning language models in Microsoft Foundry portal
+title: Design your optimization strategy and configure training hyperparameters
 metadata:
-  title: Explore fine-tuning foundation models in Microsoft Foundry portal
-  description: Explore fine-tuning foundation models from the model catalog in Microsoft Foundry portal.
+  title: Design your optimization strategy and configure training hyperparameters
+  description: Learn how to design an optimization strategy by evaluating baseline performance, setting measurable targets, splitting your dataset, and configuring hyperparameters for SFT, DPO, and RFT training jobs.
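The dataset split mentioned in this unit description can be sketched as a small, hypothetical Python helper. This is illustrative only — it is not part of Microsoft Foundry's tooling, and the function name and defaults are assumptions:

```python
import random

def split_dataset(records, validation_fraction=0.2, seed=42):
    """Shuffle reproducibly, then split records into train/validation lists.

    Illustrative sketch only; not a Microsoft Foundry API.
    """
    shuffled = records[:]                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

train, val = split_dataset(list(range(10)))
print(len(train), len(val))  # 8 2
```

A fixed seed makes the split repeatable, so baseline and fine-tuned runs are compared against the same held-out validation examples.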
-description: Knowledge check about fine-tuning a language model.
+description: Knowledge check covering fine-tuning method selection for agent quality problems, DPO preference pair data format requirements, and optimization strategy design.
 author: madiepev
 ms.author: madiepev
 ms.date: 02/25/2026
@@ -15,38 +15,38 @@ durationInMinutes: 3
 content: |
 quiz:
   questions:
-  - content: "How must data be formatted for fine-tuning?"
+  - content: "An AI agent generates trip plans that fail to consider how customer constraints like fitness level, experience, budget, and weather conditions interact with each other. Which fine-tuning method best addresses this problem?"
     choices:
-    - content: "JSONL"
-      isCorrect: true
-      explanation: "Correct. JSONL or JSON Lines is the expected input format."
-    - content: "YAML"
+    - content: "Supervised Fine-Tuning (SFT)"
       isCorrect: false
-      explanation: "Incorrect. In the context of the Microsoft Foundry, YAML is used to specify the configuration of jobs or flows."
-    - content: "HTML"
+      explanation: "Incorrect. SFT excels at format consistency and style adherence, but doesn't develop the multi-step reasoning needed to balance interacting constraints."
+    - content: "Reinforcement Fine-Tuning (RFT)"
+      isCorrect: true
+      explanation: "Correct. RFT develops complex reasoning capabilities by using a grader function to reward outputs that appropriately balance interacting constraints, making it the right choice for multi-step planning problems."
+    - content: "Direct Preference Optimization (DPO)"
       isCorrect: false
-      explanation: "Incorrect. HTML is a markup language for web pages."
-  - content: "What does fine-tuning optimize in your model?"
+      explanation: "Incorrect. DPO specializes in subjective preferences and tone alignment, not in developing reasoning logic for multi-step constraint balancing."
+  - content: "Which fine-tuning method requires training data structured as preference pairs, each containing a prompt alongside both a preferred and a non-preferred response?"
     choices:
-    - content: "What the model needs to know."
+    - content: "Supervised Fine-Tuning (SFT)"
       isCorrect: false
-      explanation: "Incorrect. To optimize what the model needs to know, RAG is more effective."
-    - content: "How the model needs to act."
-      isCorrect: true
-      explanation: "Correct. Fine-tuning can help to maximize the consistency of the model's behavior."
-    - content: "Which words aren't allowed."
+      explanation: "Incorrect. SFT requires labeled examples, where each example pairs a prompt with the single ideal response you want the model to produce."
+    - content: "Reinforcement Fine-Tuning (RFT)"
       isCorrect: false
-      explanation: "Incorrect. To filter specific content like words, you can use a content filter."
-  - content: "Which advanced option refers to one full cycle through the training dataset?"
+      explanation: "Incorrect. RFT requires prompts combined with a separate grader function that scores the model's generated responses during training."
+    - content: "Direct Preference Optimization (DPO)"
+      isCorrect: true
+      explanation: "Correct. DPO uses preference pairs to teach the model which response better reflects your values. Each example includes a prompt, a preferred response, and a non-preferred response."
+  - content: "What is the purpose of evaluating the base model before submitting a fine-tuning job?"
     choices:
-    - content: "seed"
+    - content: "To establish a baseline so you can measure whether fine-tuning improved performance."
+      isCorrect: true
+      explanation: "Correct. Running your evaluation dataset through the unmodified base model and recording metric scores gives you a reference point for comparing fine-tuned results."
+    - content: "To automatically generate labeled training examples from the base model's outputs."
       isCorrect: false
-      explanation: "Incorrect. The seed controls the reproducibility of the job."
-    - content: "batch_size"
+      explanation: "Incorrect. Evaluating the base model measures its current performance—it doesn't produce training data. Training data must be prepared separately before fine-tuning begins."
+    - content: "To determine the correct number of epochs to use during training."
       isCorrect: false
-      explanation: "Incorrect. The batch size specifies the number of training examples used."
-    - content: "n_epochs"
-      isCorrect: true
-      explanation: "Correct. When you set the number of epochs, you set the number of full cycles to run through the dataset."
+      explanation: "Incorrect. Baseline evaluation measures current model quality against your metrics. Epoch count is a hyperparameter you set based on training results, adjusted one at a time after evaluating each run."
-description: Summary of fine-tuning language models.
+description: Summary of advanced fine-tuning for AI agents, covering method selection, training data preparation, optimization strategy design, and hyperparameter configuration.