title: Understand when to fine-tune a language model
metadata:
  title: Understand when to fine-tune a language model
  description: Learn how to optimize a language model and explore fine-tuning techniques including supervised fine-tuning, reinforcement fine-tuning, and direct preference optimization.

title: Prepare your data to fine-tune a chat completion model
metadata:
  title: Prepare your data to fine-tune a chat completion model
  description: Learn how to prepare training data from real conversations or generate synthetic data using Microsoft Foundry's data generation tools for fine-tuning chat completion models.
learn-pr/wwl-data-ai/finetune-model-copilot-ai-studio/includes/2-understand-finetune.md
Fine-tuning a language model gives you greater control over how your model behaves, helping you achieve consistent responses in a specific style, format, and tone. Here you learn when to use fine-tuning and explore five key fine-tuning techniques.

Starting with the most basic approach:

- Supervised fine-tuning for training with labeled examples

And four more advanced techniques:

- Function calling fine-tuning for structured output and API integration
- Vision fine-tuning for image understanding tasks
- Reinforcement fine-tuning for reward-based learning
- Direct preference optimization for alignment using preference pairs

Let's start by comparing fine-tuning to other optimization techniques for models and agents.

## Understand when to use fine-tuning
When you want to develop a chat application with Microsoft Foundry, you can use prompt flow to create a chat application that is integrated with a language model to generate responses. To improve the quality of the responses the model generates, you can try various strategies. The easiest strategy is to apply **prompt engineering**. You can change the way you format your question, but you can also update the **system message** that is sent along with the prompt to the language model.
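
As an illustration of how the system message shapes behavior without any retraining, here is a minimal sketch of a chat-completions-style request payload. The deployment name and message contents are hypothetical, not part of any specific API:

```python
import json

# A chat-completions-style request: the system message steers tone and scope,
# while the user message carries the actual question. The deployment name
# "my-gpt-4o" is a placeholder.
request = {
    "model": "my-gpt-4o",  # hypothetical deployment name
    "messages": [
        {"role": "system", "content": "You are a concise assistant that answers in formal British English."},
        {"role": "user", "content": "How do I reset my password?"},
    ],
    "temperature": 0.2,  # lower temperature -> more deterministic responses
}

# Updating the system message changes behavior without touching model weights.
request["messages"][0]["content"] = "You are a friendly assistant that answers with a short bulleted list."

print(json.dumps(request["messages"][0], indent=2))
```

Prompt engineering of this kind is cheap to iterate on, which is why it's worth exhausting before reaching for fine-tuning.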

To maximize the **consistency of the model's behavior**, you can **fine-tune a base model** with your own training data.

## Explore fine-tuning techniques

Microsoft Foundry supports multiple fine-tuning techniques, each designed for different use cases and model capabilities:

### Apply supervised fine-tuning
**Supervised fine-tuning** is the most basic and common approach, in which you train a base model on labeled example data. You provide the model with example conversations that demonstrate the desired behavior, including system messages, user prompts, and assistant responses. This technique is ideal for teaching the model specific formats, styles, tones, or domain-specific behaviors.

Supervised fine-tuning is supported for models like GPT-4, GPT-4o, GPT-3.5-Turbo, and many other foundation models in the model catalog. This is the recommended starting point for most fine-tuning scenarios.
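
As a sketch of what labeled example data can look like, the snippet below serializes two example conversations into JSON Lines (JSONL), one training example per line. The conversations themselves are invented, and the `{"messages": [...]}` shape is the chat fine-tuning format described in the next unit:

```python
import json

# Two labeled examples that demonstrate the desired support-agent tone.
# Format assumption: one {"messages": [...]} object per JSONL line.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a courteous support agent."},
        {"role": "user", "content": "My order is late."},
        {"role": "assistant", "content": "I'm sorry to hear that. Could you share your order number so I can check its status?"},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a courteous support agent."},
        {"role": "user", "content": "Cancel my subscription."},
        {"role": "assistant", "content": "Certainly, I can help with that. Can you confirm the email address on the account?"},
    ]},
]

# Serialize to JSONL: one training example per line.
jsonl = "\n".join(json.dumps(example) for example in examples)
```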

### Implement reinforcement fine-tuning

**Reinforcement fine-tuning (RFT)** is an advanced technique that improves reasoning models by training them through a reward-based process, rather than relying only on labeled data. Instead of providing example responses, you provide prompts and a grader that scores the quality of the model's outputs. The model learns to generate better responses by maximizing the reward signal.

RFT is especially useful for:

- Complex reasoning and problem-solving tasks
- Scenarios where labeled examples are limited
- Cases where you want the model to develop sophisticated reasoning strategies

RFT is supported for advanced reasoning models like o4-mini and gpt-5. When using RFT, you need to define a grader (such as text comparison, model-based, or custom code graders) that evaluates the model's outputs during training.
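
Conceptually, a grader maps a model output to a score that serves as the reward signal. The toy string-check grader below only illustrates that idea; actual RFT graders are configured as part of the training job (for example, text comparison or model-based graders), not written as free-form scripts:

```python
def string_check_grader(model_output: str, reference_answer: str) -> float:
    """Toy grader: 1.0 if the reference answer appears in the output, else 0.0.

    Real graders can be text-comparison, model-based, or custom code; this
    only illustrates the reward signal that RFT maximizes during training.
    """
    return 1.0 if reference_answer.lower() in model_output.lower() else 0.0

# During RFT, outputs that score higher are reinforced over time.
scores = [
    string_check_grader("The answer is 42 because 6 * 7 = 42.", "42"),
    string_check_grader("I am not sure.", "42"),
]
```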

### Align with direct preference optimization

**Direct preference optimization (DPO)** is an advanced alignment technique that adjusts model weights based on human preferences. Instead of providing single example responses, you provide pairs of responses for each prompt: one preferred and one nonpreferred. The model learns to generate outputs more similar to the preferred examples.

DPO is especially useful when:

- There's no clear-cut correct answer
- Subjective elements like tone, style, or content preferences are important
- You have preference data from user logs, A/B tests, or manual annotations
- You want a computationally lighter alternative to reinforcement learning from human feedback (RLHF)

DPO is supported for models like gpt-4o, gpt-4.1, and gpt-4.1-mini. You can use DPO with base models or with models already fine-tuned using supervised fine-tuning.
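
A DPO training record pairs one prompt with a preferred and a nonpreferred completion. As a hedged sketch, one common JSONL layout looks like the following; the field names here are an assumption, so verify them against the service's documented schema before use:

```python
import json

# A single DPO training record: the same prompt with a preferred and a
# nonpreferred response. Field names follow one common layout
# ("input" / "preferred_output" / "non_preferred_output"); confirm the exact
# schema against the service documentation.
record = {
    "input": {"messages": [
        {"role": "user", "content": "Summarize our refund policy."},
    ]},
    "preferred_output": [
        {"role": "assistant", "content": "Refunds are available within 30 days of purchase with proof of payment."},
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "Refunds? Maybe. It depends. Ask someone else."},
    ],
}

line = json.dumps(record)  # one record per JSONL line
```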

### Fine-tune for function calling

**Function calling fine-tuning** is an advanced technique that trains models to reliably call external functions or APIs with structured arguments. You provide training examples that demonstrate how the model should respond to user requests by calling specific functions with the correct parameters. This technique teaches the model when and how to use tools, improving its ability to generate structured output and integrate with external systems.

Function calling fine-tuning is especially useful for:

- Building agents that need to interact with APIs or databases
- Ensuring consistent JSON schema adherence
- Teaching domain-specific function usage patterns
- Reducing errors in parameter extraction and function selection

Function calling fine-tuning is supported for models like GPT-4o and GPT-4o-mini. You can combine this with supervised fine-tuning to teach both conversational behavior and function calling capabilities.
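
A function calling training example declares the available tools and shows the assistant making the expected call. The sketch below follows the chat-completions tool-calling shape; the `get_order_status` function and its schema are hypothetical:

```python
import json

# One function-calling training example: the "tools" list declares the API
# surface, and the assistant turn demonstrates the expected call with
# correctly extracted parameters. The function and schema are invented.
example = {
    "messages": [
        {"role": "system", "content": "You are an order-tracking assistant."},
        {"role": "user", "content": "Where is order 1137?"},
        {"role": "assistant", "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_order_status",
                "arguments": json.dumps({"order_id": "1137"}),
            },
        }]},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the shipping status of an order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }],
}

# The demonstrated call must match the declared schema.
args = json.loads(example["messages"][2]["tool_calls"][0]["function"]["arguments"])
```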

### Fine-tune for vision tasks

**Vision fine-tuning** is an advanced technique that enhances a model's ability to understand and reason about images. You provide training examples that pair images with text prompts and expected responses, teaching the model to recognize specific visual patterns, objects, or concepts relevant to your domain. This technique is ideal for specialized computer vision applications where general-purpose vision models need domain-specific understanding.

Vision fine-tuning is especially useful for:

- Medical imaging analysis with specialized terminology
- Industrial quality control and defect detection
- Document understanding with custom layouts or formats
- Domain-specific image classification and captioning

Vision fine-tuning is supported for multimodal models like GPT-4o. You can fine-tune both the vision and language understanding capabilities together to create models tailored to your specific visual tasks.
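
Vision training examples pair image content with text. Here is a hedged sketch of what a single example could look like, assuming the multimodal chat format in which a user turn carries a list of content parts; the URL is a placeholder, and images can often also be embedded as base64 data URLs:

```python
# One vision fine-tuning example for a hypothetical defect-inspection domain.
# The user turn mixes a text part with an image part; verify the exact
# content-part shape against the service's schema.
example = {
    "messages": [
        {"role": "system", "content": "You classify product photos for a defect-inspection workflow."},
        {"role": "user", "content": [
            {"type": "text", "text": "Is this casting defective?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/casting-042.jpg"}},
        ]},
        {"role": "assistant", "content": "Defective: a hairline crack is visible near the upper mounting hole."},
    ]
}

part_types = [part["type"] for part in example["messages"][1]["content"]]
```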

Now that you understand the options, let's explore what to consider when preparing your data for fine-tuning workloads.
learn-pr/wwl-data-ai/finetune-model-copilot-ai-studio/includes/3-prepare-data.md
Fine-tuning involves combining a suitable *foundation* model to use as a base with a set of *training data* that includes example prompts and responses the model can learn from. This unit covers understanding the training data format, preparing datasets from real data, and generating synthetic data for fine-tuning.

## Understand the training data format
When you decide you want to fine-tune a language model, you need to identify the dataset you can use to fine-tune your language model.

Similar to any machine learning model, the quality of the dataset has a large effect on the quality of your model. Though you need less data than you would to train a language model from scratch, you still need enough data to maximize the consistency of your desired model's behavior. How much data you need depends on your use case.

The three variables come together in a JSON Lines or JSONL file. For example:

```json
{"messages": [{"role": "system", "content": "You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Xbox."}, {"role": "user", "content": "Is Xbox better than PlayStation?"}, {"role": "assistant", "content": "I apologize, but I cannot provide personal opinions. My primary job is to assist you with any issues related to your Xbox device. Do you have any Xbox-related issues that need addressing?"}]}
```
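
Before uploading a dataset, it helps to confirm that every line parses and has the expected shape. Below is a minimal validator sketch; the checks shown are assumptions about a reasonable baseline, not the service's full validation rules:

```python
import json

def validate_chat_jsonl(text: str) -> list[str]:
    """Return a list of problems found in a chat-format JSONL string.

    Minimal checks only: each line must parse as JSON, contain a non-empty
    'messages' list, use known roles, and end with an assistant response.
    """
    problems = []
    for i, line in enumerate(text.splitlines(), start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {i}: not valid JSON")
            continue
        messages = record.get("messages")
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {i}: missing 'messages' list")
            continue
        if any(m.get("role") not in {"system", "user", "assistant"} for m in messages):
            problems.append(f"line {i}: unknown role")
        if messages[-1].get("role") != "assistant":
            problems.append(f"line {i}: does not end with an assistant message")
    return problems

good = '{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]}'
bad = '{"messages": [{"role": "user", "content": "Hi"}]}'
report = validate_chat_jsonl(good + "\n" + bad)
```

Running a check like this locally catches formatting mistakes before they surface as failed fine-tuning jobs.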

## Prepare datasets from real data

The dataset should show the model's ideal behavior. You can create this dataset based on the chat history of a chat application you have. A few things to keep in mind when you use real data:

- Remove any personal or sensitive information.

An example of a multi-turn chat file format with weights:

```json
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris", "weight": 0}, {"role": "user", "content": "Can you be more sarcastic?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already.", "weight": 1}]}
```
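
In this format, an assistant message with `"weight": 0` is kept as conversation context but excluded from training. A small sketch that extracts the turns that do train, assuming a missing weight defaults to 1:

```python
import json

# The weighted multi-turn example from above, as a single JSONL line.
line = '{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What\'s the capital of France?"}, {"role": "assistant", "content": "Paris", "weight": 0}, {"role": "user", "content": "Can you be more sarcastic?"}, {"role": "assistant", "content": "Paris, as if everyone doesn\'t know that already.", "weight": 1}]}'

messages = json.loads(line)["messages"]

# Assistant turns with weight 0 stay as context but are not trained on;
# assumption: a missing weight defaults to 1 (trained).
trained = [m["content"] for m in messages
           if m["role"] == "assistant" and m.get("weight", 1) == 1]
```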
## Generate synthetic data for fine-tuning
Creating high-quality training datasets manually can be time-consuming and expensive. Microsoft Foundry provides **synthetic data generation** capabilities that can help you create training data from reference documents or API specifications.
41
+
42
+
Synthetic data generation is especially useful when:

- Real data is scarce or difficult to collect
- You need to preserve privacy while retaining useful structure
- You want to generate domain-specific data from existing documents or code
- You want to reduce the cost compared to manual data collection
### Choose synthetic data generators
Microsoft Foundry offers two types of synthetic data generators:

**Simple Q&A Generator**: Converts domain documents (PDF, Markdown, or plain text up to 20 MB) into fine-tuning question-answer pairs. You can configure the question type:

- **Long answer**: Generates questions that require analytical reasoning in the model response
- **Short answer**: Generates questions focused on factual brevity

**Tool use Generator**: Creates multi-turn conversations with tool calls based on your API surface. It requires a valid OpenAPI 3.0.x or 3.1.x specification (JSON format, up to 20 MB) that describes the APIs you want the model to learn to call as tools.

### Generate synthetic data in the portal

To generate synthetic data in Microsoft Foundry:

1. Navigate to **Data > Synthetic Data Generation** in the portal.
1. Select **Generate data**.
1. Choose a task type (Simple Q&A or Tool use).
1. Configure generation parameters if applicable.
1. Upload your reference file.
1. Specify the number of samples to generate (between 50 and 1,000).
1. Select the model to use for data generation.
1. Optionally, enable an 80/20 train-validation split.
1. Submit the job.

The generated dataset is automatically formatted as JSONL files compatible with Microsoft Foundry fine-tuning workflows. You can download the dataset for review or use it directly in fine-tuning.
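
The optional 80/20 train-validation split from the steps above is also easy to reproduce locally when you assemble your own JSONL files. A minimal sketch, assuming one example per line:

```python
import random

def split_jsonl(lines: list[str], train_fraction: float = 0.8, seed: int = 42):
    """Shuffle JSONL examples and split them into train and validation sets."""
    shuffled = lines[:]
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder examples standing in for real JSONL lines.
examples = [f'{{"messages": [...example {i}...]}}' for i in range(100)]
train, validation = split_jsonl(examples)
```

Holding out a validation set lets you watch for overfitting during the fine-tuning run rather than discovering it afterward.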

### Apply best practices for synthetic data

To get the best results when using synthetic data:

- **Use high-quality reference files**: The quality of your reference file directly impacts the generated data. Use relevant, well-structured documents with clear formatting, and avoid excessive noise or irrelevant information.
- **Start small and iterate**: Begin with a smaller sample size to evaluate quality, then scale up after reviewing and refining your approach.
- **Combine with real data**: When possible, mix synthetic data with real-world examples to improve model performance and generalization.
- **Experiment with hyperparameters**: When fine-tuning on synthetic data, you might need to adjust learning rates and other parameters differently than with real data to avoid overfitting.
- **Monitor performance**: Regularly evaluate your fine-tuned model on real-world tasks to ensure it meets your requirements.

When preparing your dataset to fine-tune a language model, you should understand your desired model behaviors, create a dataset in JSONL format (manually or through synthetic generation), and ensure the examples you include are high quality and diverse. By preparing your dataset carefully, you increase the chance that the fine-tuned model improves your chat application's performance.