Commit 3401d02

Merge pull request #28950 from MicrosoftDocs/lauragra-issue968
Update minimum number of files for training.
2 parents 9ffe2be + b21b8e8 commit 3401d02

4 files changed

Lines changed: 10 additions & 12 deletions

copilot/copilot-tuning-doc-generation.md

Lines changed: 4 additions & 4 deletions
@@ -5,7 +5,7 @@ ms.author: jasonjoh
 manager: calvind
 ms.audience: ITPro
 ms.reviewer: jwolk
-ms.date: 06/17/2025
+ms.date: 07/17/2025
 ms.topic: how-to
 ms.localizationpriority: medium
 description: Learn how to use Copilot Tuning to build an AI model for document generation based on organizational knowledge.
@@ -33,9 +33,9 @@ Some example use cases include:
 
 - You must have permission to use Copilot Tuning in Copilot Studio. <!-- TODO: Link to permission doc here if it exists -->
 - A collection of original documents and corresponding final draft documents that are stored in SharePoint.
-- A collection of change logs or specifications stored in SharePoint.
+- A collection of changelogs or specifications stored in SharePoint.
 - A structured version of required changes to provide in the supplementary field within Copilot Tuning.
-- A minimum of 20 well-aligned pairs of reference documents to target pairs that reflect a representative range of changes you expect the system to handle.
+- More than 20 well-aligned pairs of reference documents to target pairs that reflect a representative range of changes you expect the system to handle.
 
 > [!IMPORTANT]
 > Document generation supports working with the following file formats: .doc, .docx, .html, .md, or .pdf. Copilot Tuning only uses information found in text. Copilot Tuning document generation doesn't use information in images, tables, or unstructured web content in your documents.
@@ -55,7 +55,7 @@ The following are the high-level steps to configure a custom document generation
 
 ### Prepare a mapping file
 
-Your knowledge source should have at least 20 example pairs of original files and corresponding final (draft) files. In this step, you prepare a CSV file that provides at least 20 examples of original files to final (draft) documents. Copilot Tuning uses these examples to fine-tune the generation logic, helping the model learn how your organization typically edits or adapts documents.
+Your knowledge source should have more than 20 example pairs of original files and corresponding final (draft) files. In this step, you prepare a CSV file that provides more than 20 examples of original files to final (draft) documents. Copilot Tuning uses these examples to fine-tune the generation logic, helping the model learn how your organization typically edits or adapts documents.
 
 Create a file named **mapping.csv** and store it in the root directory of your knowledge source. This file should have two columns:
 
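
Reviewer note: the changed wording ("more than 20" pairs) can be checked mechanically before uploading a mapping file. The following is a minimal sketch, not part of the commit. The hunk cuts off before the two column names are listed, so the sketch does not assume them; it only counts rows that have exactly two non-empty fields. The function names and the `has_header` parameter are hypothetical, and whether **mapping.csv** carries a header row is an assumption to verify against the full article.

```python
import csv
import io

# The updated text says "more than 20", so 20 itself is not enough.
MIN_PAIRS_EXCLUSIVE = 20


def count_mapping_pairs(csv_text: str, has_header: bool = True) -> int:
    """Count rows with exactly two non-empty fields.

    has_header is an assumption: set it to False if the file has no header row.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header and rows:
        rows = rows[1:]  # skip the (assumed) header row
    return sum(
        1
        for row in rows
        if len(row) == 2 and all(cell.strip() for cell in row)
    )


def has_enough_pairs(csv_text: str, has_header: bool = True) -> bool:
    """True when the file lists more than 20 complete pairs."""
    return count_mapping_pairs(csv_text, has_header) > MIN_PAIRS_EXCLUSIVE
```

Rows with a missing second field and blank lines are not counted, so a partially filled file does not pass the threshold by accident.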

copilot/copilot-tuning-expert-qa.md

Lines changed: 3 additions & 5 deletions
@@ -4,7 +4,7 @@ author: danielabom
 ms.author: danielabo
 manager: calvind
 ms.reviewer: jwolk
-ms.date: 06/17/2025
+ms.date: 07/16/2025
 ms.topic: how-to
 ms.localizationpriority: medium
 description: Learn how to use Copilot Tuning to build an AI model for expert question & answer (Q&A).
@@ -29,19 +29,17 @@ You can use the model maker to fine-tune a model that can complete the following
 
 ## Limitations
 
-The model supports various document formats, but there are specific limitations to consider when using your content.
+The model supports various document formats, with the following specific limitations:
 
 - Content must be stored in SharePoint and be in supported formats (.docx, .pdf, .aspx); elements like embedded images or tables aren't supported.
 - Not intended for general productivity or web-wide knowledge queries; it's limited to tenant-specific content and not suitable for tasks like managing meetings or browsing general internet data.
 - Depending on the snapshot time of training data, newer content must be enriched via Search.
 
-While there's no minimum document count, better results are achieved with larger content sets; at least 20 documents are recommended for training.
-
 ## Prerequisites
 
 Before you start, make sure that you have the following prerequisites in place:
 
-1. You must have domain-specific content or documentation, such as legal playbooks, HR guidelines, technical documentation, policy manuals, or departmental procedures, that the model can use to answer questions.
+1. You must have domain-specific content or documentation, such as legal playbooks, HR guidelines, technical documentation, policy manuals, or departmental procedures, that the model can use to answer questions. The content set that you use to train the model must consist of more than 20 files.
 2. Configure the model agent with a Microsoft Entra ID security group or distribution list and create your own Entra ID groups to be added to the model.
 3. Identify where your content is stored in SharePoint.
 

copilot/copilot-tuning-knowledge-selection.md

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ author: msplants
 ms.author: jwolk
 manager: calvind
 ms.topic: concept-article
-ms.date: 06/17/2025
+ms.date: 07/17/2025
 ms.localizationpriority: medium
 ---
 

@@ -25,7 +25,7 @@ Selecting knowledge is the first and most critical step in Copilot Tuning. You s
 
 - **Supported file formats and content:** All Copilot Tuning tasks support common text-based document formats. You can use Word documents (.doc, .docx), HTML files (.html, .aspx), Markdown files (.md), or PDF files processed by Optical Character Recognition (OCR) as source materials. You can also include Excel documents (.xls, .xlsx) for expert Q&A. The tuning process ingests the text content from these files. It doesn't learn from images, embedded tables, or other non-text elements in the documents. Ensure that the important information in your training documents is in textual form. For example, if a filePDF contains a chart, include a textual explanation of that chart's insights in the document.
 
-- **Number of documents:** You must provide at least 20 samples (documents for Q&A and summarization; input-output pairs for document generation) to Copilot Tuning. Usually hundreds or thousands of samples is ideal, and you can provide a maximum of 10k. The quality of samples is more important than raw quantity. We highly suggest you focus you data preparation time on finding as many high quality samples that are well-aligned with what you expect your fine-tuned model to do.
+- **Number of documents:** You must provide more than 20 samples (documents for Q&A and summarization; input-output pairs for document generation) to Copilot Tuning. Usually hundreds or thousands of samples is ideal, and you can provide a maximum of 10k. The quality of samples is more important than raw quantity. We highly suggest you focus you data preparation time on finding as many high quality samples that are well-aligned with what you expect your fine-tuned model to do.
 
 - **Model instructions:** During the model configuration process, Copilot Tuning asks the model maker to provide answers to a series of model instructions to guide the system on how to use the knowledge you selected. Each task type has its own questions about the selected knowledge source. Prepare clear, structured answers to each question. Expert Q&A requires a description of the data in the knowledge source and how it's organized. Document generation requires you to specify how the original input, changes, and output draft document are referred to in your organization. Summarization requires you to specify how to refer to the summaries. It's important that this information is clear and accurately represents your data in order for the system to be most effective.
 
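
Reviewer note: the two bullets above (supported formats, and more than 20 but at most 10k samples) can be combined into a quick pre-flight check on a list of file names. This is a rough sketch, not part of the commit. The task labels, function names, and constants are hypothetical; "10k" is read as 10,000; and the extension sets follow the knowledge-selection article in this hunk, which differs slightly from the lists in the other two articles. An extension check also cannot confirm that a PDF has been processed by OCR.

```python
from pathlib import PurePosixPath

# Extensions named in the "Supported file formats and content" bullet.
BASE_EXTENSIONS = {".doc", ".docx", ".html", ".aspx", ".md", ".pdf"}
# Excel files are mentioned only for expert Q&A.
QA_ONLY_EXTENSIONS = {".xls", ".xlsx"}

MIN_SAMPLES_EXCLUSIVE = 20   # "more than 20"
MAX_SAMPLES = 10_000         # "a maximum of 10k", read here as 10,000


def usable_files(names, task="expert_qa"):
    """Keep only names whose extension the task accepts.

    The task labels ("expert_qa" and anything else) are hypothetical.
    """
    allowed = set(BASE_EXTENSIONS)
    if task == "expert_qa":
        allowed |= QA_ONLY_EXTENSIONS
    return [n for n in names if PurePosixPath(n).suffix.lower() in allowed]


def sample_count_ok(count):
    """True when count is more than 20 and no more than 10,000."""
    return MIN_SAMPLES_EXCLUSIVE < count <= MAX_SAMPLES
```

For document generation, the count to check is the number of input-output pairs rather than the number of files, as the bullet states.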

copilot/copilot-tuning-troubleshooting-mac.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ f1.keywords:
 ms.author: emrek
 author: emrekiciman
 manager: calvind
-ms.date: 07/08/2025
+ms.date: 07/17/2025
 audience: Admin
 ms.topic: troubleshooting
 ms.service: microsoft-365-copilot
