
Commit e7bdd38

Merge pull request #262736 from aahill/docs-editor/use-your-data-1704760862

updating system limits

Parents: eb8b050 + 99d37b5

1 file changed: 7 additions & 3 deletions

articles/ai-services/openai/concepts/use-your-data.md
@@ -8,7 +8,7 @@ ms.service: azure-ai-openai
 ms.topic: quickstart
 author: aahill
 ms.author: aahi
-ms.date: 11/14/2023
+ms.date: 01/09/2023
 recommendations: false
 ---
 
@@ -337,7 +337,7 @@ Use the following sections to help you configure Azure OpenAI on your data for o
 
 ### System message
 
-Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, what it should and shouldn't answer, and how to format responses. There's no token limit for the system message, but will be included with every API call and counted against the overall token limit. The system message will be truncated if it's greater than 400 tokens.
+Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, what it should and shouldn't answer, and how to format responses. There are token limits that apply to the system message, used with every API call, and counted against the overall token limit. The system message will be truncated if it exceeds the token limits listed in the [token estimation](#token-usage-estimation-for-azure-openai-on-your-data) section.
 
 For example, if you're creating a chatbot where the data consists of transcriptions of quarterly financial earnings calls, you might use the following system message:
 
@@ -358,6 +358,7 @@ Set a limit on the number of tokens per model response. The upper limit for Azur
 This option encourages the model to respond using your data only, and is selected by default. If you unselect this option, the model might more readily apply its internal knowledge to respond. Determine the correct selection based on your use case and scenario.
 
 
+
 ### Interacting with the model
 
 Use the following practices for best results when chatting with the model.
@@ -387,7 +388,6 @@ Avoid asking long questions and break them down into multiple questions if possi
 
 * *"**You are an AI assistant designed to help users extract information from retrieved Japanese documents. Please scrutinize the Japanese documents carefully before formulating a response. The user's query will be in Japanese, and you must response also in Japanese."*
 
-
 * If you have documents in multiple languages, we recommend building a new index for each language and connecting them separately to Azure OpenAI.
 
 ### Deploying the model
@@ -466,6 +466,8 @@ After you upload your data through Azure OpenAI studio, you can make a call agai
 
 
 
+
+
 |Parameter |Recommendation |
 |---------|---------|
 |`fieldsMapping` | Explicitly set the title and content fields of your index. This impacts the search retrieval quality of Azure AI Search, which impacts the overall response and citation quality. |
@@ -537,6 +539,7 @@ When you chat with a model, providing a history of the chat will help the model
 ## Token usage estimation for Azure OpenAI on your data
 
 
+
 | Model | Total tokens available | Max tokens for system message | Max tokens for model response |
 |-------------------------|------------------------|------------------------------------|------------------------------------|
 | ChatGPT Turbo (0301) 8k | 8000 | 400 | 1500 |
@@ -547,6 +550,7 @@ When you chat with a model, providing a history of the chat will help the model
 The table above shows the total number of tokens available for each model type. It also determines the maximum number of tokens that can be used for the [system message](#system-message) and the model response. Additionally, the following also consume tokens:
 
 
+
 * The meta prompt (MP): if you limit responses from the model to the grounding data content (`inScope=True` in the API), the maximum number of tokens is 4036 tokens. Otherwise (for example if `inScope=False`) the maximum is 3444 tokens. This number is variable depending on the token length of the user question and conversation history. This estimate includes the base prompt as well as the query rewriting prompts for retrieval.
 * User question and history: Variable but capped at 2000 tokens.
 * Retrieved documents (chunks): The number of tokens used by the retrieved document chunks depends on multiple factors. The upper bound for this is the number of retrieved document chunks multiplied by the chunk size. It will, however, be truncated based on the tokens available tokens for the specific model being used after counting the rest of fields.
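As a rough illustration of the token accounting described in the changed section, the per-call budget can be sketched as follows. This is a minimal worst-case sketch, not part of any Azure SDK; the function and dictionary names are hypothetical, and the limits are taken from the table and bullet list in the diff above (system message 400, response 1500, meta prompt 4036/3444 depending on `inScope`, user question and history capped at 2000).

```python
# Hypothetical sketch of the worst-case token budget for Azure OpenAI on
# your data. Names are illustrative only; limits come from the docs diff.

MODEL_LIMITS = {
    # model: (total tokens, max system message, max model response)
    "ChatGPT Turbo (0301) 8k": (8000, 400, 1500),
}

META_PROMPT_MAX = {True: 4036, False: 3444}  # keyed by inScope
USER_AND_HISTORY_MAX = 2000                  # capped per the bullet list


def tokens_left_for_chunks(model: str, in_scope: bool) -> int:
    """Worst-case upper bound on tokens left for retrieved document chunks,
    assuming every other component uses its documented maximum."""
    total, system_max, response_max = MODEL_LIMITS[model]
    used = (system_max + response_max
            + META_PROMPT_MAX[in_scope] + USER_AND_HISTORY_MAX)
    return total - used


print(tokens_left_for_chunks("ChatGPT Turbo (0301) 8k", in_scope=True))
```

In practice the meta prompt and conversation history vary in length, so more tokens are usually available for retrieved chunks than this floor suggests; the service truncates chunks to whatever remains.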
