---
title: Language model best practices
description:
ms.topic: overview
ms.date: 03/26/2026
---

# Language model best practices

This topic provides developer guidance and describes best practices for the [LanguageModel](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel) APIs supported by [Phi Silica](phi-silica.md). It covers both the API functionality and the developer requirements for incorporating the supported features into a Windows app.

## Handling non-deterministic output

Most code behaves predictably: the same input always produces the same output. The [LanguageModel](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel) APIs don't work that way. The exact same prompt can yield a different response each time it's submitted, because of a randomizing factor built into the APIs.

The Phi Silica model is also sensitive to small changes in the input and options, which can produce large changes in the output. For example, the introduction of a single space or typo in a prompt might turn a 100-token answer into a 1000-token answer.

### Why outputs vary

The default sampling parameters introduce randomness into token selection:

| Parameter | Default | Effect on variability |
| --- | --- | --- |
| Temperature | 0.9 | Higher values increase randomness; lower values produce more focused output. |
| TopP | 0.9 | Controls the cumulative probability threshold for token candidates. |
| TopK | 40 | Limits how many tokens are considered at each step; lower values reduce variability. |

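All three parameters can be set on `LanguageModelOptions` before a call. A minimal sketch, assuming an already-created `languageModel` instance (the values shown are illustrative, not recommendations):

```csharp
using Microsoft.Windows.AI.Text;

var options = new LanguageModelOptions();
options.Temperature = 0.5f; // lower than the 0.9 default for more focused output
options.TopP = 0.9f;        // cumulative probability threshold for candidates
options.TopK = 20;          // lower than the 40 default to reduce variability

var result = await languageModel.GenerateResponseAsync("Describe a sunset.", options);
```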
### Guidance

- **Do not write logic that depends on exact output matching.** The same prompt can produce different text on every call.
- Lowering `Temperature` and `TopK` reduces variability but does not guarantee determinism. There is no exposed seed parameter.
- Setting `Temperature = 0` is not guaranteed to produce identical outputs across calls.

### Reducing variability

```csharp
using Microsoft.Windows.AI.Text;

async Task<string> GenerateWithLowVariability(LanguageModel languageModel, string prompt)
{
    var options = new LanguageModelOptions();
    options.Temperature = 0.1f;
    options.TopK = 1;

    var result = await languageModel.GenerateResponseAsync(prompt, options);
    return result.Text;
}
```

### Anti-pattern: fragile string comparison

```csharp
// DO NOT do this - output is non-deterministic
var result = await languageModel.GenerateResponseAsync("Is 2 > 1? Answer yes or no.");
if (result.Text == "Yes") // Fragile: response may be "yes", "Yes.", "Yes, 2 is greater", etc.
{
    // ...
}
```

Instead, parse or classify the response in a way that tolerates variation (for example, check whether the response contains "yes" case-insensitively, or use the model for structured extraction).

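A tolerant check can normalize the response before classifying it. This is a sketch; `IsAffirmative` is a hypothetical helper, not part of the API:

```csharp
using System;

// Hypothetical helper: classifies a free-form yes/no response.
// Tolerant of case, punctuation, and trailing explanation text.
static bool IsAffirmative(string responseText)
{
    string[] words = (responseText ?? string.Empty).Split(
        new[] { ' ', '.', ',', '!', ':', '\n' },
        StringSplitOptions.RemoveEmptyEntries);

    // Classify based on the first word only.
    return words.Length > 0 &&
        words[0].Equals("yes", StringComparison.OrdinalIgnoreCase);
}
```

With this helper, `IsAffirmative("Yes, 2 is greater than 1.")` and `IsAffirmative("yes")` both return `true`, while `IsAffirmative("No.")` returns `false`.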
### Semantic comparison with embeddings

Rather than comparing response text directly, use `GenerateEmbeddingVectors` and cosine similarity to determine whether two outputs are semantically equivalent. This approach is resilient to differences in wording, punctuation, and formatting.

```csharp
using System;
using Microsoft.Windows.AI.Text;

double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

async Task<bool> AreResponsesSemanticallyEqual(
    LanguageModel languageModel, string prompt, double threshold = 0.9)
{
    var result1 = await languageModel.GenerateResponseAsync(prompt);
    var result2 = await languageModel.GenerateResponseAsync(prompt);

    var embedding1 = languageModel.GenerateEmbeddingVectors(result1.Text);
    var embedding2 = languageModel.GenerateEmbeddingVectors(result2.Text);

    // Extract the float arrays from the first embedding vector in each result
    float[] values1 = new float[embedding1.EmbeddingVectors[0].Size];
    embedding1.EmbeddingVectors[0].GetValues(ref values1);

    float[] values2 = new float[embedding2.EmbeddingVectors[0].Size];
    embedding2.EmbeddingVectors[0].GetValues(ref values2);

    double similarity = CosineSimilarity(values1, values2);
    return similarity >= threshold;
}
```

## Using context for multi-turn conversations

Each call to `GenerateResponseAsync` without a `LanguageModelContext` is stateless. The model has no memory of prior prompts or responses. To build a multi-turn conversation, you must create and pass a `LanguageModelContext`.

### How context works

- `CreateContext()` or `CreateContext(systemPrompt)` returns a `LanguageModelContext` that accumulates conversation history.
- When you pass a context to `GenerateResponseAsync`, the call modifies the context in place, appending both the prompt and the response to the conversation history.
- The system prompt, set at context creation time, guides model behavior for the entire conversation.

### Guidance

- Create a context with a system prompt to set the model's role and behavioral boundaries.
- Pass the same context to every `GenerateResponseAsync` call within a conversation.
- `LanguageModelContext` implements `IClosable`; dispose it when the conversation ends.
- If content moderation blocks a prompt or response, the context state is unspecified. Consider creating a new context after a moderation block.

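A minimal recovery sketch for that last point. Discarding and recreating the context after a moderation block is an assumption based on the guidance above, not API-mandated behavior:

```csharp
using Microsoft.Windows.AI.Text;

// Sends a prompt; if moderation blocks it, replaces the context with a fresh one.
// Returns the context to keep using (the caller should also inspect the result).
async Task<LanguageModelContext> SendAndRecoverFromModeration(
    LanguageModel languageModel,
    LanguageModelContext context,
    string prompt,
    LanguageModelOptions options,
    string systemPrompt)
{
    var result = await languageModel.GenerateResponseAsync(context, prompt, options);

    if (result.Status == LanguageModelResponseStatus.PromptBlockedByContentModeration ||
        result.Status == LanguageModelResponseStatus.ResponseBlockedByContentModeration)
    {
        // Context state is unspecified after a moderation block, so discard it.
        context.Dispose();
        context = languageModel.CreateContext(systemPrompt);
    }

    return context;
}
```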
### Proper multi-turn conversation

```csharp
using Microsoft.Windows.AI.Text;

async Task RunConversation(LanguageModel languageModel)
{
    using var context = languageModel.CreateContext(
        "You are a helpful cooking assistant. Answer questions about recipes and techniques.");

    var options = new LanguageModelOptions();

    var result1 = await languageModel.GenerateResponseAsync(
        context, "How do I make a roux?", options);
    Console.WriteLine(result1.Text);

    // Context now contains the first exchange, so the model remembers it
    var result2 = await languageModel.GenerateResponseAsync(
        context, "What ratio of butter to flour should I use?", options);
    Console.WriteLine(result2.Text);

    // The model can reference both prior turns
    var result3 = await languageModel.GenerateResponseAsync(
        context, "Can I use olive oil instead?", options);
    Console.WriteLine(result3.Text);
}
```

### Anti-pattern: stateless calls losing conversational coherence

```csharp
// DO NOT do this for multi-turn conversations
var result1 = await languageModel.GenerateResponseAsync("How do I make a roux?");
// No context passed, so the next call has no memory of the first
var result2 = await languageModel.GenerateResponseAsync("What ratio should I use?");
// The model does not know "ratio" refers to the roux ingredients
```

## Managing context length

The model has a finite context window. The API does not automatically truncate or summarize conversation history; context length management is the developer's responsibility.

Context is consumed by the system prompt, the accumulated conversation history (all prior prompts and responses), and the current prompt. As conversations grow, the remaining space for new prompts shrinks.

### Key API: `GetUsablePromptLength`

`GetUsablePromptLength(context, prompt)` returns a character index into the prompt string indicating where the context window runs out of space. If the return value equals the prompt length, the entire prompt fits.

### Strategies

1. **Check before sending**: call `GetUsablePromptLength` before each `GenerateResponseAsync`.
2. **Trim the prompt**: if the return value is less than the prompt length, truncate or rephrase the prompt to fit.
3. **Reset the context**: when the history fills up, create a new context. Optionally carry forward a summary of the conversation as the system prompt for the new context.
4. **Handle `PromptLargerThanContext`**: always check `result.Status` and handle this status gracefully.

### Pre-send length check with trimming

```csharp
using Microsoft.Windows.AI.Text;

async Task<LanguageModelResponseResult> SendWithLengthCheck(
    LanguageModel languageModel,
    LanguageModelContext context,
    string prompt,
    LanguageModelOptions options)
{
    ulong usableLength = languageModel.GetUsablePromptLength(context, prompt);

    if (usableLength < (ulong)prompt.Length)
    {
        // Trim the prompt to fit the remaining context window
        prompt = prompt.Substring(0, (int)usableLength);
    }

    return await languageModel.GenerateResponseAsync(context, prompt, options);
}
```

### Context reset when the window fills up

Because C# doesn't allow `ref` parameters in async methods, this helper returns the context (which may be replaced) along with the result. It also passes the existing context to the summary call, so the model can actually see the conversation it's asked to summarize:

```csharp
using Microsoft.Windows.AI.Text;

async Task<(LanguageModelResponseResult Result, LanguageModelContext Context)> SendWithContextReset(
    LanguageModel languageModel,
    LanguageModelContext context,
    string prompt,
    LanguageModelOptions options,
    string baseSystemPrompt)
{
    ulong usableLength = languageModel.GetUsablePromptLength(context, prompt);

    if (usableLength < (ulong)prompt.Length)
    {
        // The full prompt no longer fits — summarize the conversation and start fresh
        var summaryResult = await languageModel.GenerateResponseAsync(
            context, "Summarize our conversation so far in 2-3 sentences.", options);

        context.Dispose();
        context = languageModel.CreateContext(
            baseSystemPrompt + "\n\nPrior conversation summary: " + summaryResult.Text);
    }

    var result = await languageModel.GenerateResponseAsync(context, prompt, options);
    return (result, context);
}
```

## Handling response status

Always check `result.Status` before using `result.Text`. A non-`Complete` status means the text may be empty or incomplete.

| Status | Meaning | Recommended handling |
| --- | --- | --- |
| `Complete` | Full response generated successfully | Use `result.Text` |
| `InProgress` | Generation is still running | Wait for completion via the async operation |
| `BlockedByPolicy` | Generative AI blocked by system policy | Inform the user that the feature is unavailable |
| `PromptLargerThanContext` | Prompt exceeds the context window | Trim the prompt or reset the context |
| `PromptBlockedByContentModeration` | Input blocked by content moderation | Inform the user their input was filtered |
| `ResponseBlockedByContentModeration` | Output blocked by content moderation | Inform the user the response was filtered; consider rephrasing |
| `Error` | An error occurred | Check `result.ExtendedError` for details |

```csharp
using Microsoft.Windows.AI.Text;

void HandleResponse(LanguageModelResponseResult result)
{
    switch (result.Status)
    {
        case LanguageModelResponseStatus.Complete:
            Console.WriteLine(result.Text);
            break;

        case LanguageModelResponseStatus.BlockedByPolicy:
            Console.WriteLine("This feature is not available on this device.");
            break;

        case LanguageModelResponseStatus.PromptLargerThanContext:
            Console.WriteLine("Prompt is too long. Please shorten your input.");
            break;

        case LanguageModelResponseStatus.PromptBlockedByContentModeration:
            Console.WriteLine("Your input was blocked by content filtering.");
            break;

        case LanguageModelResponseStatus.ResponseBlockedByContentModeration:
            Console.WriteLine("The response was blocked by content filtering.");
            break;

        case LanguageModelResponseStatus.Error:
            Console.WriteLine($"Error: {result.ExtendedError}");
            break;
    }
}
```

## Resource lifecycle management

Both `LanguageModel` and `LanguageModelContext` implement `IClosable`. Failing to dispose them can leak native resources.

### Guidance

- Use `using` statements or call `Dispose()` explicitly.
- Create one `LanguageModel` instance and reuse it across calls. Do not create a new instance for each request.
- Dispose a `LanguageModelContext` when its conversation ends, not after every call.

```csharp
using Microsoft.Windows.AI.Text;

async Task Example()
{
    // One LanguageModel instance, reused across conversations
    using LanguageModel languageModel = await LanguageModel.CreateAsync();

    // First conversation
    using (var context1 = languageModel.CreateContext("You are a math tutor."))
    {
        var options = new LanguageModelOptions();
        await languageModel.GenerateResponseAsync(context1, "What is 12 * 15?", options);
        await languageModel.GenerateResponseAsync(context1, "Now divide that by 3.", options);
    } // context1 disposed here

    // Second conversation, reusing the same LanguageModel
    using (var context2 = languageModel.CreateContext("You are a writing assistant."))
    {
        var options = new LanguageModelOptions();
        await languageModel.GenerateResponseAsync(context2, "Help me write an intro paragraph.", options);
    } // context2 disposed here
} // languageModel disposed here
```