---
title: Language model best practices
description: Learn best practices for using the LanguageModel APIs supported by Phi Silica in a Windows app.
ms.topic: overview
ms.date: 03/26/2026
---

# Language model best practices

This topic provides developer guidance and describes best practices for the [LanguageModel](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel) APIs supported by [Phi Silica](phi-silica.md). It covers both the API functionality and the developer requirements for incorporating the supported features into a Windows app.

## Handling non-deterministic output

Most code behaves predictably: the same input always produces the same output. The [LanguageModel](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel) APIs don't work that way. The exact same prompt can yield a different response each time it's submitted, because of the random sampling built into the APIs.

The Phi Silica model is also sensitive to small changes: a minor change to the input or options can produce a large change in the output. For example, the introduction of a single space or typo in a prompt might turn a 100-token answer into a 1000-token answer.

### Why outputs vary

The default sampling parameters introduce randomness into token selection:

| Parameter | Default | Effect on variability |
| --- | --- | --- |
| Temperature | 0.9 | Higher values increase randomness; lower values produce more focused output. |
| TopP | 0.9 | Controls the cumulative probability threshold for token candidates. |
| TopK | 40 | Limits how many tokens are considered at each step; lower values reduce variability. |

### Guidance

- **Do not write logic that depends on exact output matching.** The same prompt can produce different text on every call.
- Lowering `Temperature` and `TopK` reduces variability but does not guarantee determinism. There is no exposed seed parameter.
- Setting `Temperature = 0` is not guaranteed to produce identical outputs across calls.

### Reducing variability

```c#
using Microsoft.Windows.AI.Text;

async Task<string> GenerateWithLowVariability(LanguageModel languageModel, string prompt)
{
    // A low temperature and TopK = 1 make output more focused,
    // but do not guarantee identical responses across calls
    var options = new LanguageModelOptions();
    options.Temperature = 0.1f;
    options.TopK = 1;

    var result = await languageModel.GenerateResponseAsync(prompt, options);
    return result.Text;
}
```

### Anti-pattern: fragile string comparison

```c#
// DO NOT do this - output is non-deterministic
var result = await languageModel.GenerateResponseAsync("Is 2 > 1? Answer yes or no.");
if (result.Text == "Yes") // Fragile: response may be "yes", "Yes.", "Yes, 2 is greater", etc.
{
    // ...
}
```

Instead, parse or classify the response in a way that tolerates variation (e.g., check whether the response contains "yes" case-insensitively, or use the model for structured extraction).
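
A minimal sketch of the tolerant approach (the `IsAffirmative` helper and its simple "contains" rule are illustrative assumptions; adapt the matching to your prompt design):

```c#
using Microsoft.Windows.AI.Text;

async Task<bool> IsAffirmative(LanguageModel languageModel, string prompt)
{
    var result = await languageModel.GenerateResponseAsync(prompt);

    // Tolerates variations such as "yes", "Yes.", or "Yes, 2 is greater"
    return result.Status == LanguageModelResponseStatus.Complete
        && result.Text.Contains("yes", StringComparison.OrdinalIgnoreCase);
}
```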

### Semantic comparison with embeddings

Rather than comparing response text directly, use `GenerateEmbeddingVectors` and cosine similarity to determine whether two outputs are semantically equivalent. This approach is resilient to differences in wording, punctuation, and formatting.

```c#
using Microsoft.Windows.AI.Text;

double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

async Task<bool> AreResponsesSemanticallyEqual(
    LanguageModel languageModel, string prompt, double threshold = 0.9)
{
    var result1 = await languageModel.GenerateResponseAsync(prompt);
    var result2 = await languageModel.GenerateResponseAsync(prompt);

    var embedding1 = languageModel.GenerateEmbeddingVectors(result1.Text);
    var embedding2 = languageModel.GenerateEmbeddingVectors(result2.Text);

    // Extract the float arrays from the first embedding vector in each result
    float[] values1 = new float[embedding1.EmbeddingVectors[0].Size];
    embedding1.EmbeddingVectors[0].GetValues(ref values1);

    float[] values2 = new float[embedding2.EmbeddingVectors[0].Size];
    embedding2.EmbeddingVectors[0].GetValues(ref values2);

    double similarity = CosineSimilarity(values1, values2);
    return similarity >= threshold;
}
```

## Using context for multi-turn conversations

Each call to `GenerateResponseAsync` without a `LanguageModelContext` is stateless. The model has no memory of prior prompts or responses. To build a multi-turn conversation, you must create and pass a `LanguageModelContext`.

### How context works

- `CreateContext()` or `CreateContext(systemPrompt)` returns a `LanguageModelContext` that accumulates conversation history.
- When you pass a context to `GenerateResponseAsync`, the call modifies the context in place, appending both the prompt and the response to the conversation history.
- The system prompt, set at context creation time, guides model behavior for the entire conversation.

### Guidance

- Create a context with a system prompt to set the model's role and behavioral boundaries.
- Pass the same context to every `GenerateResponseAsync` call within a conversation.
- `LanguageModelContext` implements `IClosable`, so dispose it when the conversation ends.
- If content moderation blocks a prompt or response, the context state is unspecified. Consider creating a new context after a moderation block, as shown in the sketch after this list.
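
The following sketch shows one way to recover from a moderation block. The `SendWithModerationRecovery` helper is a hypothetical illustration; it returns the (possibly recreated) context to the caller because C# doesn't allow `ref` parameters in `async` methods.

```c#
using Microsoft.Windows.AI.Text;

async Task<(LanguageModelResponseResult Result, LanguageModelContext Context)> SendWithModerationRecovery(
    LanguageModel languageModel,
    LanguageModelContext context,
    string prompt,
    LanguageModelOptions options,
    string systemPrompt)
{
    var result = await languageModel.GenerateResponseAsync(context, prompt, options);

    if (result.Status == LanguageModelResponseStatus.PromptBlockedByContentModeration ||
        result.Status == LanguageModelResponseStatus.ResponseBlockedByContentModeration)
    {
        // The blocked exchange leaves the context in an unspecified state;
        // discard it and start a fresh conversation with the same system prompt
        context.Dispose();
        context = languageModel.CreateContext(systemPrompt);
    }

    return (result, context);
}
```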

### Proper multi-turn conversation

```c#
using Microsoft.Windows.AI.Text;

async Task RunConversation(LanguageModel languageModel)
{
    using var context = languageModel.CreateContext(
        "You are a helpful cooking assistant. Answer questions about recipes and techniques.");

    var options = new LanguageModelOptions();

    var result1 = await languageModel.GenerateResponseAsync(
        context, "How do I make a roux?", options);
    Console.WriteLine(result1.Text);

    // Context now contains the first exchange; the model remembers it
    var result2 = await languageModel.GenerateResponseAsync(
        context, "What ratio of butter to flour should I use?", options);
    Console.WriteLine(result2.Text);

    // The model can reference both prior turns
    var result3 = await languageModel.GenerateResponseAsync(
        context, "Can I use olive oil instead?", options);
    Console.WriteLine(result3.Text);
}
```

### Anti-pattern: stateless calls losing conversational coherence

```c#
// DO NOT do this for multi-turn conversations
var result1 = await languageModel.GenerateResponseAsync("How do I make a roux?");
// No context passed; the next call has no memory of the first
var result2 = await languageModel.GenerateResponseAsync("What ratio should I use?");
// The model does not know "ratio" refers to the roux ingredients
```

## Managing context length

The model has a finite context window. The API does not automatically truncate or summarize conversation history; context length management is the developer's responsibility.

Context is consumed by the system prompt, the accumulated conversation history (all prior prompts and responses), and the current prompt. As a conversation grows, the remaining space for new prompts shrinks.

### Key API: `GetUsablePromptLength`

`GetUsablePromptLength(context, prompt)` returns a character index into the prompt string indicating where the context window ran out of space. If the return value equals the prompt length, the entire prompt fits.

### Strategies

1. **Check before sending.** Call `GetUsablePromptLength` before each `GenerateResponseAsync` call.
2. **Trim the prompt.** If the return value is less than the prompt length, truncate or rephrase the prompt to fit.
3. **Reset the context.** When history fills up, create a new context. Optionally carry forward a summary of the conversation as the system prompt for the new context.
4. **Handle `PromptLargerThanContext`.** Always check `result.Status` and handle this status gracefully.

### Pre-send length check with trimming

```c#
using Microsoft.Windows.AI.Text;

async Task<LanguageModelResponseResult> SendWithLengthCheck(
    LanguageModel languageModel,
    LanguageModelContext context,
    string prompt,
    LanguageModelOptions options)
{
    ulong usableLength = languageModel.GetUsablePromptLength(context, prompt);

    if (usableLength < (ulong)prompt.Length)
    {
        // Trim prompt to fit the remaining context window
        prompt = prompt.Substring(0, (int)usableLength);
    }

    return await languageModel.GenerateResponseAsync(context, prompt, options);
}
```

### Context reset when the window fills up

Because C# doesn't allow `ref` parameters in `async` methods, this sketch returns the (possibly recreated) context to the caller. The summary request passes the existing context, since a stateless call would have no conversation history to summarize.

```c#
using Microsoft.Windows.AI.Text;

async Task<(LanguageModelResponseResult Result, LanguageModelContext Context)> SendWithContextReset(
    LanguageModel languageModel,
    LanguageModelContext context,
    string prompt,
    LanguageModelOptions options,
    string baseSystemPrompt)
{
    ulong usableLength = languageModel.GetUsablePromptLength(context, prompt);

    if (usableLength == 0)
    {
        // Context is full; summarize the conversation, then start fresh.
        // The existing context is passed so the model can see the history.
        var summaryResult = await languageModel.GenerateResponseAsync(
            context, "Summarize our conversation so far in 2-3 sentences.", options);

        context.Dispose();
        context = languageModel.CreateContext(
            baseSystemPrompt + "\n\nPrior conversation summary: " + summaryResult.Text);
    }

    var result = await languageModel.GenerateResponseAsync(context, prompt, options);
    return (result, context);
}
```

## Handling response status

Always check `result.Status` before using `result.Text`. A non-`Complete` status means the text may be empty or incomplete.

| Status | Meaning | Recommended handling |
| --- | --- | --- |
| `Complete` | Full response generated successfully | Use `result.Text` |
| `InProgress` | Generation is still running | Wait for completion via the async operation |
| `BlockedByPolicy` | Generative AI blocked by system policy | Inform the user that the feature is unavailable |
| `PromptLargerThanContext` | Prompt exceeds the context window | Trim the prompt or reset the context |
| `PromptBlockedByContentModeration` | Input blocked by content moderation | Inform the user their input was filtered |
| `ResponseBlockedByContentModeration` | Output blocked by content moderation | Inform the user the response was filtered; consider rephrasing |
| `Error` | An error occurred | Check `result.ExtendedError` for details |

```c#
using Microsoft.Windows.AI.Text;

void HandleResponse(LanguageModelResponseResult result)
{
    switch (result.Status)
    {
        case LanguageModelResponseStatus.Complete:
            Console.WriteLine(result.Text);
            break;

        case LanguageModelResponseStatus.BlockedByPolicy:
            Console.WriteLine("This feature is not available on this device.");
            break;

        case LanguageModelResponseStatus.PromptLargerThanContext:
            Console.WriteLine("Prompt is too long. Please shorten your input.");
            break;

        case LanguageModelResponseStatus.PromptBlockedByContentModeration:
            Console.WriteLine("Your input was blocked by content filtering.");
            break;

        case LanguageModelResponseStatus.ResponseBlockedByContentModeration:
            Console.WriteLine("The response was blocked by content filtering.");
            break;

        case LanguageModelResponseStatus.Error:
            Console.WriteLine($"Error: {result.ExtendedError}");
            break;
    }
}
```
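
For the `InProgress` case, note that the Phi Silica samples surface `GenerateResponseAsync` as an async operation with progress, which lets you display partial output as it's generated rather than polling the status. A sketch, assuming the operation is an `IAsyncOperationWithProgress<LanguageModelResponseResult, string>` whose progress payload is the newly generated text:

```c#
using Microsoft.Windows.AI.Text;

async Task StreamResponse(LanguageModel languageModel, string prompt)
{
    var operation = languageModel.GenerateResponseAsync(prompt);

    // Each progress callback delivers the latest chunk of generated text
    operation.Progress = (asyncInfo, delta) =>
    {
        Console.Write(delta);
    };

    var result = await operation;
    Console.WriteLine();
    HandleResponse(result); // final status check, as in the example above
}
```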

## Resource lifecycle management

Both `LanguageModel` and `LanguageModelContext` implement `IClosable` (projected as `IDisposable` in C#). Failing to dispose them can leak native resources.

### Guidance

- Use `using` statements or call `Dispose()` explicitly.
- Create one `LanguageModel` instance and reuse it across calls. Do not create a new instance for each request.
- Dispose a `LanguageModelContext` when its conversation ends, not after every call.

```c#
using Microsoft.Windows.AI.Text;

async Task Example()
{
    // One LanguageModel instance, reused across conversations
    using LanguageModel languageModel = await LanguageModel.CreateAsync();

    // First conversation
    using (var context1 = languageModel.CreateContext("You are a math tutor."))
    {
        var options = new LanguageModelOptions();
        await languageModel.GenerateResponseAsync(context1, "What is 12 * 15?", options);
        await languageModel.GenerateResponseAsync(context1, "Now divide that by 3.", options);
    } // context1 disposed here

    // Second conversation reuses the same LanguageModel
    using (var context2 = languageModel.CreateContext("You are a writing assistant."))
    {
        var options = new LanguageModelOptions();
        await languageModel.GenerateResponseAsync(context2, "Help me write an intro paragraph.", options);
    } // context2 disposed here
} // languageModel disposed here
```