Commit f37d0c8 ("final draft", parent 4003004)

File changed: docs/apis/language-model-best-practices.md (65 additions & 51 deletions)
---
title: Best practices for the Phi Silica LanguageModel API
description: Learn best practices for the Phi Silica LanguageModel API in the Windows App SDK, including handling non-deterministic output, managing context for multi-turn conversations, and disposing resources.
ms.topic: overview
ms.date: 03/26/2026
---

# Best practices for the Phi Silica LanguageModel API

The [LanguageModel](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel) API gives your Windows app access to the on-device [Phi Silica](phi-silica.md) model, but working with a language model introduces behaviors that differ from traditional deterministic code. This topic covers the key areas you need to handle, with each section providing guidance and code samples.

## Handling non-deterministic output

*(The body of this section is elided in this commit view; it includes an `AreResponsesSemanticallyEqual` helper for comparing two responses.)*

## Using context for multi-turn conversations

Each call to [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync) without a [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) is stateless. The model has no memory of prior prompts or responses. To build a multi-turn conversation, you must create and pass a [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext).

### How context works

- [CreateContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.createcontext) (called with or without a system prompt) returns a [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) that accumulates conversation history.
- When you pass a context to [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync), the call modifies the context in place, appending both the prompt and the response to the conversation history.
- The system prompt, set at context creation time, guides model behavior for the entire conversation.
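The bullets above can be sketched as a minimal snippet (illustrative only: the `languageModel` instance is assumed to already exist, and the system prompt string is invented for the example):

```c#
using Microsoft.Windows.AI.Text;

// The system prompt set here guides the model for the entire conversation.
LanguageModelContext context = languageModel.CreateContext(
    "You are a concise cooking assistant.");

// Pass this same context to every GenerateResponseAsync call in the
// conversation; each call appends its prompt and response to the history.
```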

### Guidance

- Create a context with a system prompt to set the model's role and behavioral boundaries.
- Pass the same context to every [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync) call within a conversation.
- [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) implements [IClosable](/uwp/api/windows.foundation.iclosable) — dispose it when the conversation ends.
- If content moderation blocks a prompt or response, the context state is unspecified. Consider creating a new context after a moderation block.

### Proper multi-turn conversation

The following snippet shows a three-turn conversation where each call passes the same `context` object. Because the context accumulates history, the model can resolve references like "that" and "instead" back to earlier turns (the second question builds on the first answer and the third builds on both). Each response is checked for `Complete` status before use.

```c#
using Microsoft.Windows.AI.Text;

async Task RunConversation(LanguageModel languageModel)
{
    // ... (options and context creation elided in this commit view) ...

    var result1 = await languageModel.GenerateResponseAsync(
        context, "How do I make a roux?", options);
    if (result1.Status != LanguageModelResponseStatus.Complete)
        return;
    Console.WriteLine(result1.Text);

    // Context now contains the first exchange — the model remembers it
    var result2 = await languageModel.GenerateResponseAsync(
        context, "What ratio of butter to flour should I use?", options);
    if (result2.Status != LanguageModelResponseStatus.Complete)
        return;
    Console.WriteLine(result2.Text);

    // The model can reference both prior turns
    var result3 = await languageModel.GenerateResponseAsync(
        context, "Can I use olive oil instead?", options);
    if (result3.Status != LanguageModelResponseStatus.Complete)
        return;
    Console.WriteLine(result3.Text);
}
```

### Stateless calls lose conversational coherence

The following snippet shows what happens when you omit the context object. Each call to [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync) starts from scratch, so the model has no way to connect "ratio" in the second prompt back to the roux discussed in the first. The result is an incoherent conversation.

```c#
// DO NOT do this for multi-turn conversations
// ... (the first stateless call is elided in this commit view) ...
var result2 = await languageModel.GenerateResponseAsync("What ratio should I use...");
```

## Managing context length

Every [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) has a finite context window. The API does not automatically truncate or summarize conversation history, so managing context length is up to you.

The context window is shared by the system prompt, all accumulated conversation history (prior prompts and responses), and the current prompt. As conversations grow, the remaining space for new prompts shrinks.

Before sending a prompt, call [GetUsablePromptLength(context, prompt)](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.getusablepromptlength) to find out how much of your prompt actually fits. The method returns a character index into the prompt string: if the index equals the prompt's length, the entire prompt fits within the remaining context window. If it's less, only the characters up to that index can be accepted, and you'll need to trim, rephrase, or reset the context before calling [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync).

### Strategies for staying within the context window

1. **Check before sending** — call [GetUsablePromptLength](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.getusablepromptlength) before each [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync) call to confirm the prompt fits.
2. **Trim the prompt** — if the return value is less than the prompt length, truncate or rephrase the prompt to fit the remaining window.
3. **Reset context** — when history fills up, create a new context. Optionally summarize the conversation so far and carry the summary forward as the system prompt for the new context.
4. **Handle `PromptLargerThanContext`** — always check `result.Status`. If the status is `PromptLargerThanContext`, trim the prompt or reset the context as described above.
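Strategy 4 can be sketched as a status check after the call returns (a minimal pattern; the `languageModel`, `context`, `prompt`, and `options` variables are assumed to already exist):

```c#
var result = await languageModel.GenerateResponseAsync(context, prompt, options);

if (result.Status == LanguageModelResponseStatus.PromptLargerThanContext)
{
    // The prompt plus accumulated history exceeded the context window.
    // Recover by trimming the prompt or resetting the context
    // (strategies 2 and 3 above), then retry the call.
}
```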

### Trim the prompt to fit the context window

The following example calls [GetUsablePromptLength](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.getusablepromptlength) to determine how much of the prompt fits, truncates it if necessary, and then sends it. If `usableLength` is zero (the context is completely full), this method sends an empty prompt; see the next section for how to handle that case by resetting the context.

```c#
using Microsoft.Windows.AI.Text;
async Task<LanguageModelResponseResult> SendWithLengthCheck(
    /* ... parameters elided in this commit view ... */)
{
    // ... (length check and trimming logic elided in this commit view) ...
}
```

### Reset the context when the window is full

When the context window has no remaining space, you can't send another prompt without first freeing room. One approach is to ask the model to summarize the conversation so far, dispose the old context, and create a fresh one seeded with a system prompt that includes the summary. This preserves conversational continuity without carrying the full history forward. Note that the summary itself is generated by the model and is subject to the same non-determinism described earlier.

```c#
using Microsoft.Windows.AI.Text;
async Task<LanguageModelResponseResult> SendWithContextReset(
    /* ... parameters elided in this commit view ... */)
{
    // ... (usable-length computation elided in this commit view) ...

    if (usableLength == 0)
    {
        // Context is full — summarize using the existing context, then start fresh
        var summaryResult = await languageModel.GenerateResponseAsync(
            context, "Summarize our conversation so far in 2-3 sentences.", options);

        context.Dispose();

        string newSystemPrompt = baseSystemPrompt;
        if (summaryResult.Status == LanguageModelResponseStatus.Complete)
        {
            newSystemPrompt += "\n\nPrior conversation summary: " + summaryResult.Text;
        }

        context = languageModel.CreateContext(newSystemPrompt);
    }

    return await languageModel.GenerateResponseAsync(context, prompt, options);
}
```

## Check response status before using results

Every call to [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync) returns a [LanguageModelResponseResult](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelresponseresult) whose `Status` property tells you whether the response completed successfully. Always check `Status` before reading `Text`: a non-`Complete` status means the text may be empty, incomplete, or absent entirely. The following table lists each possible status value and recommended handling.

| Status | Meaning | Recommended handling |
| --- | --- | --- |
| `Complete` | Full response generated successfully | Use `result.Text` |
| `InProgress` | Generation is still running | Wait for completion via the async operation |
| `BlockedByPolicy` | Generative AI blocked by system policy | Inform the user that the feature is unavailable |
| `PromptLargerThanContext` | The prompt, plus accumulated context, exceeds the context window | Trim the prompt or reset the context, then retry |
| `ResponseBlockedByContentModeration` | Output blocked by content moderation | Inform the user the response was filtered; consider rephrasing |
| `Error` | An error occurred | Check `result.ExtendedError` for details |

The following snippet demonstrates a helper method that handles each status you might receive. The `InProgress` case is omitted because `GenerateResponseAsync` returns only after generation finishes (that status applies only to the streaming APIs).

```c#
using Microsoft.Windows.AI.Text;

void HandleResponse(LanguageModelResponseResult result)
{
    switch (result.Status)
    {
        // ... (cases for the other status values elided in this commit view) ...

        case LanguageModelResponseStatus.Error:
            Console.WriteLine($"Error: {result.ExtendedError}");
            break;

        default:
            Console.WriteLine($"Unexpected status: {result.Status}");
            break;
    }
}
```

## Dispose LanguageModel and context objects

Both [LanguageModel](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel) and [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) implement [IClosable](/uwp/api/windows.foundation.iclosable) and hold native resources that are not reclaimed by garbage collection alone.

- Use `using` statements (or call `Dispose()` explicitly) to release them promptly.
- Create a single `LanguageModel` instance and reuse it across calls; don't create a new one for each request.
- Dispose each `LanguageModelContext` when its conversation ends, not after every individual call.

The following snippet demonstrates both patterns: a `using` declaration for the `LanguageModel` (disposed at method exit) and `using` blocks for each context (disposed when the conversation is done). Status checks are omitted for brevity; see [Check response status before using results](#check-response-status-before-using-results) for that pattern.

```c#
using Microsoft.Windows.AI.Text;
async Task Example()
{
    // ... (the using declaration for the model, the first conversation, and
    // the using block that opens context2 are elided in this commit view) ...

    } // context2 disposed here
} // languageModel disposed here
```

## Summary

The [LanguageModel](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel) API requires accounting for behaviors that don't exist in deterministic code. Treat every response as variable, use [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) to maintain conversation state, monitor context window usage so prompts aren't silently truncated, always check response status before consuming results, and dispose both the model and its contexts when you're done. Following these practices will help you build reliable, resource-efficient apps on top of on-device Phi Silica.
