---
title: Best practices for the Phi Silica LanguageModel API
description: Learn best practices for the Phi Silica LanguageModel API in the Windows App SDK, including handling non-deterministic output, managing context for multi-turn conversations, and disposing resources.
ms.topic: overview
ms.date: 03/26/2026
---

# Best practices for the Phi Silica LanguageModel API

The [LanguageModel](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel) API gives your Windows app access to the on-device [Phi Silica](phi-silica.md) model, but working with a language model introduces behaviors that differ from traditional deterministic code. This topic covers the key areas you need to handle, with each section providing guidance and code samples for the [LanguageModel](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel) API.

Each call to [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync) without a [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) is stateless. The model has no memory of prior prompts or responses. To build a multi-turn conversation, you must create and pass a [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext).
### How context works
- [CreateContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.createcontext) returns a [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) that accumulates conversation history.
- When you pass a context to [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync), the call modifies the context in-place, appending both the prompt and the response to the conversation history.
- The system prompt, set at context creation time, guides model behavior for the entire conversation.
### Guidance
- Create a context with a system prompt to set the model's role and behavioral boundaries.
- Pass the same context to every [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync) call within a conversation.
- [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) implements [IClosable](/uwp/api/windows.foundation.iclosable) — dispose it when the conversation ends.
- If content moderation blocks a prompt or response, the context state is unspecified. Consider creating a new context after a moderation block.
### Proper multi-turn conversation
The following snippet shows a three-turn conversation where each call passes the same `context` object. Because the context accumulates history, the model can resolve references like "that" and "instead" back to earlier turns (the second question builds on the first answer and the third builds on both). Each response is checked for `Complete` status before use.
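
A minimal sketch of that pattern follows. It assumes the model is created with `LanguageModel.CreateAsync`, that `GenerateResponseAsync` has an overload accepting the context and the prompt, and that the status enum is named `LanguageModelResponseStatus`; the prompts themselves are illustrative.

```c#
using Microsoft.Windows.AI.Text;

using LanguageModel languageModel = await LanguageModel.CreateAsync();
using LanguageModelContext context = languageModel.CreateContext(
    "You are a concise cooking assistant.");

// Turn 1
var result1 = await languageModel.GenerateResponseAsync(context, "How do I make a roux?");
if (result1.Status == LanguageModelResponseStatus.Complete) Console.WriteLine(result1.Text);

// Turn 2: "that" resolves to the roux from turn 1 because the context carries the history.
var result2 = await languageModel.GenerateResponseAsync(context, "Can I make that with olive oil?");
if (result2.Status == LanguageModelResponseStatus.Complete) Console.WriteLine(result2.Text);

// Turn 3: "instead" builds on both earlier turns.
var result3 = await languageModel.GenerateResponseAsync(context, "What could I use instead to keep it gluten-free?");
if (result3.Status == LanguageModelResponseStatus.Complete) Console.WriteLine(result3.Text);
```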
The following snippet shows what happens when you omit the context object. Each call to [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync) starts from scratch, so the model has no way to connect "ratio" in the second prompt back to the roux discussed in the first. The result is an incoherent conversation.
```c#
// DO NOT do this for multi-turn conversations
// No LanguageModelContext is passed, so each call is stateless.
// (languageModel is created as shown earlier; the prompts are illustrative.)
var result1 = await languageModel.GenerateResponseAsync("How do I make a roux?");
var result2 = await languageModel.GenerateResponseAsync("What ratio should I use?");
// The model can't connect "ratio" back to the roux from the first call.
```
## Managing context length

Every [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) has a finite context window. The API does not automatically truncate or summarize conversation history, so managing context length is up to you.

The context window is shared by the system prompt, all accumulated conversation history (prior prompts and responses), and the current prompt. As conversations grow, the remaining space for new prompts shrinks.

Before sending a prompt, call [GetUsablePromptLength(context, prompt)](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.getusablepromptlength) to find out how much of your prompt actually fits. The method returns a character index into the prompt string: if the index equals the prompt's length, the entire prompt fits within the remaining context window. If it's less, only the characters up to that index can be accepted, and you'll need to trim, rephrase, or reset the context before calling [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync).
### Strategies for staying within the context window
1. **Check before sending** — call [GetUsablePromptLength](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.getusablepromptlength) before each [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync) call to confirm the prompt fits.
2. **Trim the prompt** — if the return value is less than the prompt length, truncate or rephrase the prompt to fit the remaining window.
3. **Reset context** — when history fills up, create a new context. Optionally summarize the conversation so far and carry the summary forward as the system prompt for the new context.
4. **Handle `PromptLargerThanContext`** — always check `result.Status`. If the status is `PromptLargerThanContext`, trim the prompt or reset the context as described above.
### Trim the prompt to fit the context window
The following example calls [GetUsablePromptLength](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.getusablepromptlength) to determine how much of the prompt fits, truncates it if necessary, and then sends it. If `usableLength` is zero (the context is completely full), this method sends an empty prompt; see the next section for how to handle that case by resetting the context.
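
A sketch of that helper follows, assuming `GetUsablePromptLength` returns the number of prompt characters that fit and that `GenerateResponseAsync` accepts the context and the prompt; the method name is illustrative.

```c#
async Task<string> SendTrimmedPromptAsync(
    LanguageModel languageModel, LanguageModelContext context, string prompt)
{
    // Ask how many characters of the prompt still fit in the remaining context window.
    int usableLength = (int)languageModel.GetUsablePromptLength(context, prompt);

    // Trim the prompt if only part of it fits. If usableLength is zero, this sends an
    // empty prompt; resetting the context (see the next section) is the better option then.
    if (usableLength < prompt.Length)
    {
        prompt = prompt.Substring(0, usableLength);
    }

    var result = await languageModel.GenerateResponseAsync(context, prompt);

    // Check result.Status before trusting result.Text (see the status section below).
    return result.Text;
}
```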
### Reset the context with a summary

When the context window has no remaining space, you can't send another prompt without first freeing room. One approach is to ask the model to summarize the conversation so far, dispose the old context, and create a fresh one seeded with a system prompt that includes the summary. This preserves conversational continuity without carrying the full history forward. Note that the summary itself is generated by the model and is subject to the same non-determinism described earlier.
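
The following sketch shows one way to do that. The helper name, the summarization prompt, and the system prompt format are illustrative, and it assumes the same overloads as the earlier sketches.

```c#
async Task<LanguageModelContext> ResetContextWithSummaryAsync(
    LanguageModel languageModel, LanguageModelContext oldContext)
{
    // Ask the model to compress the conversation so far into a short summary.
    var summaryResult = await languageModel.GenerateResponseAsync(
        oldContext, "Summarize our conversation so far in a few sentences.");
    string summary = summaryResult.Text; // Check summaryResult.Status in production code.

    // Release the old context and its full history, then seed a fresh one with the summary.
    oldContext.Dispose();
    return languageModel.CreateContext(
        $"You are a helpful assistant. Summary of the conversation so far: {summary}");
}
```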
## Check response status before using results

Every call to [GenerateResponseAsync](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel.generateresponseasync) returns a [LanguageModelResponseResult](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelresponseresult) whose `Status` property tells you whether the response completed successfully. Always check `Status` before reading `Text`; a non-`Complete` status means the text may be empty, incomplete, or absent entirely. The following table lists each possible status value and recommended handling.

| Status | Meaning | Recommended handling |
| --- | --- | --- |
| `Complete` | Full response generated successfully | Use `result.Text` |
| `InProgress` | Generation is still running | Wait for completion via the async operation |
| `BlockedByPolicy` | Generative AI blocked by system policy | Inform the user that the feature is unavailable |
| `ResponseBlockedByContentModeration` | Output blocked by content moderation | Inform the user the response was filtered; consider rephrasing |
| `Error` | An error occurred | Check `result.ExtendedError` for details |

The following snippet demonstrates a helper method that handles every status. The `InProgress` status is omitted because `GenerateResponseAsync` returns only after generation finishes (it applies only when using streaming APIs).
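
A sketch of such a helper follows, assuming the status enum is named `LanguageModelResponseStatus`; the user-facing messages are illustrative and cover the statuses discussed in this topic.

```c#
string HandleResponse(LanguageModelResponseResult result)
{
    switch (result.Status)
    {
        case LanguageModelResponseStatus.Complete:
            return result.Text;
        case LanguageModelResponseStatus.BlockedByPolicy:
            return "Generative AI is turned off by system policy on this device.";
        case LanguageModelResponseStatus.PromptLargerThanContext:
            return "The prompt no longer fits in the context window. Shorten it or start a new conversation.";
        case LanguageModelResponseStatus.ResponseBlockedByContentModeration:
            return "The response was filtered by content moderation. Try rephrasing the request.";
        case LanguageModelResponseStatus.Error:
            return $"Generation failed: {result.ExtendedError}";
        default:
            return "The request didn't complete. Please try again.";
    }
}
```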
Both [LanguageModel](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel) and [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) implement [IClosable](/uwp/api/windows.foundation.iclosable) and hold native resources that are not reclaimed by garbage collection alone.

- Use `using` statements (or call `Dispose()` explicitly) to release them promptly.
- Create a single `LanguageModel` instance and reuse it across calls; don't create a new one for each request.
- Dispose each `LanguageModelContext` when its conversation ends, not after every individual call.

The following snippet demonstrates both patterns: a `using` declaration for the `LanguageModel` (disposed at method exit) and `using` blocks for each context (disposed when the conversation is done). Status checks are omitted for brevity; see [Check response status before using results](#check-response-status-before-using-results) for that pattern.

```c#
using Microsoft.Windows.AI.Text;

async Task Example()
{
    // ... (body elided in this view) ...
    } // context2 disposed here
} // languageModel disposed here
```
## Summary
The [LanguageModel](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodel) API requires accounting for behaviors that don't exist in deterministic code. Treat every response as variable, use [LanguageModelContext](/windows/windows-app-sdk/api/winrt/microsoft.windows.ai.text.languagemodelcontext) to maintain conversation state, monitor context window usage so prompts aren't silently truncated, always check response status before consuming results, and dispose both the model and its contexts when you're done. Following these practices will help you build reliable, resource-efficient apps on top of on-device Phi Silica.