
Commit b834953

nmetulev and Copilot committed
Fix outdated GenAI APIs and update walkthroughs to use Windows ML
- Update InferStreaming in get-started-models-genai.md to use current OnnxRuntimeGenAI 0.6+ APIs:
  - SetInputSequences -> generator.AppendTokenSequences (moved to Generator)
  - Remove ComputeLogits (handled by GenerateNextToken internally)

  These APIs were removed in the continuous decoding change: microsoft/onnxruntime-genai#1142
- Switch GenAI walkthrough from .DirectML to .WinML package for automatic hardware EP selection. Add alternative packages section.
- Switch ONNX WinUI walkthrough from standalone ORT + SharpDX.DXGI to Windows ML (Microsoft.WindowsAppSDK.ML):
  - Replace manual DirectML adapter selection with EP Catalog
  - Remove SharpDX.DXGI dependency
  - Use async InitModelAsync with EnsureAndRegisterCertifiedAsync

Co-authored-by: Copilot <[email protected]>
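The token-feeding change called out above, as a compact before/after sketch (illustrative only; the model folder path is hypothetical and the surrounding setup mirrors the walkthrough):

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

// Assumed setup, as in the walkthrough (model folder path is hypothetical).
using var model = new Model(@"C:\models\phi-3");
using var tokenizer = new Tokenizer(model);
using var generatorParams = new GeneratorParams(model);
var sequences = tokenizer.Encode("<|user|>Hello<|end|><|assistant|>");

using var generator = new Generator(model, generatorParams);

// Before (GenAI <= 0.5): generatorParams.SetInputSequences(sequences);
// After (GenAI 0.6+): token sequences are appended on the Generator itself.
generator.AppendTokenSequences(sequences);

while (!generator.IsDone())
{
    // Before (GenAI <= 0.5): generator.ComputeLogits(); was required each pass.
    // After (GenAI 0.6+): GenerateNextToken computes logits internally.
    generator.GenerateNextToken();
}
```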
1 parent f89f68c · commit b834953

2 files changed

Lines changed: 35 additions & 25 deletions


docs/models/get-started-models-genai.md

Lines changed: 20 additions & 4 deletions
@@ -27,7 +27,12 @@ In Visual Studio, create a new project. In the **Create a new project** dialog,
 
 ## Add references to the ONNX Runtime Generative AI Nuget package
 
-In **Solution Explorer**, right-click **Dependencies** and select **Manage NuGet packages...**. In the NuGet package manager, select the **Browse** tab. Search for "Microsoft.ML.OnnxRuntimeGenAI.DirectML", select the latest stable version in the **Version** drop-down and then click **Install**.
+In **Solution Explorer**, right-click **Dependencies** and select **Manage NuGet packages...**. In the NuGet package manager, select the **Browse** tab. Search for "Microsoft.ML.OnnxRuntimeGenAI.WinML", select the latest stable version in the **Version** drop-down and then click **Install**.
+
+This package uses [Windows ML](../new-windows-ml/overview.md) to automatically select the best available hardware execution provider (NPU → GPU → CPU). No need to choose between DirectML, QNN, or CPU-specific packages — Windows ML handles it.
+
+> [!NOTE]
+> The `.WinML` package requires a Windows-specific target framework (e.g., `net8.0-windows10.0.19041.0` or later). If you need a cross-platform package or want to target a specific execution provider, see the [alternative packages](#alternative-genai-packages) section at the end of this article.
 
 
 ## Add a model and vocabulary file to your project
@@ -133,7 +138,7 @@ private async void MainWindow_Activated(object sender, WindowActivatedEventArgs
 
 Create a helper method that submits the prompt to the model and then asynchronously returns the results to the caller with an [IAsyncEnumerable](/dotnet/api/system.collections.generic.iasyncenumerable-1).
 
-In this method, the [Generator](https://onnxruntime.ai/docs/genai/api/csharp.html#generator-class) class is used in a loop, calling **GenerateNextToken** in each pass to retrieve what the model predicts the next few characters, called a token, should be based on the input prompt. The loop runs until the generator **IsDone** method returns true or until any of the tokens "<|end|>", "<|system|>", or "<|user|>" are received, which signals that we can stop generating tokens.
+In this method, the [Generator](https://onnxruntime.ai/docs/genai/api/csharp.html#generator-class) class is used in a loop, calling **GenerateNextToken** in each pass to retrieve what the model predicts the next few characters, called a token, should be based on the input prompt. Input token sequences are provided to the generator via **AppendTokenSequences**. The loop runs until the generator **IsDone** method returns true or until any of the tokens "<|end|>", "<|system|>", or "<|user|>" are received, which signals that we can stop generating tokens.
 
 ```csharp
 public async IAsyncEnumerable<string> InferStreaming(string prompt)
@@ -148,19 +153,18 @@ public async IAsyncEnumerable<string> InferStreaming(string prompt)
     var sequences = tokenizer.Encode(prompt);
 
     generatorParams.SetSearchOption("max_length", 2048);
-    generatorParams.SetInputSequences(sequences);
     generatorParams.TryGraphCaptureWithMaxBatchSize(1);
 
     using var tokenizerStream = tokenizer.CreateStream();
     using var generator = new Generator(model, generatorParams);
+    generator.AppendTokenSequences(sequences);
     StringBuilder stringBuilder = new();
     while (!generator.IsDone())
    {
        string part;
        try
        {
            await Task.Delay(10).ConfigureAwait(false);
-           generator.ComputeLogits();
            generator.GenerateNextToken();
            part = tokenizerStream.Decode(generator.GetSequence(0)[^1]);
            stringBuilder.Append(part);
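Because `InferStreaming` returns an `IAsyncEnumerable<string>`, callers can stream each decoded token into the UI with `await foreach`. A minimal caller sketch, placed inside an async handler such as the walkthrough's `myButton_Click` (the `promptTextBox` and `responseTextBlock` control names are assumptions, not part of this diff):

```csharp
// Append each token to the output TextBlock as soon as it is generated.
await foreach (string part in InferStreaming(promptTextBox.Text))
{
    responseTextBlock.Text += part;
}
```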
@@ -216,6 +220,18 @@ private async void myButton_Click(object sender, RoutedEventArgs e)
 
 In Visual Studio, in the **Solution Platforms** drop-down, make sure that the target processor is set to x64. The ONNXRuntime Generative AI library does not support x86. Build and run the project. Wait for the **TextBlock** to indicate that the model has been loaded. Type a prompt into the prompt text box and click the submit button. You should see the results gradually populate the text block.
 
+## Alternative GenAI packages
+
+If you need to target a specific execution provider instead of using Windows ML's automatic selection, you can use one of these packages instead of `.WinML`:
+
+| Package | Use case |
+|---------|----------|
+| `Microsoft.ML.OnnxRuntimeGenAI.DirectML` | GPU-only (NVIDIA, AMD, Intel) |
+| `Microsoft.ML.OnnxRuntimeGenAI.QNN` | NPU-only (Qualcomm) |
+| `Microsoft.ML.OnnxRuntimeGenAI` | CPU-only (cross-platform) |
+
+**Do not reference more than one** of these packages in the same project — they ship conflicting `onnxruntime.dll` files. The GenAI API code (Model, Tokenizer, Generator) is the same regardless of which package you use.
+
 ## See also
 
 - [Get started with AI on Windows](../overview.md)

docs/models/get-started-onnx-winui.md

Lines changed: 15 additions & 21 deletions
@@ -8,15 +8,15 @@ no-loc: [ONNX Runtime, ONNX Runtime Generative AI, scikit-learn, DirectML Execut
 
 # Get started with ONNX models in your WinUI app with ONNX Runtime
 
-This article walks you through creating a WinUI app that uses an ONNX model to classify objects in an image and display the confidence of each classification. For more information on using AI and machine learning models in your windows app, see [Get started with AI on Windows](../overview.md).
+This article walks you through creating a WinUI app that uses an ONNX model to classify objects in an image and display the confidence of each classification. For more information on using AI and machine learning models in your windows app, see [Get started with AI on Windows](../overview.md). This walkthrough uses [Windows ML](../new-windows-ml/overview.md) to automatically manage execution providers for hardware-accelerated inference.
 
 When utilizing AI features, we recommend that you review: [Developing Responsible Generative AI Applications and Features on Windows](../rai.md).
 
 ## What is the ONNX runtime
 
 ONNX Runtime is a cross-platform machine-learning model accelerator, with a flexible interface to integrate hardware-specific libraries. ONNX Runtime can be used with models from PyTorch, Tensorflow/Keras, TFLite, scikit-learn, and other frameworks. For more information, see the ONNX Runtime website at [https://onnxruntime.ai/docs/](https://onnxruntime.ai/docs/).
 
-This sample uses the [DirectML Execution Provider](https://onnxruntime.ai/docs/execution-providers/DirectML-ExecutionProvider.html) which abstracts and runs across the different hardware options on Windows devices and supports execution across local accelerators, like the GPU and NPU.
+This sample uses [Windows ML](../new-windows-ml/overview.md) which provides a shared ONNX Runtime and dynamically downloads the best execution providers for the user's hardware. Windows ML abstracts hardware selection — your app automatically benefits from GPU, NPU, or CPU acceleration without bundling specific execution providers.
 
 
 ## Prerequisites
@@ -35,17 +35,16 @@ In **Solution Explorer**, right-click **Dependencies** and select **Manage NuGet
 
 | Package | Description |
 |---------|-------------|
-| Microsoft.ML.OnnxRuntime.DirectML | Provides APIs for running ONNX models on the GPU. |
+| Microsoft.WindowsAppSDK.ML | Provides the Windows ML runtime with the ONNX Runtime and automatic execution provider management. |
 | SixLabors.ImageSharp | Provides image utilities for processing images for model input. |
-| SharpDX.DXGI | Provides APIs for accessing the DirectX device from C#. |
 
 Add the following **using** directives to the top of `MainWindows.xaml.cs` to access the APIs from these libraries.
 
 ```csharp
 // MainWindow.xaml.cs
 using Microsoft.ML.OnnxRuntime;
 using Microsoft.ML.OnnxRuntime.Tensors;
-using SharpDX.DXGI;
+using Microsoft.Windows.AI.MachineLearning;
 using SixLabors.ImageSharp;
 using SixLabors.ImageSharp.Formats;
 using SixLabors.ImageSharp.PixelFormats;
@@ -83,35 +82,30 @@ In the `MainWindow.xaml` file, replace the default **StackPanel** element with t
 
 ## Initialize the model
 
-In the `MainWindow.xaml.cs` file, inside the **MainWindow** class, create a helper method called **InitModel** that will initialize the model. This method uses APIs from the **SharpDX.DXGI** library to select the first available adapter. The selected adapter is set in the [SessionOptions](https://onnxruntime.ai/docs/api/csharp/api/Microsoft.ML.OnnxRuntime.SessionOptions.html) object for the DirectML execution provider in this session. Finally, a new [InferenceSession](https://onnxruntime.ai/docs/api/csharp/api/Microsoft.ML.OnnxRuntime.InferenceSession.html) is initialized, passing in the path to the model file and the session options.
+In the `MainWindow.xaml.cs` file, inside the **MainWindow** class, create a helper method called **InitModel** that will initialize the model. This method uses [Windows ML](../new-windows-ml/overview.md) to automatically download and register the best available execution providers for the user's hardware. The [ExecutionProviderCatalog](../new-windows-ml/get-started.md) handles hardware detection and EP selection — no need to manually choose a GPU adapter. Finally, a new [InferenceSession](https://onnxruntime.ai/docs/api/csharp/api/Microsoft.ML.OnnxRuntime.InferenceSession.html) is initialized, passing in the path to the model file.
 
 ```csharp
 // MainWindow.xaml.cs
 
 private InferenceSession _inferenceSession;
 private string modelDir = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "model");
 
-private void InitModel()
+private async Task InitModelAsync()
 {
     if (_inferenceSession != null)
     {
         return;
     }
 
-    // Select a graphics device
-    var factory1 = new Factory1();
-    int deviceId = 0;
-
-    Adapter1 selectedAdapter = factory1.GetAdapter1(0);
-
-    // Create the inference session
-    var sessionOptions = new SessionOptions
+    // Use Windows ML to download and register the best execution providers
+    var catalog = ExecutionProviderCatalog.GetDefault();
+    if (catalog is not null)
     {
-        LogSeverityLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_INFO
-    };
-    sessionOptions.AppendExecutionProvider_DML(deviceId);
-    _inferenceSession = new InferenceSession($@"{modelDir}\resnet50-v2-7.onnx", sessionOptions);
+        await catalog.EnsureAndRegisterCertifiedAsync();
+    }
 
+    // Create the inference session — uses registered EPs automatically
+    _inferenceSession = new InferenceSession($@"{modelDir}\resnet50-v2-7.onnx");
 }
 ```
 
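One design note on the new code, not part of this commit: `EnsureAndRegisterCertifiedAsync` can take noticeable time on first run (execution providers may be downloaded), and the simple null check does not stop two overlapping calls from racing. A possible hardening, sketched here as a suggestion for the `MainWindow` class:

```csharp
// Hypothetical hardening: cache the initialization Task so that concurrent
// callers share one download/registration pass instead of racing the null check.
private Task _initTask;

private Task EnsureModelAsync() => _initTask ??= InitModelAsync();
```

Callers would then `await EnsureModelAsync();` rather than checking `_inferenceSession` themselves.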
@@ -207,13 +201,13 @@ Next, we set up the inputs by creating an [OrtValue](https://onnxruntime.ai/docs
 };
 ```
 
-Next, if the inference session hasn't been initialized yet, call out **InitModel** helper method. Then call the **Run** method to run the model and retrieve the results.
+Next, if the inference session hasn't been initialized yet, call the **InitModelAsync** helper method. Then call the **Run** method to run the model and retrieve the results.
 
 ```csharp
 // Run inference
 if (_inferenceSession == null)
 {
-    InitModel();
+    await InitModelAsync();
 }
 using var runOptions = new RunOptions();
 using IDisposableReadOnlyCollection<OrtValue> results = _inferenceSession.Run(runOptions, inputs, _inferenceSession.OutputNames);
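For completeness, a sketch of turning `results` into the per-class confidences the article displays. This continues from the `Run` call above and assumes ONNX Runtime 1.15+ (where `OrtValue.GetTensorDataAsSpan<T>` is available) and that `results[0]` holds the ResNet50 score tensor:

```csharp
using System.Linq;

// ResNet50-v2 emits one logit per ImageNet class in its first output.
float[] logits = results[0].GetTensorDataAsSpan<float>().ToArray();

// Numerically stable softmax converts logits into confidence values.
float max = logits.Max();
float[] exp = logits.Select(l => MathF.Exp(l - max)).ToArray();
float sum = exp.Sum();
float[] confidences = exp.Select(e => e / sum).ToArray();
```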
