
Commit b834953

nmetulev and Copilot committed
Fix outdated GenAI APIs and update walkthroughs to use Windows ML
- Update InferStreaming in get-started-models-genai.md to use current OnnxRuntimeGenAI 0.6+ APIs:
  - SetInputSequences -> generator.AppendTokenSequences (moved to Generator)
  - Remove ComputeLogits (handled by GenerateNextToken internally)

  These APIs were removed in the continuous decoding change: microsoft/onnxruntime-genai#1142
- Switch GenAI walkthrough from .DirectML to .WinML package for automatic hardware EP selection. Add alternative packages section.
- Switch ONNX WinUI walkthrough from standalone ORT + SharpDX.DXGI to Windows ML (Microsoft.WindowsAppSDK.ML):
  - Replace manual DirectML adapter selection with EP Catalog
  - Remove SharpDX.DXGI dependency
  - Use async InitModelAsync with EnsureAndRegisterCertifiedAsync

Co-authored-by: Copilot <[email protected]>
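The token-feeding change called out above, as a compact before/after sketch (illustrative only; the model folder path is hypothetical and the surrounding setup mirrors the walkthrough):

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

// Assumed setup, as in the walkthrough (model folder path is hypothetical).
using var model = new Model(@"C:\models\phi-3");
using var tokenizer = new Tokenizer(model);
using var generatorParams = new GeneratorParams(model);
var sequences = tokenizer.Encode("<|user|>Hello<|end|><|assistant|>");

using var generator = new Generator(model, generatorParams);

// Before (GenAI <= 0.5): generatorParams.SetInputSequences(sequences);
// After (GenAI 0.6+): token sequences are appended on the Generator itself.
generator.AppendTokenSequences(sequences);

while (!generator.IsDone())
{
    // Before (GenAI <= 0.5): generator.ComputeLogits(); was required each pass.
    // After (GenAI 0.6+): GenerateNextToken computes logits internally.
    generator.GenerateNextToken();
}
```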
1 parent f89f68c · commit b834953

2 files changed

Lines changed: 35 additions & 25 deletions


docs/models/get-started-models-genai.md

Lines changed: 20 additions & 4 deletions
@@ -27,7 +27,12 @@ In Visual Studio, create a new project. In the **Create a new project** dialog,
 
 ## Add references to the ONNX Runtime Generative AI Nuget package
 
-In **Solution Explorer**, right-click **Dependencies** and select **Manage NuGet packages...**. In the NuGet package manager, select the **Browse** tab. Search for "Microsoft.ML.OnnxRuntimeGenAI.DirectML", select the latest stable version in the **Version** drop-down and then click **Install**.
+In **Solution Explorer**, right-click **Dependencies** and select **Manage NuGet packages...**. In the NuGet package manager, select the **Browse** tab. Search for "Microsoft.ML.OnnxRuntimeGenAI.WinML", select the latest stable version in the **Version** drop-down and then click **Install**.
+
+This package uses [Windows ML](../new-windows-ml/overview.md) to automatically select the best available hardware execution provider (NPU → GPU → CPU). No need to choose between DirectML, QNN, or CPU-specific packages — Windows ML handles it.
+
+> [!NOTE]
+> The `.WinML` package requires a Windows-specific target framework (e.g., `net8.0-windows10.0.19041.0` or later). If you need a cross-platform package or want to target a specific execution provider, see the [alternative packages](#alternative-genai-packages) section at the end of this article.
 
 
 ## Add a model and vocabulary file to your project
@@ -133,7 +138,7 @@ private async void MainWindow_Activated(object sender, WindowActivatedEventArgs
 
 Create a helper method that submits the prompt to the model and then asynchronously returns the results to the caller with an [IAsyncEnumerable](/dotnet/api/system.collections.generic.iasyncenumerable-1).
 
-In this method, the [Generator](https://onnxruntime.ai/docs/genai/api/csharp.html#generator-class) class is used in a loop, calling **GenerateNextToken** in each pass to retrieve what the model predicts the next few characters, called a token, should be based on the input prompt. The loop runs until the generator **IsDone** method returns true or until any of the tokens "<|end|>", "<|system|>", or "<|user|>" are received, which signals that we can stop generating tokens.
+In this method, the [Generator](https://onnxruntime.ai/docs/genai/api/csharp.html#generator-class) class is used in a loop, calling **GenerateNextToken** in each pass to retrieve what the model predicts the next few characters, called a token, should be based on the input prompt. Input token sequences are provided to the generator via **AppendTokenSequences**. The loop runs until the generator **IsDone** method returns true or until any of the tokens "<|end|>", "<|system|>", or "<|user|>" are received, which signals that we can stop generating tokens.
 
 ```csharp
 public async IAsyncEnumerable<string> InferStreaming(string prompt)
@@ -148,19 +153,18 @@ public async IAsyncEnumerable<string> InferStreaming(string prompt)
     var sequences = tokenizer.Encode(prompt);
 
     generatorParams.SetSearchOption("max_length", 2048);
-    generatorParams.SetInputSequences(sequences);
     generatorParams.TryGraphCaptureWithMaxBatchSize(1);
 
     using var tokenizerStream = tokenizer.CreateStream();
     using var generator = new Generator(model, generatorParams);
+    generator.AppendTokenSequences(sequences);
     StringBuilder stringBuilder = new();
     while (!generator.IsDone())
    {
        string part;
        try
        {
            await Task.Delay(10).ConfigureAwait(false);
-           generator.ComputeLogits();
            generator.GenerateNextToken();
            part = tokenizerStream.Decode(generator.GetSequence(0)[^1]);
            stringBuilder.Append(part);
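Because `InferStreaming` returns an `IAsyncEnumerable<string>`, callers can stream each decoded token into the UI with `await foreach`. A minimal caller sketch, placed inside an async handler such as the walkthrough's `myButton_Click` (the `promptTextBox` and `responseTextBlock` control names are assumptions, not part of this diff):

```csharp
// Append each token to the output TextBlock as soon as it is generated.
await foreach (string part in InferStreaming(promptTextBox.Text))
{
    responseTextBlock.Text += part;
}
```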
@@ -216,6 +220,18 @@ private async void myButton_Click(object sender, RoutedEventArgs e)
 
 In Visual Studio, in the **Solution Platforms** drop-down, make sure that the target processor is set to x64. The ONNXRuntime Generative AI library does not support x86. Build and run the project. Wait for the **TextBlock** to indicate that the model has been loaded. Type a prompt into the prompt text box and click the submit button. You should see the results gradually populate the text block.
 
+## Alternative GenAI packages
+
+If you need to target a specific execution provider instead of using Windows ML's automatic selection, you can use one of these packages instead of `.WinML`:
+
+| Package | Use case |
+|---------|----------|
+| `Microsoft.ML.OnnxRuntimeGenAI.DirectML` | GPU-only (NVIDIA, AMD, Intel) |
+| `Microsoft.ML.OnnxRuntimeGenAI.QNN` | NPU-only (Qualcomm) |
+| `Microsoft.ML.OnnxRuntimeGenAI` | CPU-only (cross-platform) |
+
+**Do not reference more than one** of these packages in the same project — they ship conflicting `onnxruntime.dll` files. The GenAI API code (Model, Tokenizer, Generator) is the same regardless of which package you use.
+
 ## See also
 
 - [Get started with AI on Windows](../overview.md)

docs/models/get-started-onnx-winui.md

Lines changed: 15 additions & 21 deletions
@@ -8,15 +8,15 @@ no-loc: [ONNX Runtime, ONNX Runtime Generative AI, scikit-learn, DirectML Execut
 
 # Get started with ONNX models in your WinUI app with ONNX Runtime
 
-This article walks you through creating a WinUI app that uses an ONNX model to classify objects in an image and display the confidence of each classification. For more information on using AI and machine learning models in your windows app, see [Get started with AI on Windows](../overview.md).
+This article walks you through creating a WinUI app that uses an ONNX model to classify objects in an image and display the confidence of each classification. For more information on using AI and machine learning models in your windows app, see [Get started with AI on Windows](../overview.md). This walkthrough uses [Windows ML](../new-windows-ml/overview.md) to automatically manage execution providers for hardware-accelerated inference.
 
 When utilizing AI features, we recommend that you review: [Developing Responsible Generative AI Applications and Features on Windows](../rai.md).
 
 ## What is the ONNX runtime
 
 ONNX Runtime is a cross-platform machine-learning model accelerator, with a flexible interface to integrate hardware-specific libraries. ONNX Runtime can be used with models from PyTorch, Tensorflow/Keras, TFLite, scikit-learn, and other frameworks. For more information, see the ONNX Runtime website at [https://onnxruntime.ai/docs/](https://onnxruntime.ai/docs/).
 
-This sample uses the [DirectML Execution Provider](https://onnxruntime.ai/docs/execution-providers/DirectML-ExecutionProvider.html) which abstracts and runs across the different hardware options on Windows devices and supports execution across local accelerators, like the GPU and NPU.
+This sample uses [Windows ML](../new-windows-ml/overview.md) which provides a shared ONNX Runtime and dynamically downloads the best execution providers for the user's hardware. Windows ML abstracts hardware selection — your app automatically benefits from GPU, NPU, or CPU acceleration without bundling specific execution providers.
 
 
 ## Prerequisites
@@ -35,17 +35,16 @@ In **Solution Explorer**, right-click **Dependencies** and select **Manage NuGet
 
 | Package | Description |
 |---------|-------------|
-| Microsoft.ML.OnnxRuntime.DirectML | Provides APIs for running ONNX models on the GPU. |
+| Microsoft.WindowsAppSDK.ML | Provides the Windows ML runtime with the ONNX Runtime and automatic execution provider management. |
 | SixLabors.ImageSharp | Provides image utilities for processing images for model input. |
-| SharpDX.DXGI | Provides APIs for accessing the DirectX device from C#. |
 
 Add the following **using** directives to the top of `MainWindows.xaml.cs` to access the APIs from these libraries.
 
 ```csharp
 // MainWindow.xaml.cs
 using Microsoft.ML.OnnxRuntime;
 using Microsoft.ML.OnnxRuntime.Tensors;
-using SharpDX.DXGI;
+using Microsoft.Windows.AI.MachineLearning;
 using SixLabors.ImageSharp;
 using SixLabors.ImageSharp.Formats;
 using SixLabors.ImageSharp.PixelFormats;
@@ -83,35 +82,30 @@ In the `MainWindow.xaml` file, replace the default **StackPanel** element with t
 
 ## Initialize the model
 
-In the `MainWindow.xaml.cs` file, inside the **MainWindow** class, create a helper method called **InitModel** that will initialize the model. This method uses APIs from the **SharpDX.DXGI** library to select the first available adapter. The selected adapter is set in the [SessionOptions](https://onnxruntime.ai/docs/api/csharp/api/Microsoft.ML.OnnxRuntime.SessionOptions.html) object for the DirectML execution provider in this session. Finally, a new [InferenceSession](https://onnxruntime.ai/docs/api/csharp/api/Microsoft.ML.OnnxRuntime.InferenceSession.html) is initialized, passing in the path to the model file and the session options.
+In the `MainWindow.xaml.cs` file, inside the **MainWindow** class, create a helper method called **InitModel** that will initialize the model. This method uses [Windows ML](../new-windows-ml/overview.md) to automatically download and register the best available execution providers for the user's hardware. The [ExecutionProviderCatalog](../new-windows-ml/get-started.md) handles hardware detection and EP selection — no need to manually choose a GPU adapter. Finally, a new [InferenceSession](https://onnxruntime.ai/docs/api/csharp/api/Microsoft.ML.OnnxRuntime.InferenceSession.html) is initialized, passing in the path to the model file.
 
 ```csharp
 // MainWindow.xaml.cs
 
 private InferenceSession _inferenceSession;
 private string modelDir = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "model");
 
-private void InitModel()
+private async Task InitModelAsync()
 {
     if (_inferenceSession != null)
     {
         return;
     }
 
-    // Select a graphics device
-    var factory1 = new Factory1();
-    int deviceId = 0;
-
-    Adapter1 selectedAdapter = factory1.GetAdapter1(0);
-
-    // Create the inference session
-    var sessionOptions = new SessionOptions
+    // Use Windows ML to download and register the best execution providers
+    var catalog = ExecutionProviderCatalog.GetDefault();
+    if (catalog is not null)
     {
-        LogSeverityLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_INFO
-    };
-    sessionOptions.AppendExecutionProvider_DML(deviceId);
-    _inferenceSession = new InferenceSession($@"{modelDir}\resnet50-v2-7.onnx", sessionOptions);
+        await catalog.EnsureAndRegisterCertifiedAsync();
+    }
 
+    // Create the inference session — uses registered EPs automatically
+    _inferenceSession = new InferenceSession($@"{modelDir}\resnet50-v2-7.onnx");
 }
 ```
 
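One design note on the new code, not part of this commit: `EnsureAndRegisterCertifiedAsync` can take noticeable time on first run (execution providers may be downloaded), and the simple null check does not stop two overlapping calls from racing. A possible hardening, sketched here as a suggestion for the `MainWindow` class:

```csharp
// Hypothetical hardening: cache the initialization Task so that concurrent
// callers share one download/registration pass instead of racing the null check.
private Task _initTask;

private Task EnsureModelAsync() => _initTask ??= InitModelAsync();
```

Callers would then `await EnsureModelAsync();` rather than checking `_inferenceSession` themselves.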
@@ -207,13 +201,13 @@ Next, we set up the inputs by creating an [OrtValue](https://onnxruntime.ai/docs
 };
 ```
 
-Next, if the inference session hasn't been initialized yet, call out **InitModel** helper method. Then call the **Run** method to run the model and retrieve the results.
+Next, if the inference session hasn't been initialized yet, call the **InitModelAsync** helper method. Then call the **Run** method to run the model and retrieve the results.
 
 ```csharp
 // Run inference
 if (_inferenceSession == null)
 {
-    InitModel();
+    await InitModelAsync();
 }
 using var runOptions = new RunOptions();
 using IDisposableReadOnlyCollection<OrtValue> results = _inferenceSession.Run(runOptions, inputs, _inferenceSession.OutputNames);
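For completeness, a sketch of turning `results` into the per-class confidences the article displays. This continues from the `Run` call above and assumes ONNX Runtime 1.15+ (where `OrtValue.GetTensorDataAsSpan<T>` is available) and that `results[0]` holds the ResNet50 score tensor:

```csharp
using System.Linq;

// ResNet50-v2 emits one logit per ImageNet class in its first output.
float[] logits = results[0].GetTensorDataAsSpan<float>().ToArray();

// Numerically stable softmax converts logits into confidence values.
float max = logits.Max();
float[] exp = logits.Select(l => MathF.Exp(l - max)).ToArray();
float sum = exp.Sum();
float[] confidences = exp.Select(e => e / sum).ToArray();
```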
