Fix outdated GenAI APIs and add Windows ML guidance

nmetulev · Copilot · nmetulev · commit a3347adf473f · 2026-04-09T22:18:57.000-07:00
- Update InferStreaming in get-started-models-genai.md to use current OnnxRuntimeGenAI 0.6+ APIs:
  - SetInputSequences -> generator.AppendTokenSequences (moved to Generator)
  - Remove ComputeLogits (handled by GenerateNextToken internally)

  These APIs were removed in the continuous decoding change: microsoft/onnxruntime-genai#1142
- Add tip recommending .WinML package for automatic EP selection in the GenAI walkthrough
- Add tip in ONNX WinUI walkthrough pointing to Windows ML as the recommended path for new apps (shared runtime + auto EP management)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
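
For reviewers, the shape of the migrated generation loop can be sketched roughly as follows. This is a simplified, synchronous illustration and not the exact code in the article: the class name `GenAiMigrationSketch`, the method `Generate`, and the `modelDir` parameter are hypothetical, and the stop-token checks and error handling from `InferStreaming` are omitted. It assumes the Microsoft.ML.OnnxRuntimeGenAI 0.6+ C# package and a local folder containing a compatible model.

```csharp
using System.Collections.Generic;
using Microsoft.ML.OnnxRuntimeGenAI;

// Hypothetical helper used only to illustrate the API change in this commit.
public static class GenAiMigrationSketch
{
    // modelDir is an assumed path to a folder holding the ONNX model and its config files.
    public static IEnumerable<string> Generate(string modelDir, string prompt)
    {
        using var model = new Model(modelDir);
        using var tokenizer = new Tokenizer(model);
        using var generatorParams = new GeneratorParams(model);
        generatorParams.SetSearchOption("max_length", 2048);

        using var sequences = tokenizer.Encode(prompt);
        using var tokenizerStream = tokenizer.CreateStream();
        using var generator = new Generator(model, generatorParams);

        // 0.6+: prompt tokens are appended to the Generator
        // (previously generatorParams.SetInputSequences(sequences)).
        generator.AppendTokenSequences(sequences);

        while (!generator.IsDone())
        {
            // 0.6+: no separate ComputeLogits() call before this.
            generator.GenerateNextToken();

            // Decode only the most recently generated token.
            yield return tokenizerStream.Decode(generator.GetSequence(0)[^1]);
        }
    }
}
```

A caller would iterate the result with `foreach` and append each piece to the UI; the article's actual method remains an `IAsyncEnumerable<string>` so the UI thread is not blocked.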
diff --git a/docs/models/get-started-models-genai.md b/docs/models/get-started-models-genai.md
@@ -29,6 +29,9 @@ In Visual Studio, create a new project. In the **Create a new project** dialog,
 
 In **Solution Explorer**, right-click **Dependencies** and select **Manage NuGet packages...**. In the NuGet package manager, select the **Browse** tab. Search for "Microsoft.ML.OnnxRuntimeGenAI.DirectML", select the latest stable version in the **Version** drop-down and then click **Install**.
 
+> [!TIP]
+> For new Windows apps, consider using the [Microsoft.ML.OnnxRuntimeGenAI.WinML](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntimeGenAI.WinML/) package instead. It uses Windows ML to automatically select the best hardware (NPU → GPU → CPU) without you needing to choose a specific execution provider package. See [Run GenAI models with Windows ML](../new-windows-ml/run-genai-onnx-models.md) for more info.
+
 
 ## Add a model and vocabulary file to your project
 
@@ -133,7 +136,7 @@ private async void MainWindow_Activated(object sender, WindowActivatedEventArgs
 
 Create a helper method that submits the prompt to the model and then asynchronously returns the results to the caller with an [IAsyncEnumerable](/dotnet/api/system.collections.generic.iasyncenumerable-1). 
 
-In this method, the [Generator](https://onnxruntime.ai/docs/genai/api/csharp.html#generator-class) class is used in a loop, calling **GenerateNextToken** in each pass to retrieve what the model predicts the next few characters, called a token, should be based on the input prompt. The loop runs until the generator **IsDone** method returns true or until any of the tokens "<|end|>", "<|system|>", or "<|user|>" are received, which signals that we can stop generating tokens.
+In this method, the [Generator](https://onnxruntime.ai/docs/genai/api/csharp.html#generator-class) class is used in a loop. Before the loop starts, the encoded prompt is passed to the generator with **AppendTokenSequences**. Each pass then calls **GenerateNextToken** to retrieve the next token, a short run of characters that the model predicts should follow the text so far. The loop runs until the generator's **IsDone** method returns true or until any of the tokens "<|end|>", "<|system|>", or "<|user|>" is received, which signals that generation can stop.
 
 ```csharp
 public async IAsyncEnumerable<string> InferStreaming(string prompt)
@@ -148,19 +151,18 @@ public async IAsyncEnumerable<string> InferStreaming(string prompt)
     var sequences = tokenizer.Encode(prompt);
 
     generatorParams.SetSearchOption("max_length", 2048);
-    generatorParams.SetInputSequences(sequences);
     generatorParams.TryGraphCaptureWithMaxBatchSize(1);
 
     using var tokenizerStream = tokenizer.CreateStream();
     using var generator = new Generator(model, generatorParams);
+    generator.AppendTokenSequences(sequences);
     StringBuilder stringBuilder = new();
     while (!generator.IsDone())
     {
         string part;
         try
         {
             await Task.Delay(10).ConfigureAwait(false);
-            generator.ComputeLogits();
             generator.GenerateNextToken();
             part = tokenizerStream.Decode(generator.GetSequence(0)[^1]);
             stringBuilder.Append(part);
diff --git a/docs/models/get-started-onnx-winui.md b/docs/models/get-started-onnx-winui.md
@@ -10,6 +10,9 @@ no-loc: [ONNX Runtime, ONNX Runtime Generative AI, scikit-learn, DirectML Execut
 
 This article walks you through creating a WinUI app that uses an ONNX model to classify objects in an image and display the confidence of each classification. For more information on using AI and machine learning models in your windows app, see [Get started with AI on Windows](../overview.md).
 
+> [!TIP]
+> For new Windows apps, consider using [Windows ML](../new-windows-ml/overview.md) instead of bundling the ONNX Runtime directly. Windows ML provides a shared system-wide ONNX Runtime and automatically downloads the best execution providers for the user's hardware (GPU, NPU, CPU). See [Get started with Windows ML](../new-windows-ml/get-started.md) for setup instructions. This walkthrough demonstrates the standalone ONNX Runtime approach, which gives you full control over the runtime version and execution providers.
+
 When utilizing AI features, we recommend that you review: [Developing Responsible Generative AI Applications and Features on Windows](../rai.md).
 
 ## What is the ONNX runtime