| title | Run LLMs and other generative models using ONNX Runtime and Windows ML (Preview) |
|---|---|
| description | Learn how to use Windows Machine Learning (ML) to run local GenAI ONNX models (LLMs, speech-to-text) in your Windows apps. |
| ms.date | 03/17/2026 |
| ms.topic | how-to |
To run generative AI (GenAI) models, such as large language models (LLMs) or generative speech-to-text models, with Windows ML, use the ONNX Runtime GenAI Windows ML library.
Note
The ONNX Runtime GenAI libraries are currently in preview (0.x releases) and are subject to change.
The GenAI API gives you an easy, flexible, and performant way to run generative models on device. It implements the generative AI loop for ONNX models, including pre- and post-processing, inference with ONNX Runtime, logits processing, search and sampling, KV cache management, and grammar specification for tool calling.
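To make the "generative AI loop" concrete, here is a minimal, self-contained sketch of the loop the GenAI API manages for you. Everything in it is a toy stand-in: `toy_model` replaces a real ONNX model run through ONNX Runtime, and the function and parameter names are hypothetical, not part of the GenAI API.

```python
def toy_model(tokens, kv_cache):
    """Stand-in for one ONNX Runtime inference step.

    Returns fake logits over a 4-token vocabulary and an updated
    KV cache (here, just the token history).
    """
    kv_cache = kv_cache + tokens          # KV cache management (toy version)
    last = kv_cache[-1]
    logits = [0.0, 0.0, 0.0, 0.0]
    logits[(last + 1) % 4] = 1.0          # this toy model always prefers the "next" token
    return logits, kv_cache

def greedy_sample(logits):
    """Logits processing + sampling: pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def generate(prompt_tokens, max_new_tokens, eos_token=3):
    """The generative loop: inference, sampling, and stopping criteria."""
    tokens = list(prompt_tokens)          # pre-processing (tokenizing) would happen before this
    kv_cache = []
    step_input = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits, kv_cache = toy_model(step_input, kv_cache)
        next_token = greedy_sample(logits)
        tokens.append(next_token)
        if next_token == eos_token:       # stop on end-of-sequence
            break
        step_input = [next_token]         # feed only the new token; the KV cache holds the rest
    return tokens                         # post-processing (detokenizing) would follow

print(generate([0], max_new_tokens=5))    # → [0, 1, 2, 3]
```

The point of the sketch is that, with the GenAI API, you don't write this loop yourself; the library runs it against your ONNX model and exposes configuration for search, sampling, and stopping behavior.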
The GenAI API supports various LLMs and generative speech-to-text models.
See the onnxruntime-genai GitHub page for more details.
Install the ONNX Runtime GenAI Windows ML NuGet package in your project.
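For example, from the command line you can add the package with the .NET CLI. The package ID below is an assumption for illustration; confirm the current ONNX Runtime GenAI Windows ML package ID on NuGet.

```shell
# Add the ONNX Runtime GenAI Windows ML package (preview, hence --prerelease).
# Package ID is assumed; verify it on nuget.org before using.
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.WinML --prerelease
```

Because the libraries are in 0.x preview releases, the `--prerelease` flag is required for the package to resolve.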
See the ONNX Runtime GenAI docs for more details.