title Run LLMs and other generative models using ONNX Runtime and Windows ML (Preview)
description Learn how to use Windows Machine Learning (ML) to run local GenAI ONNX models (LLMs, speech-to-text) in your Windows apps.
ms.date 03/17/2026
ms.topic how-to

Run LLMs and other generative models using ONNX Runtime and Windows ML (Preview)

To run generative AI (GenAI) models, such as large language models (LLMs) or generative speech-to-text models, with Windows ML, use the ONNX Runtime GenAI Windows ML library.

Note

ONNX Runtime GenAI libraries are in 0.x preview releases and are subject to change.

The GenAI API gives you an easy, flexible, and performant way to run generative models on device. It implements the generative AI loop for ONNX models, including pre- and post-processing, inference with ONNX Runtime, logits processing, search and sampling, KV cache management, and grammar specification for tool calling.
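To make the stages of that loop concrete, the following is a minimal conceptual sketch in plain Python. It does not use the ONNX Runtime GenAI API: `toy_model`, `process_logits`, `generate`, `decode`, and the four-token vocabulary are all hypothetical stand-ins, chosen only to show where inference, logits processing, sampling, and cache reuse sit relative to one another.

```python
import math
import random

# Hypothetical four-token vocabulary; a real model ships its own tokenizer.
VOCAB = ["<eos>", "hello", "world", "!"]
EOS_ID = 0


def toy_model(new_tokens, kv_cache):
    """Stand-in for one inference call (assumption: not a real ONNX model).

    Only the tokens not yet seen are passed in; the cache keeps earlier state,
    which is the idea behind KV cache management.
    """
    kv_cache.extend(new_tokens)
    favored = (kv_cache[-1] + 1) % len(VOCAB)
    return [5.0 if i == favored else 0.0 for i in range(len(VOCAB))]


def process_logits(logits, temperature, top_k):
    """Logits processing: keep the top_k tokens, apply temperature, softmax."""
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = {i: logits[i] / temperature for i in kept}
    peak = max(scaled.values())
    weights = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def generate(prompt_ids, max_length=8, temperature=1.0, top_k=1, seed=0):
    """Generative loop: infer, process logits, sample, repeat until done."""
    rng = random.Random(seed)
    kv_cache = []
    output = list(prompt_ids)
    pending = list(prompt_ids)  # tokens the model has not processed yet
    while len(output) < max_length:
        logits = toy_model(pending, kv_cache)
        probs = process_logits(logits, temperature, top_k)
        ids = list(probs)
        token = rng.choices(ids, weights=[probs[i] for i in ids])[0]
        if token == EOS_ID:
            break
        output.append(token)
        pending = [token]  # next step feeds only the newly sampled token
    return output


def decode(ids):
    """Post-processing: map token IDs back to text."""
    return " ".join(VOCAB[i] for i in ids)


if __name__ == "__main__":
    print(decode(generate([1])))
```

With `top_k=1` the sampling step reduces to greedy search, so the run is deterministic. Raising `top_k` and `temperature` makes the same loop sample from a wider distribution.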

Supported models

The GenAI API supports various LLMs and generative speech-to-text models.

See the onnxruntime-genai GitHub page for more details.

Installation

Install the ONNX Runtime GenAI Windows ML NuGet package in your project.

Usage

See the ONNX Runtime GenAI docs for more details.

See also