Commit f462a48

add new article
1 parent 0065ad5 commit f462a48

2 files changed

Lines changed: 214 additions & 0 deletions

articles/container-apps/TOC.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -185,6 +185,8 @@ items:
       href: gpu-image-generation.md
   - name: Deploy an NVIDIA Llama3 NIM
     href: serverless-gpu-nim.md
+  - name: Deploy OpenAI GPT with OSS Ollama
+    href: deploy-openai-gpt-oss-ollama.md
   - name: Dynamic sessions
     items:
     - name: Overview
```
articles/container-apps/deploy-openai-gpt-oss-ollama.md

Lines changed: 212 additions & 0 deletions
---
title: Deploy OpenAI gpt-oss models with Ollama on Azure Container Apps serverless GPUs
description: "Learn how to deploy and run OpenAI's open-source gpt-oss-120b and gpt-oss-20b language models using Ollama on Azure Container Apps with serverless GPU support."
#customer intent: As a developer, I want to deploy OpenAI's gpt-oss models on Azure Container Apps so that I can leverage serverless GPUs for scalable AI workloads.
author: craigshoemaker
ms.author: cshoe
ms.reviewer: cshoe
ms.service: azure-container-apps
ms.collection: ce-skilling-ai-copilot
ms.topic: tutorial
ms.date: 12/11/2025
---

# Deploy OpenAI gpt-oss models with Ollama on Azure Container Apps serverless GPUs

OpenAI recently announced the release of [gpt-oss-120b and gpt-oss-20b](https://openai.com/index/introducing-gpt-oss/), two new state-of-the-art open-weight language models designed to run on lighter-weight GPU resources. These models make powerful language capabilities accessible to developers who want to self-host language models in their own environments.

This article shows you how to deploy these models by using [Azure Container Apps serverless GPUs](./gpu-serverless-overview.md) with Ollama, a cost-efficient and scalable platform with minimal infrastructure overhead.

## Learning objectives

By the end of this article, you'll be able to:

- Use Azure Container Apps serverless GPUs for AI workloads
- Choose the right gpt-oss model for your needs
- Deploy an Ollama container on Azure Container Apps with GPU support
- Configure and interact with deployed models
- Call model APIs from external applications

## Prerequisites

- An Azure subscription. If you don't have one, [create a free account](https://azure.microsoft.com/pricing/purchase-options/azure-account?cid=msft_learn).
- Quota for serverless GPUs in Azure Container Apps. If you don't have quota, [request a GPU quota](gpu-serverless-overview.md#request-serverless-gpu-quota).
- A basic understanding of containers and Azure services.
- Familiarity with a command-line interface.

## What are Azure Container Apps serverless GPUs?

Azure Container Apps is a fully managed, serverless container platform that simplifies the deployment and operation of containerized applications. With serverless GPU support, you can bring your own containers and deploy them to GPU-backed environments that automatically scale based on demand.

### Key benefits

- **Autoscaling**: Scale to zero when idle, and scale out based on demand.
- **Pay-per-second billing**: Pay only for the compute you use.
- **Ease of use**: Accelerate developer velocity and bring any container to run on GPUs in the cloud.
- **No infrastructure management**: Focus on your model and application.
- **Enterprise-grade features**: Built-in support for virtual networks, managed identity, private endpoints, and full data governance.

## Choose the right gpt-oss model

The [gpt-oss models](https://openai.com/index/introducing-gpt-oss/) deliver strong performance across common language benchmarks and are optimized for different use cases:

| Model | Performance | Use cases | Recommended GPU |
|-------|-------------|-----------|-----------------|
| gpt-oss-120b | Comparable to OpenAI's gpt-4o-mini | High-performance reasoning workloads | A100 |
| gpt-oss-20b | Comparable to OpenAI's o3-mini | Lightweight, cost-effective small language model (SLM) apps | T4 or A100 |

### Regional availability

Choose your deployment region based on the model you want to use and GPU availability:

| Region | A100 | T4 |
| --- | --- | --- |
| West US | ✔ | |
| West US 3 | ✔ | ✔ |
| Sweden Central | ✔ | ✔ |
| Australia East | ✔ | ✔ |
| West Europe | | ✔ |

> [!NOTE]
> To run the 120-billion-parameter model, select one of the A100 regions. To run the 20-billion-parameter model, select either a T4 or A100 region.

73+
## Deploy your container app
74+
75+
### Step 1: Create the container app resource
76+
77+
1. Go to the [Azure portal](https://portal.azure.com/).
78+
79+
1. Select **Create a resource**.
80+
81+
1. Search for **Container Apps**.
82+
83+
1. Select **Container App** and then select **Create**.
84+
85+
1. On the **Basics** tab, configure the following settings:
86+
- Keep most default values.
87+
- For **Region**, select a region that supports your chosen model based on the regional availability table.
88+
89+
### Step 2: Configure container settings
90+
91+
1. Select the **Container** tab.
92+
93+
1. Configure the Ollama container settings:
94+
95+
| Field | Value |
96+
| --- | --- |
97+
| **Image source** | Docker Hub or other registries |
98+
| **Image type** | Public |
99+
| **Registry login server** | docker.io |
100+
| **Image and tag** | ollama/ollama:latest |
101+
| **Workload profile** | Consumption |
102+
| **GPU** | ✅ (check the box) |
103+
| **GPU type** | A100 for gpt-oss:120b<br>T4 or A100 for gpt-oss:20b |
104+
105+
> [!IMPORTANT]
106+
> By default, pay-as-you-go and EA customers have quota. If you don't have quota for serverless GPUs in Azure Container Apps, [request a GPU quota](gpu-serverless-overview.md#request-serverless-gpu-quota).
107+
108+
### Step 3: Configure ingress
109+
110+
1. Select the **Ingress** tab.
111+
112+
1. Configure the following settings:
113+
114+
| Field | Value |
115+
| --- | --- |
116+
| **Ingress** | Enabled |
117+
| **Ingress traffic** | Accepting traffic from anywhere |
118+
| **Target port** | 11434 |
119+
120+
1. Select **Review + Create** at the bottom of the page.
121+
122+
1. Select **Create** to deploy your container app.
123+
124+
## Deploy and use your gpt-oss model

### Step 1: Access your deployed application

1. When your deployment is complete, select **Go to resource**.

1. Note the **Application URL** for your container app. You use this URL later for API calls.

### Step 2: Pull and run the model

> [!TIP]
> Console commands don't count as traffic that keeps the container app scaled out, so your application might scale in after a period of inactivity. To keep the container app active for longer, go to **Application** > **Scaling** and set the minimum replica count to 1, or increase the cooldown period. Remember to reset the minimum replica count to 0 when the app isn't in use to avoid ongoing billing.

1. In the Azure portal, select the **Monitoring** dropdown, and then select **Console**.

1. Under **Choose start up command**, select **Connect**.

1. Pull the gpt-oss model by running the following command. Use `120b` or `20b` depending on which model you want to run:

   ```bash
   ollama pull gpt-oss:120b
   ```

1. Run the gpt-oss model:

   ```bash
   ollama run gpt-oss:120b
   ```

1. Test the model with a sample prompt:

   ```
   Can you explain LLMs and recent developments in AI over the last few years?
   ```

You successfully deployed and ran an OpenAI gpt-oss model on Azure Container Apps serverless GPUs.

## (Optional) Call the API from external applications

You can interact with your deployed model by using REST API calls from your local machine or other applications.

### Set up the environment

1. Open your local command line or terminal.

1. Copy your container app URL from the Azure portal.

1. Set the `OLLAMA_URL` environment variable:

   ```bash
   export OLLAMA_URL="{Your application URL}"
   ```
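Before prompting the model from an external application, you can confirm that the endpoint is reachable and the model is pulled by querying Ollama's `/api/tags` endpoint, which lists the models available on the server. The following Python sketch uses only the standard library; the helper names (`list_model_names`, `fetch_models`) are our own, not part of Ollama:

```python
import json
import urllib.request


def list_model_names(tags_json: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]


def fetch_models(base_url: str) -> list[str]:
    """Call GET {base_url}/api/tags and return the available model names."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/api/tags") as resp:
        return list_model_names(resp.read().decode("utf-8"))


# Example, assuming the container app is running and OLLAMA_URL is set:
#   import os
#   print(fetch_models(os.environ["OLLAMA_URL"]))
```

If the gpt-oss model doesn't appear in the list, rerun `ollama pull` from the container app console.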
### Make API calls

Use the following curl command to prompt the gpt-oss model:

```bash
curl -X POST "$OLLAMA_URL/api/generate" -H "Content-Type: application/json" -d '{
  "model": "gpt-oss:120b",
  "prompt": "Can you explain LLMs and recent developments in AI over the last few years?",
  "stream": false
}'
```

Because this request sets `"stream": false`, the call returns the fully generated response in a single JSON object.
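The same request can be issued from code instead of curl. The following Python sketch uses only the standard library; the helper names (`build_payload`, `collect_stream`, `generate`) are our own, not an official client. When `"stream"` is true, Ollama returns newline-delimited JSON chunks whose `response` fields concatenate into the full answer, which `collect_stream` demonstrates:

```python
import json
import urllib.request
from typing import Iterable


def build_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate the 'response' fields of newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(base_url: str, model: str, prompt: str) -> str:
    """Non-streaming call: returns the fully generated response text."""
    body = json.dumps(build_payload(model, prompt, stream=False)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example, assuming the container app is running and OLLAMA_URL is set:
#   import os
#   print(generate(os.environ["OLLAMA_URL"], "gpt-oss:120b", "Explain LLMs briefly."))
```

Streaming is useful for showing partial output as the model generates; the non-streaming call is simpler when you only need the final text.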

## Clean up resources

To avoid incurring charges on your Azure subscription, clean up the resources you created in this article.

1. In the Azure portal, go to your resource group.
1. Select **Delete resource group**.
1. Enter your resource group name to confirm deletion.
1. Select **Delete**.

## Next steps

Now that you successfully deployed a gpt-oss model, consider these next steps:

- **Add persistent storage**: Azure Container Apps is ephemeral by default and doesn't mount storage automatically. To persist your data and conversations, [add a volume mount to your container app](storage-mounts.md).
- **Explore other models**: Follow these same steps to run any model available in [Ollama's library](https://ollama.com/search).
- **Learn more about serverless GPUs**: Review the [Azure Container Apps serverless GPU documentation](gpu-serverless-overview.md) for advanced configuration options.

## Related content

- [Azure Container Apps serverless GPU overview](gpu-serverless-overview.md)
- [Storage mounts in Azure Container Apps](storage-mounts.md)
- [Scale rules in Azure Container Apps](scale-app.md)
