Commit bec4c55 ("updates")
1 parent f462a48 commit bec4c55

1 file changed: articles/container-apps/deploy-openai-gpt-oss-ollama.md
Lines changed: 55 additions & 43 deletions
@@ -8,52 +8,55 @@ ms.reviewer: cshoe
 ms.service: azure-container-apps
 ms.collection: ce-skilling-ai-copilot
 ms.topic: tutorial
-ms.date: 12/11/2025
+ms.date: 12/12/2025
 ---
 
 # Deploy OpenAI gpt-oss models with Ollama on Azure Container Apps serverless GPUs
 
-OpenAI recently announced the release of [gpt-oss-120b and gpt-oss-20b](https://openai.com/index/introducing-gpt-oss/), two new state-of-the-art open-weight language models designed to run on lighter weight GPU resources. These models make powerful language capabilities highly accessible for developers who want to self-host language models within their own environments.
+OpenAI recently announced the release of [gpt-oss-120b and gpt-oss-20b](https://openai.com/index/introducing-gpt-oss/), two new open-weight language models designed to run on lighter weight GPU resources. These models make powerful language capabilities highly accessible for developers who want to self-host language models within their own environments.
 
 This article shows you how to deploy these models by using [Azure Container Apps serverless GPUs](./gpu-serverless-overview.md) with Ollama, providing a cost-efficient and scalable platform with minimal infrastructure overhead.
 
-## Learning objectives
+By the end of this article, you can:
 
-By the end of this article, you'll be able to:
-
-- Use Azure Container Apps serverless GPUs for AI workloads
-- Choose the right gpt-oss model for your needs
-- Deploy an Ollama container on Azure Container Apps with GPU support
-- Configure and interact with deployed models
-- Call model APIs from external applications
+> [!div class="checklist"]
+> * Use Azure Container Apps serverless GPUs for AI workloads
+> * Choose the right gpt-oss model for your needs
+> * Deploy an Ollama container on Azure Container Apps with GPU support
+> * Configure and interact with deployed models
+> * Call model APIs from external applications
 
 ## Prerequisites
 
-- An Azure subscription. If you don't have one, [create a free account](https://azure.microsoft.com/pricing/purchase-options/azure-account?cid=msft_learn).
-- Quota for serverless GPUs in Azure Container Apps. If you don't have quota, [request a GPU quota](gpu-serverless-overview.md#request-serverless-gpu-quota).
-- Basic understanding of containers and Azure services
-- Familiarity with command-line interface
+* **An Azure subscription**: If you don't have one, [create a free account](https://azure.microsoft.com/pricing/purchase-options/azure-account?cid=msft_learn).
+* **Quota for serverless GPUs**: If you don't have quota, [request a GPU quota](gpu-serverless-overview.md#request-serverless-gpu-quota).
 
 ## What are Azure Container Apps serverless GPUs?
 
 Azure Container Apps is a fully managed, serverless container platform that simplifies the deployment and operation of containerized applications. By using serverless GPU support, you can bring your own containers and deploy them to GPU-backed environments that automatically scale based on demand.
 
-### Key benefits
+### Benefits of using serverless GPUs
+
+Azure Container Apps serverless GPUs provide the following advantages for deploying AI models:
+
+* **Autoscaling**: Scale to zero when idle, scale out based on demand.
+
+* **Pay-per-second billing**: Pay only for the compute you use.
 
-- **Autoscaling**: Scale to zero when idle, scale out based on demand.
-- **Pay-per-second billing**: Pay only for the compute you use.
-- **Ease of use**: Accelerate developer velocity and easily bring any container to run on GPUs in the cloud.
-- **No infrastructure management**: Focus on your model and application.
-- **Enterprise-grade features**: Built-in support for virtual networks, managed identity, private endpoints, and full data governance.
+* **Ease of use**: Accelerate developer velocity and easily bring any container to run on GPUs in the cloud.
+
+* **No infrastructure management**: Focus on your model and application.
+
+* **Enterprise-grade features**: Built-in support for virtual networks, managed identity, private endpoints, and full data governance.
 
 ## Choose the right gpt-oss model
 
 The [gpt-oss models](https://openai.com/index/introducing-gpt-oss/) deliver strong performance across common language benchmarks and are optimized for different use cases:
 
 | Model | Performance | Use cases | Recommended GPU |
 |-------|-------------|-----------|-----------------|
-| gpt-oss-120b | Comparable to OpenAI's gpt-4o-mini | High-performance reasoning workloads | A100 |
-| gpt-oss-20b | Comparable to gpt-o3-mini | Lightweight applications, cost-effective small language model (SLM) apps | T4 or A100 |
+| `gpt-oss-120b` | Comparable to OpenAI's gpt-4o-mini | High-performance reasoning workloads | A100 |
+| `gpt-oss-20b` | Comparable to gpt-o3-mini | Lightweight applications, cost-effective small language model (SLM) apps | T4 or A100 |
 
 ### Regional availability
 
@@ -83,8 +86,9 @@ Choose your deployment region based on the model you want to use and GPU availab
 1. Select **Container App** and then select **Create**.
 
 1. On the **Basics** tab, configure the following settings:
-    - Keep most default values.
-    - For **Region**, select a region that supports your chosen model based on the regional availability table.
+
+    * Keep most default values.
+    * For **Region**, select a region that supports your chosen model based on the regional availability table.
 
 ### Step 2: Configure container settings
 
@@ -94,19 +98,21 @@ Choose your deployment region based on the model you want to use and GPU availab
 
 | Field | Value |
 | --- | --- |
-| **Image source** | Docker Hub or other registries |
-| **Image type** | Public |
+| **Image source** | Select **Docker Hub or other registries** |
+| **Image type** | Select **Public** |
 | **Registry login server** | docker.io |
-| **Image and tag** | ollama/ollama:latest |
-| **Workload profile** | Consumption |
-| **GPU** | ✅ (check the box) |
-| **GPU type** | A100 for gpt-oss:120b<br>T4 or A100 for gpt-oss:20b |
+| **Image and tag** | Enter **ollama/ollama:latest** |
+| **Workload profile** | Select **Consumption** |
+| **GPU** | Select the **GPU** box |
+| **GPU type** | Select **A100** for gpt-oss:120b; select **T4** or **A100** for gpt-oss:20b |
 
 > [!IMPORTANT]
-> By default, pay-as-you-go and EA customers have quota. If you don't have quota for serverless GPUs in Azure Container Apps, [request a GPU quota](gpu-serverless-overview.md#request-serverless-gpu-quota).
+> By default, pay-as-you-go and enterprise agreement customers have quota. If you don't have quota for serverless GPUs in Azure Container Apps, [request a GPU quota](gpu-serverless-overview.md#request-serverless-gpu-quota).
 
 ### Step 3: Configure ingress
 
+Configure ingress to allow external access to your Ollama container and enable API calls to your deployed models.
+
 1. Select the **Ingress** tab.
 
 1. Configure the following settings:
@@ -123,6 +129,8 @@ Choose your deployment region based on the model you want to use and GPU availab
 
 ## Deploy and use your gpt-oss model
 
+After creating your container app with GPU support and ingress, you're ready to pull and run the gpt-oss model.
+
 ### Step 1: Access your deployed application
 
 1. Once your deployment is complete, select **Go to resource**.
@@ -132,7 +140,7 @@ Choose your deployment region based on the model you want to use and GPU availab
 ### Step 2: Pull and run the model
 
 > [!TIP]
-> Console commands in the container app aren't counted as traffic for the container app to stay scaled out, so your application might scale back in after a set period. If you want the container app to remain active for a longer duration, go to **Application** > **Scaling** and set the minimum replica count to 1 or increase the cooldown period duration. Remember to reset the minimum replica count to 0 when not in use to avoid ongoing billing.
+> Console commands in the container app aren't counted as traffic for the container app to stay scaled out, so your application might scale back after a set period. If you want the container app to remain active for a longer duration, go to **Application** > **Scaling** and set the minimum replica count to 1 or increase the cooldown period duration. Remember to reset the minimum replica count to 0 when not in use to avoid ongoing billing.
 
 1. In the Azure portal, select the **Monitoring** dropdown, and then select **Console**.
 
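The TIP in this hunk suggests raising the minimum replica count to keep the app warm and resetting it afterward. As a sketch (the app and resource group names are hypothetical, and the commands are prefixed with `echo` so the sketch runs even where the Azure CLI isn't installed):

```shell
# Hypothetical names; substitute your own app and resource group.
APP_NAME="my-ollama-app"
RESOURCE_GROUP="my-resource-group"

# Keep the app warm while you work (remove the leading echo to run for real):
echo az containerapp update --name "$APP_NAME" --resource-group "$RESOURCE_GROUP" --min-replicas 1

# Reset to 0 afterward so the app can scale to zero and stop billing:
echo az containerapp update --name "$APP_NAME" --resource-group "$RESOURCE_GROUP" --min-replicas 0
```

The same effect is available in the portal under **Application** > **Scaling**, as the TIP describes; the CLI form is just convenient for scripting the reset.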
@@ -152,7 +160,7 @@ Choose your deployment region based on the model you want to use and GPU availab
 
 1. Test the model with a sample prompt:
 
-    ```
+    ```text
    Can you explain LLMs and recent developments in AI the last few years?
    ```
 
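As a sketch of the console workflow this step tests (the `gpt-oss:20b` tag is taken from the article's model table; the `run` helper and `DRY_RUN` flag are illustrative additions so the commands can be previewed without a live Ollama daemon):

```shell
# Illustrative sketch of the pull/run commands typically entered in the
# container console. DRY_RUN=1 prints each command instead of executing it,
# so this can be previewed on a machine without Ollama installed.
MODEL="gpt-oss:20b"   # or gpt-oss:120b on an A100 workload profile
DRY_RUN=1

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run ollama pull "$MODEL"   # download the model weights
run ollama run "$MODEL"    # start an interactive prompt session
```

Set `DRY_RUN=0` inside the actual container console to execute the commands; `ollama run` then drops you into the interactive session where the sample prompt above can be entered.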
@@ -170,8 +178,10 @@ You can interact with your deployed model by using REST API calls from your loca
 
 1. Set the OLLAMA_URL environment variable:
 
+    Make sure to replace the placeholder surrounded by `<>` with your value before running the following command.
+
    ```bash
-   export OLLAMA_URL="{Your application URL}"
+   export OLLAMA_URL="<YOUR_APPLICATION_URL>"
    ```
 
 ### Make API calls
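To complement the `OLLAMA_URL` step, here is a hedged sketch of the call the "Make API calls" section describes (the payload shape follows Ollama's `/api/generate` endpoint with `stream` set to false; the URL and sample response are illustrative, and the live curl call is commented out so the sketch runs offline):

```shell
# Illustrative URL; in practice use the value exported into OLLAMA_URL.
OLLAMA_URL="https://my-ollama-app.example.azurecontainerapps.io"

# stream=false asks Ollama to return one complete JSON object
# instead of a stream of partial tokens.
PAYLOAD='{"model": "gpt-oss:20b", "prompt": "Explain LLMs briefly.", "stream": false}'

# Live call (commented out so the sketch runs without a deployed endpoint):
# curl -s "$OLLAMA_URL/api/generate" -d "$PAYLOAD"

# Extracting the generated text from a sample (abridged) reply,
# using only the Python standard library:
SAMPLE='{"model":"gpt-oss:20b","response":"LLMs are neural networks trained on large text corpora.","done":true}'
echo "$SAMPLE" | python3 -c 'import sys, json; print(json.load(sys.stdin)["response"])'
```

With a real deployment, pipe the curl output into the same one-liner to pull out just the `response` field.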
@@ -190,23 +200,25 @@ This curl request has streaming set to false, so it returns the fully generated
 
 ## Clean up resources
 
-To avoid incurring charges on your Azure subscription, clean up the resources you created in this article.
+To avoid charges on your Azure subscription, clean up the resources you created in this article.
 
 1. In the Azure portal, go to your resource group.
 1. Select **Delete resource group**.
-1. Enter your resource group name to confirm deletion.
+1. To confirm the delete operation, enter your resource group name.
 1. Select **Delete**.
 
 ## Next steps
 
-Now that you successfully deployed a gpt-oss model, consider these next steps:
+Now that you successfully deployed a gpt-oss model, consider the following ways to further develop your application:
+
+* **Add persistent storage**: Azure Container Apps is fully ephemeral and doesn't feature mounted storage by default. To persist your data and conversations, [add a volume mount to your container app](storage-mounts.md).
+
+* **Explore other models**: Follow these same steps to run any model available in [Ollama's library](https://ollama.com/search).
 
-- **Add persistent storage**: Azure Container Apps is fully ephemeral and doesn't feature mounted storage by default. To persist your data and conversations, [add a volume mount to your container app](storage-mounts.md).
-- **Explore other models**: Follow these same steps to run any model available in [Ollama's library](https://ollama.com/search).
-- **Learn more about serverless GPUs**: Review the [Azure Container Apps serverless GPU documentation](gpu-serverless-overview.md) for advanced configuration options.
+* **Learn more about serverless GPUs**: Review the [Azure Container Apps serverless GPU documentation](gpu-serverless-overview.md) for advanced configuration options.
 
 ## Related content
 
-- [Azure Container Apps serverless GPU overview](gpu-serverless-overview.md)
-- [Storage mounts in Azure Container Apps](storage-mounts.md)
-- [Scale rules in Azure Container Apps](scale-app.md)
+* [Azure Container Apps serverless GPU overview](gpu-serverless-overview.md)
+* [Storage mounts in Azure Container Apps](storage-mounts.md)
+* [Scale rules in Azure Container Apps](scale-app.md)
