Commit e479d66

Merge pull request #53944 from GraemeMalcolm/main
Updated gen-ai image module
2 parents 70df3ce + 816f928 commit e479d66

12 files changed

Lines changed: 107 additions & 99 deletions

learn-pr/wwl-data-ai/develop-generative-ai-vision-apps/1-introduction.yml

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ title: Introduction
 metadata:
   title: Introduction
   description: "Get started with vision-enabled generative AI models."
-  ms.date: 04/29/2025
+  ms.date: 03/23/2026
   author: gmalc
   ms.author: gmalc
   ms.topic: unit

learn-pr/wwl-data-ai/develop-generative-ai-vision-apps/2-deploy-multimodal-model.yml

Lines changed: 4 additions & 4 deletions

@@ -1,10 +1,10 @@
 ### YamlMime:ModuleUnit
 uid: learn.wwl.develop-generative-ai-vision-apps.deploy-multimodal-models
-title: Deploy a multimodal model
+title: Use a vision-capable model in the Microsoft Foundry portal
 metadata:
-  title: Deploy a multimodal model
-  description: "Deploy a multimodal model that can respond to image-based prompts."
-  ms.date: 04/29/2025
+  title: Use a vision-capable model in the Microsoft Foundry portal
+  description: "Learn how to use a vision-capable model in the Microsoft Foundry portal."
+  ms.date: 03/23/2026
   author: gmalc
   ms.author: gmalc
   ms.topic: unit

learn-pr/wwl-data-ai/develop-generative-ai-vision-apps/3-develop-visual-chat-app.yml

Lines changed: 2 additions & 3 deletions

@@ -3,12 +3,11 @@ uid: learn.wwl.develop-generative-ai-vision-apps.develop-visual-chat-apps
 title: Develop a vision-based chat app
 metadata:
   title: Develop a vision-based chat app
-  description: "Use Microsoft Foundry, Azure AI Model Inference, and Azure OpenAI SDKs to develop a vision-based chat app."
-  ms.date: 04/29/2025
+  description: "Use Microsoft Foundry and OpenAI APIs to develop a vision-based chat app."
+  ms.date: 03/23/2026
   author: gmalc
   ms.author: gmalc
   ms.topic: unit
 durationInMinutes: 5
 content: |
   [!include[](includes/3-develop-visual-chat-app.md)]
-
learn-pr/wwl-data-ai/develop-generative-ai-vision-apps/4-exercise.yml

Lines changed: 2 additions & 2 deletions

@@ -3,8 +3,8 @@ uid: learn.wwl.develop-generative-ai-vision-apps.exercise
 title: Exercise - Develop a vision-enabled chat app
 metadata:
   title: Exercise - Develop a vision-enabled chat app
-  description: "Get practical experience of deploying a multimodal model and creating a vision-enabled chat app."
-  ms.date: 04/29/2025
+  description: "Get practical experience of creating a vision-enabled chat app."
+  ms.date: 03/23/2026
   author: gmalc
   ms.author: gmalc
   ms.topic: unit

learn-pr/wwl-data-ai/develop-generative-ai-vision-apps/5-knowledge-check.yml

Lines changed: 34 additions & 35 deletions

@@ -4,45 +4,44 @@ title: Module assessment
 metadata:
   title: Module assessment
   description: "Check your learning on vision-enabled generative AI."
-  ms.date: 04/29/2025
+  ms.date: 03/23/2026
   author: gmalc
   ms.author: gmalc
   ms.topic: unit
 durationInMinutes: 3
 content: |
 quiz:
   questions:
-  - content: "Which kind of model can you use to respond to visual input?"
-    choices:
-    - content: "Only OpenAI GPT models"
-      isCorrect: false
-      explanation: "Incorrect."
-    - content: "Embedding models"
-      isCorrect: false
-      explanation: "Incorrect."
-    - content: "Multimodal models"
-      isCorrect: true
-      explanation: "Correct."
-  - content: "How can you submit a prompt that asks a model to analyze an image?"
-    choices:
-    - content: "Submit one prompt with an image-based message followed by another prompt with a text-based message."
-      isCorrect: false
-      explanation: "Incorrect."
-    - content: "Submit a prompt that contains a multi-part user message, containing both text content and image content."
-      isCorrect: true
-      explanation: "Correct."
-    - content: "Submit the image as the system message and the instruction or question as the user message."
-      isCorrect: false
-      explanation: "Incorrect."
-  - content: "How can you include an image in a message?"
-    choices:
-    - content: "As a URL or as binary data"
-      isCorrect: true
-      explanation: "Correct."
-    - content: "Only as a URL"
-      isCorrect: false
-      explanation: "Incorrect."
-    - content: "Only as binary data"
-      isCorrect: false
-      explanation: "Incorrect."
-
+  - content: "Which kind of model can you use to respond to visual input?"
+    choices:
+    - content: "Only OpenAI GPT models"
+      isCorrect: false
+      explanation: "Incorrect."
+    - content: "Embedding models"
+      isCorrect: false
+      explanation: "Incorrect."
+    - content: "Multimodal models"
+      isCorrect: true
+      explanation: "Correct."
+  - content: "How can you submit a prompt that asks a model to analyze an image?"
+    choices:
+    - content: "Submit one prompt with an image-based message followed by another prompt with a text-based message."
+      isCorrect: false
+      explanation: "Incorrect."
+    - content: "Submit a prompt that contains a multi-part user message, containing both text content and image content."
+      isCorrect: true
+      explanation: "Correct."
+    - content: "Submit the image as the system message and the instruction or question as the user message."
+      isCorrect: false
+      explanation: "Incorrect."
+  - content: "How can you include an image in a message?"
+    choices:
+    - content: "As a URL or as binary data"
+      isCorrect: true
+      explanation: "Correct."
+    - content: "Only as a URL"
+      isCorrect: false
+      explanation: "Incorrect."
+    - content: "Only as binary data"
+      isCorrect: false
+      explanation: "Incorrect."

learn-pr/wwl-data-ai/develop-generative-ai-vision-apps/6-summary.yml

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ title: Summary
 metadata:
   title: Summary
   description: "Reflect on what you've learned about vision-enabled generative AI models."
-  ms.date: 04/29/2025
+  ms.date: 03/23/2026
   author: gmalc
   ms.author: gmalc
   ms.topic: unit
learn-pr/wwl-data-ai/develop-generative-ai-vision-apps/includes/2-deploy-multimodal-model.md

Lines changed: 3 additions & 5 deletions

@@ -1,12 +1,11 @@
 To handle prompts that include images, you need to deploy a *multimodal* generative AI model - in other words, a model that supports not only text-based input, but image-based (and in some cases, audio-based) input as well. Multimodal models available in Microsoft Foundry include (among others):
 
 - Microsoft **Phi-4-multimodal-instruct**
-- OpenAI **gpt-4o**
-- OpenAI **gpt-4o-mini**
-
+- OpenAI **gpt-4.1**
+- OpenAI **gpt-4.1-mini**
 
 > [!TIP]
-> To learn more about available models in Microsoft Foundry, see the **[Model catalog and collections in Microsoft Foundry portal](/azure/ai-foundry/how-to/model-catalog-overview)** article in the Microsoft Foundry documentation.
+> To learn more about available models in Microsoft Foundry, see the **[Microsoft Foundry Models overview](/azure/foundry/concepts/foundry-models-overview)** article in the Microsoft Foundry documentation.
 
 ## Testing multimodal models with image-based prompts
 
@@ -15,4 +14,3 @@ After deploying a multimodal model, you can test it in the chat playground in Mi
 ![Screenshot of the chat playground with an image-based prompt.](../media/image-prompt.png)
 
 In the chat playground, you can upload an image from a local file and add text to the message to elicit a response from a multimodal model.
-
18-
learn-pr/wwl-data-ai/develop-generative-ai-vision-apps/includes/3-develop-visual-chat-app.md

Lines changed: 47 additions & 32 deletions

@@ -1,47 +1,62 @@
 To develop a client app that engages in vision-based chats with a multimodal model, you can use the same basic techniques used for text-based chats. You require a connection to the endpoint where the model is deployed, and you use that endpoint to submit prompts that consist of messages to the model and process the responses.
 
-The key difference is that prompts for a vision-based chat include multi-part user messages that contain both a *text* (or *audio* where supported) content item and an *image* content item.
+The key difference is that prompts for a vision-based chat include multi-part user messages that contain both a *text* content item and an *image* content item.
 
 ![Diagram of a multi-part prompt being submitted to a model.](../media/multi-part-prompt.png)
 
-The JSON representation of a prompt that includes a multi-part user message looks something like this:
+## Submit an image-based prompt using the *Responses* API
 
-```json
-{
-    "messages": [
-        { "role": "system", "content": "You are a helpful assistant." },
-        { "role": "user", "content": [
-            {
-                "type": "text",
-                "text": "Describe this picture:"
-            },
-            {
-                "type": "image_url",
-                "image_url": {
-                    "url": "https://....."
-                }
-            }
+To include an image in a prompt using the *Responses* API, specify a URL for a web-based image file, or load a local image, encode its data in Base64 format, and submit a URL in the format `data:image/jpeg;base64,{image_data}` (replacing "jpeg" with "png" or other formats as appropriate).
+
+The following Python example shows how to submit an image in a prompt using the *Responses* API:
+
+```python
+# Read the image data from a local file
+image_path = Path("dragon-fruit.jpeg")
+image_format = "jpeg"
+with open(image_path, "rb") as image_file:
+    image_data = base64.b64encode(image_file.read()).decode("utf-8")
+
+data_url = f"data:image/{image_format};base64,{image_data}" # You can also use a web URL
+
+# Send the image data in a prompt to the model
+response = client.responses.create(
+    model="gpt-4.1",
+    input=[
+        {"role": "developer", "content": "You are an AI assistant for chefs planning recipes."},
+        {"role": "user", "content": [
+            { "type": "input_text", "text": "What desserts could I make with this?"},
+            { "type": "input_image", "image_url": data_url}
         ] }
     ]
-}
+)
+print(response.output_text)
 ```
 
-The image content item can be:
+## Submit an image-based prompt using the *ChatCompletions* API
 
-- A URL to an image file in a web site.
-- Binary image data
+When using the Azure OpenAI endpoint to submit prompts to models that don't support the *Responses* API, you can use the *ChatCompletions* API, like this:
 
-When using binary data to submit a local image file, the **image_url** content takes the form of a base64 encoded value in a data URL format:
+```python
+# Read the image data from a local file
+image_path = Path("orange.jpeg")
+image_format = "jpeg"
+with open(image_path, "rb") as image_file:
+    image_data = base64.b64encode(image_file.read()).decode("utf-8")
 
-```json
-{
-    "type": "image_url",
-    "image_url": {
-        "url": "data:image/jpeg;base64,<binary_image_data>"
-    }
-}
-```
+data_url = f"data:image/{image_format};base64,{image_data}" # You can also use a web URL
 
-Depending on the model type, and where you deployed it, you can use Microsoft Azure AI Model Inference or OpenAI APIs to submit vision-based prompts. These libraries also provide language-specific SDKs that abstract the underlying REST APIs.
+# Send the image data in a prompt to the model
+response = client.chat.completions.create(
+    model="Phi-4-multimodal-instruct",
+    messages=[
+        {"role": "system", "content": "You are an AI assistant for chefs planning recipes."},
+        { "role": "user", "content": [
+            { "type": "text", "text": "What can I make with this fruit?"},
+            { "type": "image_url", "image_url": {"url": data_url}}
+        ] }
+    ]
+)
+print(response.choices[0].message.content)
 
-In the exercise that follows in this module, you can use the Python or .NET SDK for the Azure AI Model Inference API and the OpenAI API to develop a vision-enabled chat application.
+```
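
Note that the snippets added in this include assume that `Path`, `base64`, and an authenticated `client` are already in scope; the commit doesn't show that setup. The following is a minimal sketch of it, assuming the `openai` Python package and an Azure OpenAI v1-compatible endpoint; the resource URL and key are placeholders rather than values from this commit:

```python
# Setup assumed by the snippets above (not part of this commit).
# The endpoint and key are placeholders - substitute the target URI
# and key from your own Microsoft Foundry project.
import base64
from pathlib import Path

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-resource>.openai.azure.com/openai/v1/",  # placeholder endpoint
    api_key="<your-api-key>",  # placeholder key
)
```

With a client like this in scope, the `Path`, `base64`, and `client` references in both snippets should resolve: `client.responses.create` for models that support the *Responses* API, and `client.chat.completions.create` for those that don't.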

learn-pr/wwl-data-ai/develop-generative-ai-vision-apps/includes/4-exercise.md

Lines changed: 1 addition & 1 deletion

@@ -5,4 +5,4 @@ If you have an Azure subscription, you can complete this exercise to develop a v
 
 Launch the exercise and follow the instructions.
 
-[![Button to launch exercise.](../media/launch-exercise.png)](https://go.microsoft.com/fwlink/?linkid=2356207&azure-portal=true)
+[![Button to launch exercise.](../media/launch-exercise.png)](https://go.microsoft.com/fwlink/?linkid=2356866&azure-portal=true)

learn-pr/wwl-data-ai/develop-generative-ai-vision-apps/includes/6-summary.md

Lines changed: 1 addition & 2 deletions

@@ -3,5 +3,4 @@ In this module, you learned about vision-enabled generative AI models and how to
 Vision-enabled models let you create AI solutions that can understand images and respond to related questions or instructions. Beyond just identifying objects in pictures, some models can also use reasoning based on what they see. For instance, they can interpret a chart or assess if an object is damaged.
 
 > [!TIP]
-> For more information about working with multimodal models in Microsoft Foundry, see **[How to use image and audio in chat completions with Azure AI model inference](/azure/ai-foundry/model-inference/how-to/use-chat-multi-modal)** and **[Quickstart: Use images in your AI chats](/azure/ai-services/openai/gpt-v-quickstart)**.
-
+> For more information about analyzing images with the OpenAI Responses API, see **[Images and vision](https://developers.openai.com/api/docs/guides/images-vision?format=url#analyze-images?azure-portal=true)** in the OpenAI developer guide.
