You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/ai-studio/quickstarts/multimodal-vision.md
+31-27Lines changed: 31 additions & 27 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -18,9 +18,9 @@ Use this article to get started using [Azure AI Studio](https://ai.azure.com) to
18
18
19
19
GPT-4 Turbo with Vision and [Azure AI Vision](../../ai-services/computer-vision/overview.md) offer advanced functionality including:
20
20
21
-
- Optical character recognition (OCR): Extracts text from images and combines it with the user's prompt and image to expand the context.
22
-
- Object visualization: Complements the GPT-4 Turbo with Vision text response with object grounding and outlines salient objects in the input images.
23
-
- Video chat: GPT-4 Turbo with Vision can answer questions by retrieving the video frames most relevant to the user's prompt.
21
+
- Optical Character Recognition (OCR): Extracts text from images and combines it with the user's prompt and image to expand the context.
22
+
- Object grounding: Complements the GPT-4 Turbo with Vision text response with object grounding and outlines salient objects in the input images.
23
+
- Video prompts: GPT-4 Turbo with Vision can answer questions by retrieving the video frames most relevant to the user's prompt.
24
24
25
25
Extra usage fees might apply for using GPT-4 Turbo with Vision and Azure AI Vision functionality.
26
26
@@ -32,8 +32,8 @@ Extra usage fees might apply for using GPT-4 Turbo with Vision and Azure AI Visi
32
32
Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at <ahref="https://aka.ms/oai/access"target="_blank">https://aka.ms/oai/access</a>. Open an issue on this repo to contact us if you have an issue.
33
33
34
34
You need:
35
-
- An [Azure OpenAI resource](https://portal.azure.com/?microsoft_azure_marketplace_ItemHideKey=microsoft_openai_tip#create/Microsoft.CognitiveServicesOpenAI) with the GPT-4 Turbo with Vision models deployed in one of the regions that support GPT-4 Turbo with Vision: Australia East, Switzerland North, Sweden Central, West US, and East US.
36
-
- For enhanced image and video chat, you also need an [Azure AI Vision resource](https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision) in one of the regions that support GPT-4 Turbo with Vision: Australia East, Switzerland North, Sweden Central, West US, and East US.
35
+
- An [Azure OpenAI resource](https://portal.azure.com/?microsoft_azure_marketplace_ItemHideKey=microsoft_openai_tip#create/Microsoft.CognitiveServicesOpenAI) with the GPT-4 Turbo with Vision models deployed in one of the regions that support GPT-4 Turbo with Vision: Australia East, Switzerland North, Sweden Central, and West US.
36
+
- For enhanced image and video prompts, you also need an [Azure AI Vision resource](https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision) in one of the regions that support GPT-4 Turbo with Vision: Australia East, Switzerland North, Sweden Central, and West US.
37
37
38
38
## Start a chat session to analyze images or video
39
39
@@ -43,7 +43,7 @@ You need an image to complete the image quickstarts. You can use the following i
43
43
44
44
You need a video up to three minutes in length to complete the video quickstart.
45
45
46
-
# [Image analysis chat](#tab/image-chat)
46
+
# [Image prompts](#tab/image-chat)
47
47
48
48
In this chat session, you instruct the assistant to aid in understanding images that you input.
49
49
@@ -76,22 +76,19 @@ In this chat session, you instruct the assistant to aid in understanding images
76
76
77
77
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-reply-insurance-long.png" alt-text="Screenshot of the chat playground with the assistant's follow-up reply for basic image analysis." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-reply-insurance-long.png":::
In this chat session, you instruct the assistant to aid in understanding images that you input. Try out the capabilities of the augmented vision model.
82
82
83
83
1. Sign in to [Azure AI Studio](https://ai.azure.com).
84
84
1. Select **Build** from the top menu and then select **Playground** from the collapsible left menu.
85
-
1. In the **Configuration** pane on the right side of the chat experience, turn on the option for **Vision** under **Enhancements**.
85
+
1.Select your deployed GPT-4 Turbo with Vision model from the **Deployment** dropdown. In the **Configuration** pane on the right side of the chat experience, turn on the option for **Vision** under **Enhancements**.
86
86
87
-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-enhanced-settings.png" alt-text="Screenshot of the vision enhancement settings in the chat playground." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-enhanced-settings.png":::
87
+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-video-turn-on-enhancement.png" alt-text="Screenshot of the vision enhancement settings in the chat playground." lightbox="../media/quickstarts/multimodal-vision/chat-video-turn-on-enhancement.png":::
88
88
89
-
> [!NOTE]
90
-
> You might need to select **Vision** enhancement button again to apply the changes.
89
+
1. Make sure that **Chat** is selected from the **Mode** dropdown. Under the chat session text box, you should see the option to select a file.
91
90
92
-
1. Make sure that **Chat** is selected from the **Mode** dropdown. Select your deployed GPT-4 Turbo with Vision model from the **Deployment** dropdown. Under the chat session text box, you should now see the option to select a file.
93
-
94
-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-multi-modal-image-select.png" alt-text="Screenshot of the chat playground with mode and deployment and upload image button highlighted." lightbox="../media/quickstarts/multimodal-vision/chat-multi-modal-image-select.png":::
91
+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-image-upload-image.png" alt-text="Screenshot of the chat playground with mode and deployment and upload image button highlighted." lightbox="../media/quickstarts/multimodal-vision/chat-image-upload-image.png":::
95
92
96
93
1. In the **System message** text box on the **Assistant setup** pane, provide this prompt to guide the assistant: "You're an AI assistant that helps people find information." You can tailor the prompt the image or scenario that you're uploading.
97
94
1. Select **Apply changes** to save your changes, and when prompted to see if you want to update the system message, select **Continue**.
@@ -106,30 +103,29 @@ In this chat session, you instruct the assistant to aid in understanding images
106
103
107
104
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-prompt-stop.png" alt-text="Screenshot of the chat playground with the square stop button visible." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-prompt-stop.png":::
108
105
109
-
1. The assistant should reply with a description of the image.
106
+
1. The assistant should reply with a description of the image with objects highlighted both in the text and in the image.
107
+
108
+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-image-upload-image.png" alt-text="Screenshot of the chat playground with the model output where objects are highlighted in the text and image." lightbox="../media/quickstarts/multimodal-vision/chat-image-upload-image.png":::
109
+
110
110
1. Ask a follow-up question related to the analysis of your image. Enter "What should I highlight about this image to my insurance company" and then select the right arrow icon to send.
111
111
1. You should receive a relevant response similar to what's shown here:
112
112
113
113
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-follow-up-reply.png" alt-text="Screenshot of the chat playground with the assistant's follow-up reply for enhanced image analysis." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-follow-up-reply.png":::
114
114
115
115
116
-
# [Video analysis chat](#tab/video-chat)
116
+
# [Video prompt enhancements](#tab/video-chat)
117
117
118
118
In this chat session, you'll be instructing the assistant to aid in understanding videos that you input. The assistant extracts several frames from the video and uses them to answer your questions.
119
119
120
120
1. Sign in to [Azure AI Studio](https://ai.azure.com).
121
121
1. Select **Build** from the top menu and then select **Playground** from the collapsible left menu.
122
-
1. In the **Configuration** pane on the right side of the chat experience, turn on the option for **Vision** under **Enhancements**.
123
-
1. In the **Vision enhancements settings** dialog, select an Azure AI Vision resource and then select **Save**.
122
+
1. Select your deployed GPT-4 Turbo with Vision model from the **Deployment** dropdown. In the **Configuration** pane on the right side of the chat experience, turn on the option for **Vision** under **Enhancements**.
124
123
125
-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-enhanced-settings.png" alt-text="Screenshot of the vision enhancement settings for video analysis in the chat playground." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-enhanced-settings.png":::
124
+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-video-turn-on-enhancement.png" alt-text="Screenshot of the vision enhancement settings for video analysis in the chat playground." lightbox="../media/quickstarts/multimodal-vision/chat-video-turn-on-enhancement.png":::
126
125
127
-
> [!NOTE]
128
-
> You might need to select **Vision** enhancement button again to apply the changes.
126
+
1. Make sure that **Chat** is selected from the **Mode** dropdown. Under the chat session text box, you should now see the option to select a video file.
129
127
130
-
1. Make sure that **Chat** is selected from the **Mode** dropdown. Select your deployed GPT-4 Turbo with Vision model from the **Deployment** dropdown. Under the chat session text box, you should now see the option to select a file.
131
-
132
-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-multi-modal-video-select.png" alt-text="Screenshot of the chat playground with mode and deployment and upload video button highlighted." lightbox="../media/quickstarts/multimodal-vision/chat-multi-modal-video-select.png":::
128
+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-video-upload-video.png" alt-text="Screenshot of the chat playground with mode and deployment and upload video button highlighted." lightbox="../media/quickstarts/multimodal-vision/chat-video-upload-video.png":::
133
129
134
130
1. In the **System message** text box on the **Assistant setup** pane, provide this prompt to guide the assistant: "You're a car insurance and accident expert. Extract detailed information about the car's make, model, damage extent, license plate, airbag deployment status, mileage, and any other observations" You can tailor the prompt for the video or scenario.
135
131
1. Select **Apply changes** to save your changes, and when prompted to see if you want to update the system message, select **Continue**.
@@ -141,18 +137,26 @@ In this chat session, you'll be instructing the assistant to aid in understandin
141
137
142
138
1. The square icon replaces the right arrow icon. If you select the square icon, the assistant stops processing your request. For this quickstart, let the assistant finish its reply. Don't select the square icon.
143
139
144
-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-prompt-stop.png" alt-text="Screenshot of the chat playground with the stop button highlighted." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-prompt-stop.png":::
140
+
:::image type="content" source="../media/quickstarts/multimodal-vision/check-box.png" alt-text="Screenshot of the chat playground with the stop button highlighted." lightbox="../media/quickstarts/multimodal-vision/check-box.png":::
145
141
146
142
1. The assistant should reply with a description of the video.
147
143
148
-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-video-car-reply.png" alt-text="Screenshot of the chat playground with the assistant's reply for video analysis." lightbox="../media/quickstarts/multimodal-vision/chat-video-car-reply.png":::
144
+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-video-output.png" alt-text="Screenshot of the chat playground with the assistant's reply for video analysis." lightbox="../media/quickstarts/multimodal-vision/chat-video-output.png":::
149
145
150
146
1. Ask a follow-up question related to the analysis of your video. Enter "What should I highlight about this video to my insurance company" and then select the right arrow icon to send.
151
147
152
148
153
149
1. You should receive a relevant response similar to what's shown here:
154
150
155
151
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-video-reply-insurance.png" alt-text="Screenshot of the chat playground with the assistant's follow-up reply for video analysis." lightbox="../media/quickstarts/multimodal-vision/chat-car-video-reply-insurance.png":::
152
+
153
+
Below are the known limitations of the video prompt enhancements.
154
+
155
+
-**Low resolution:** The frames are analyzed using GPT-4 Turbo with Vision's "low resolution" setting, which may affect the accuracy of small object and text recognition in the video.
156
+
-**Video file limits:** Both mp4 and mov file types are supported. In the Azure AI Playground, videos must be less than 3 minutes long. When using the API there is no such limitation.
157
+
-**Prompt limits:** Video prompts only contain one video and no images. In Playground, you can clear the session to try with another video or images.
158
+
-**Limited frame selection:** Currently the system selects 20 frames from the entire video, which might not capture all critical moments or details. Frame selection can either be approximately evenly spread through the video or focused by a specific a Video Retrieval query, depending on the prompt.
159
+
-**Language support:** Currently, the system primarily supports English for grounding with transcripts. Transcripts don't provide accurate information on lyrics from songs.
156
160
157
161
---
158
162
@@ -173,7 +177,7 @@ At any point in the chat session, you can select the **Show raw JSON** option to
173
177
]
174
178
```
175
179
176
-
This has been a walkthrough of GPT-4 Turbo with Vision in the Azure AI Studio chat playground experience.
180
+
This has been a walkthrough of GPT-4 Turbo with Vision in the Azure AI Studio chat playground experience.
0 commit comments