Skip to content

Commit e73e89a

Browse files
authored
Merge pull request #30 from kbrowne8/patch-87
per PM input
2 parents b067f20 + 22e0ba9 commit e73e89a

7 files changed

Lines changed: 31 additions & 27 deletions

File tree

537 KB
Loading
1.12 MB
Loading
914 KB
Loading
1.12 MB
Loading
1.12 MB
Loading
823 KB
Loading

articles/ai-studio/quickstarts/multimodal-vision.md

Lines changed: 31 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -18,9 +18,9 @@ Use this article to get started using [Azure AI Studio](https://ai.azure.com) to
1818

1919
GPT-4 Turbo with Vision and [Azure AI Vision](../../ai-services/computer-vision/overview.md) offer advanced functionality including:
2020

21-
- Optical character recognition (OCR): Extracts text from images and combines it with the user's prompt and image to expand the context.
22-
- Object visualization: Complements the GPT-4 Turbo with Vision text response with object grounding and outlines salient objects in the input images.
23-
- Video chat: GPT-4 Turbo with Vision can answer questions by retrieving the video frames most relevant to the user's prompt.
21+
- Optical Character Recognition (OCR): Extracts text from images and combines it with the user's prompt and image to expand the context.
22+
- Object grounding: Complements the GPT-4 Turbo with Vision text response with object grounding and outlines salient objects in the input images.
23+
- Video prompts: GPT-4 Turbo with Vision can answer questions by retrieving the video frames most relevant to the user's prompt.
2424

2525
Extra usage fees might apply for using GPT-4 Turbo with Vision and Azure AI Vision functionality.
2626

@@ -32,8 +32,8 @@ Extra usage fees might apply for using GPT-4 Turbo with Vision and Azure AI Visi
3232
Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at <a href="https://aka.ms/oai/access" target="_blank">https://aka.ms/oai/access</a>. Open an issue on this repo to contact us if you have an issue.
3333

3434
You need:
35-
- An [Azure OpenAI resource](https://portal.azure.com/?microsoft_azure_marketplace_ItemHideKey=microsoft_openai_tip#create/Microsoft.CognitiveServicesOpenAI) with the GPT-4 Turbo with Vision models deployed in one of the regions that support GPT-4 Turbo with Vision: Australia East, Switzerland North, Sweden Central, West US, and East US.
36-
- For enhanced image and video chat, you also need an [Azure AI Vision resource](https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision) in one of the regions that support GPT-4 Turbo with Vision: Australia East, Switzerland North, Sweden Central, West US, and East US.
35+
- An [Azure OpenAI resource](https://portal.azure.com/?microsoft_azure_marketplace_ItemHideKey=microsoft_openai_tip#create/Microsoft.CognitiveServicesOpenAI) with the GPT-4 Turbo with Vision models deployed in one of the regions that support GPT-4 Turbo with Vision: Australia East, Switzerland North, Sweden Central, and West US.
36+
- For enhanced image and video prompts, you also need an [Azure AI Vision resource](https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision) in one of the regions that support GPT-4 Turbo with Vision: Australia East, Switzerland North, Sweden Central, and West US.
3737

3838
## Start a chat session to analyze images or video
3939

@@ -43,7 +43,7 @@ You need an image to complete the image quickstarts. You can use the following i
4343

4444
You need a video up to three minutes in length to complete the video quickstart.
4545

46-
# [Image analysis chat](#tab/image-chat)
46+
# [Image prompts](#tab/image-chat)
4747

4848
In this chat session, you instruct the assistant to aid in understanding images that you input.
4949

@@ -76,22 +76,19 @@ In this chat session, you instruct the assistant to aid in understanding images
7676

7777
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-reply-insurance-long.png" alt-text="Screenshot of the chat playground with the assistant's follow-up reply for basic image analysis." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-reply-insurance-long.png":::
7878

79-
# [Enhanced image analysis chat](#tab/enhanced-image-chat)
79+
# [Image prompt enhancements](#tab/enhanced-image-chat)
8080

8181
In this chat session, you instruct the assistant to aid in understanding images that you input. Try out the capabilities of the augmented vision model.
8282

8383
1. Sign in to [Azure AI Studio](https://ai.azure.com).
8484
1. Select **Build** from the top menu and then select **Playground** from the collapsible left menu.
85-
1. In the **Configuration** pane on the right side of the chat experience, turn on the option for **Vision** under **Enhancements**.
85+
1. Select your deployed GPT-4 Turbo with Vision model from the **Deployment** dropdown. In the **Configuration** pane on the right side of the chat experience, turn on the option for **Vision** under **Enhancements**.
8686

87-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-enhanced-settings.png" alt-text="Screenshot of the vision enhancement settings in the chat playground." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-enhanced-settings.png":::
87+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-video-turn-on-enhancement.png" alt-text="Screenshot of the vision enhancement settings in the chat playground." lightbox="../media/quickstarts/multimodal-vision/chat-video-turn-on-enhancement.png":::
8888

89-
> [!NOTE]
90-
> You might need to select **Vision** enhancement button again to apply the changes.
89+
1. Make sure that **Chat** is selected from the **Mode** dropdown. Under the chat session text box, you should see the option to select a file.
9190

92-
1. Make sure that **Chat** is selected from the **Mode** dropdown. Select your deployed GPT-4 Turbo with Vision model from the **Deployment** dropdown. Under the chat session text box, you should now see the option to select a file.
93-
94-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-multi-modal-image-select.png" alt-text="Screenshot of the chat playground with mode and deployment and upload image button highlighted." lightbox="../media/quickstarts/multimodal-vision/chat-multi-modal-image-select.png":::
91+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-image-upload-image.png" alt-text="Screenshot of the chat playground with mode and deployment and upload image button highlighted." lightbox="../media/quickstarts/multimodal-vision/chat-image-upload-image.png":::
9592

9693
1. In the **System message** text box on the **Assistant setup** pane, provide this prompt to guide the assistant: "You're an AI assistant that helps people find information." You can tailor the prompt the image or scenario that you're uploading.
9794
1. Select **Apply changes** to save your changes, and when prompted to see if you want to update the system message, select **Continue**.
@@ -106,30 +103,29 @@ In this chat session, you instruct the assistant to aid in understanding images
106103

107104
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-prompt-stop.png" alt-text="Screenshot of the chat playground with the square stop button visible." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-prompt-stop.png":::
108105

109-
1. The assistant should reply with a description of the image.
106+
1. The assistant should reply with a description of the image with objects highlighted both in the text and in the image.
107+
108+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-image-upload-image.png" alt-text="Screenshot of the chat playground with the model output where objects are highlighted in the text and image." lightbox="../media/quickstarts/multimodal-vision/chat-image-upload-image.png":::
109+
110110
1. Ask a follow-up question related to the analysis of your image. Enter "What should I highlight about this image to my insurance company" and then select the right arrow icon to send.
111111
1. You should receive a relevant response similar to what's shown here:
112112

113113
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-follow-up-reply.png" alt-text="Screenshot of the chat playground with the assistant's follow-up reply for enhanced image analysis." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-follow-up-reply.png":::
114114

115115

116-
# [Video analysis chat](#tab/video-chat)
116+
# [Video prompt enhancements](#tab/video-chat)
117117

118118
In this chat session, you'll be instructing the assistant to aid in understanding videos that you input. The assistant extracts several frames from the video and uses them to answer your questions.
119119

120120
1. Sign in to [Azure AI Studio](https://ai.azure.com).
121121
1. Select **Build** from the top menu and then select **Playground** from the collapsible left menu.
122-
1. In the **Configuration** pane on the right side of the chat experience, turn on the option for **Vision** under **Enhancements**.
123-
1. In the **Vision enhancements settings** dialog, select an Azure AI Vision resource and then select **Save**.
122+
1. Select your deployed GPT-4 Turbo with Vision model from the **Deployment** dropdown. In the **Configuration** pane on the right side of the chat experience, turn on the option for **Vision** under **Enhancements**.
124123

125-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-enhanced-settings.png" alt-text="Screenshot of the vision enhancement settings for video analysis in the chat playground." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-enhanced-settings.png":::
124+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-video-turn-on-enhancement.png" alt-text="Screenshot of the vision enhancement settings for video analysis in the chat playground." lightbox="../media/quickstarts/multimodal-vision/chat-video-turn-on-enhancement.png":::
126125

127-
> [!NOTE]
128-
> You might need to select **Vision** enhancement button again to apply the changes.
126+
1. Make sure that **Chat** is selected from the **Mode** dropdown. Under the chat session text box, you should now see the option to select a video file.
129127

130-
1. Make sure that **Chat** is selected from the **Mode** dropdown. Select your deployed GPT-4 Turbo with Vision model from the **Deployment** dropdown. Under the chat session text box, you should now see the option to select a file.
131-
132-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-multi-modal-video-select.png" alt-text="Screenshot of the chat playground with mode and deployment and upload video button highlighted." lightbox="../media/quickstarts/multimodal-vision/chat-multi-modal-video-select.png":::
128+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-video-upload-video.png" alt-text="Screenshot of the chat playground with mode and deployment and upload video button highlighted." lightbox="../media/quickstarts/multimodal-vision/chat-video-upload-video.png":::
133129

134130
1. In the **System message** text box on the **Assistant setup** pane, provide this prompt to guide the assistant: "You're a car insurance and accident expert. Extract detailed information about the car's make, model, damage extent, license plate, airbag deployment status, mileage, and any other observations" You can tailor the prompt for the video or scenario.
135131
1. Select **Apply changes** to save your changes, and when prompted to see if you want to update the system message, select **Continue**.
@@ -141,18 +137,26 @@ In this chat session, you'll be instructing the assistant to aid in understandin
141137

142138
1. The square icon replaces the right arrow icon. If you select the square icon, the assistant stops processing your request. For this quickstart, let the assistant finish its reply. Don't select the square icon.
143139

144-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-accident-prompt-stop.png" alt-text="Screenshot of the chat playground with the stop button highlighted." lightbox="../media/quickstarts/multimodal-vision/chat-car-accident-prompt-stop.png":::
140+
:::image type="content" source="../media/quickstarts/multimodal-vision/check-box.png" alt-text="Screenshot of the chat playground with the stop button highlighted." lightbox="../media/quickstarts/multimodal-vision/check-box.png":::
145141

146142
1. The assistant should reply with a description of the video.
147143

148-
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-video-car-reply.png" alt-text="Screenshot of the chat playground with the assistant's reply for video analysis." lightbox="../media/quickstarts/multimodal-vision/chat-video-car-reply.png":::
144+
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-video-output.png" alt-text="Screenshot of the chat playground with the assistant's reply for video analysis." lightbox="../media/quickstarts/multimodal-vision/chat-video-output.png":::
149145

150146
1. Ask a follow-up question related to the analysis of your video. Enter "What should I highlight about this video to my insurance company" and then select the right arrow icon to send.
151147

152148

153149
1. You should receive a relevant response similar to what's shown here:
154150

155151
:::image type="content" source="../media/quickstarts/multimodal-vision/chat-car-video-reply-insurance.png" alt-text="Screenshot of the chat playground with the assistant's follow-up reply for video analysis." lightbox="../media/quickstarts/multimodal-vision/chat-car-video-reply-insurance.png":::
152+
153+
Below are the known limitations of the video prompt enhancements.
154+
155+
- **Low resolution:** The frames are analyzed using GPT-4 Turbo with Vision's "low resolution" setting, which may affect the accuracy of small object and text recognition in the video.
156+
- **Video file limits:** Both mp4 and mov file types are supported. In the Azure AI Playground, videos must be less than 3 minutes long. When using the API there is no such limitation.
157+
- **Prompt limits:** Video prompts only contain one video and no images. In Playground, you can clear the session to try with another video or images.
158+
- **Limited frame selection:** Currently the system selects 20 frames from the entire video, which might not capture all critical moments or details. Frame selection can either be approximately evenly spread through the video or focused by a specific a Video Retrieval query, depending on the prompt.
159+
- **Language support:** Currently, the system primarily supports English for grounding with transcripts. Transcripts don't provide accurate information on lyrics from songs.
156160

157161
---
158162

@@ -173,7 +177,7 @@ At any point in the chat session, you can select the **Show raw JSON** option to
173177
]
174178
```
175179

176-
This has been a walkthrough of GPT-4 Turbo with Vision in the Azure AI Studio chat playground experience.
180+
This has been a walkthrough of GPT-4 Turbo with Vision in the Azure AI Studio chat playground experience.
177181

178182
## Clean up resources
179183

0 commit comments

Comments
 (0)