Update articles/ai-studio/quickstarts/multimodal-vision.md

kbrowne8 · eric-urban · web-flow · commit 22e0ba9d0b8d · 2023-12-10T20:14:26.000-08:00
Co-authored-by: Eric Urban <eric.urban@microsoft.com>
diff --git a/articles/ai-studio/quickstarts/multimodal-vision.md b/articles/ai-studio/quickstarts/multimodal-vision.md
@@ -152,11 +152,11 @@ In this chat session, you'll be instructing the assistant to aid in understandin
 
 Below are the known limitations of the video prompt enhancements.
 
-1. **Low Resolution:** The frames are analyzed using GPT-4 Turbo with Vision's "low resolution" setting, which may affect the accuracy of small object and text recognition in the video.
-2. **Video File Limits:** Both mp4 and mov file types are supported. In the Azure AI Playground, videos must be less than 3 minutes long. When using the API there is no such limitation.
-3. **Prompt Limits:** Video prompts only contain one video and no images. In Playground, you can clear the session to try with another video or images.
-4. **Limited Frame Selection:** Currently the system selects 20 frames from the entire video, which might not capture all critical moments or details. Frame selection can either be approximately evenly spread through the video or focused by a specific Video Retrieval query, depending on the prompt.
-5. **Language Support:** Currently, the system primarily supports English for grounding with transcripts. Transcripts don't provide accurate information on lyrics from songs.
+- **Low resolution:** The frames are analyzed using GPT-4 Turbo with Vision's "low resolution" setting, which may affect the accuracy of small object and text recognition in the video.
+- **Video file limits:** Both mp4 and mov file types are supported. In the Azure AI Playground, videos must be less than 3 minutes long. When using the API there is no such limitation.
+- **Prompt limits:** Video prompts only contain one video and no images. In Playground, you can clear the session to try with another video or images.
+- **Limited frame selection:** Currently the system selects 20 frames from the entire video, which might not capture all critical moments or details. Frame selection can either be approximately evenly spread through the video or focused by a specific Video Retrieval query, depending on the prompt.
+- **Language support:** Currently, the system primarily supports English for grounding with transcripts. Transcripts don't provide accurate information on lyrics from songs.
  
  ---