::: zone pivot="video"

>[!VIDEO https://learn-video.azurefd.net/vod/player?id=5bfdd223-9358-439d-8814-56a006aafa76]

::: zone-end

::: zone pivot="text"

Increasingly, new AI models are multimodal. In other words, they support multiple kinds of input data, including images and text. **Multimodal models** are AI models that can understand and work with more than one type of data at the same time, such as text, images, audio, or video. For instance, a multimodal model can describe an image in natural language or answer a question about a photo.

Multimodal models are commonly used as part of:

- **AI applications**, where image understanding enhances user workflows
- **AI agents**, where visual input helps the agent make better decisions

Examples include:

- An agent that reviews uploaded documents and screenshots
- A support app that analyzes photos submitted by customers
- A learning tool that explains diagrams or charts in plain language

Because multimodal models accept both text and images, they reduce the need for separate vision pipelines and make it easier to build end‑to‑end intelligent experiences.

GPT models that combine visual understanding with natural language responses are referred to as **vision‑enabled GPT models**, or GPT with vision. Vision‑enabled models are designed for flexible, general‑purpose visual reasoning. They can analyze visual input and respond in natural language, making it easy to build intelligent applications without needing deep computer vision expertise.

## Multimodal models in Microsoft Foundry

Microsoft Foundry includes many models that accept image-based input, enabling you to create intelligent, vision-based solutions. Multimodal models in Microsoft Foundry allow applications and agents to understand, analyze, and reason over images and visual content.

For example, vision‑enabled GPT models in Foundry can:

- Describe the contents of an image in natural language
- Answer questions about objects, text, or scenes in an image
- Extract meaning from charts, screenshots, documents, or photos
- Combine image understanding with text instructions in a single prompt

Foundry's model catalog contains many multimodal models, including:

- **GPT‑4.1 / GPT‑4.1‑mini / GPT‑4.1‑nano**: These general‑purpose multimodal GPT models can process text and images together. They're commonly used for image description and visual question answering, document and screenshot analysis, and chart and diagram interpretation.

- **GPT‑5 series (for example, GPT‑5.1, GPT‑5.2)**: The GPT‑5 family available in Foundry includes advanced multimodal models designed for enterprise and agentic scenarios. These models support multimodal inputs (including text and images), structured outputs, tool use, and large‑context reasoning across modalities. The GPT-5 series models are typically used in production‑grade AI agents and complex multimodal applications.

Foundry also hosts partner‑provided multimodal models in its model catalog, including models from providers such as Anthropic and others that support text and image understanding.

#### Image analysis in the Foundry playground

> [!NOTE]
> The Foundry portal has a *classic* user interface (UI) and a *new* user interface.

In the *new Microsoft Foundry portal*, you can use the model playground to chat with a deployed model. You can select a vision‑enabled model, upload images, and test prompts interactively to understand how the model interprets visual information.

:::image type="content" source="../media/playground-upload-image.png" alt-text="Screenshot of Foundry Playground with a gpt-4.1 mini model deployed and the user uploading an image of an animal." lightbox="../media/playground-upload-image.png":::

For example, you can attach an image file and get the multimodal model (such as gpt-4.1 mini) to analyze and describe it.

:::image type="content" source="../media/image-analysis-result-playground.png" alt-text="Screenshot of Foundry Playground with a prompt asking the model to describe what is in an image and a response with a description." lightbox="../media/image-analysis-result-playground.png":::

Once validated, the same capabilities can be accessed programmatically using APIs, allowing images to be submitted alongside text prompts in application code.

## Using the Azure OpenAI API for image analysis

To develop an application, you need to move from the Foundry playground to code. In a code editor, you can write your application code using the **OpenAI Responses API** in Foundry. The OpenAI Responses API is designed for agentic apps and supports native multimodal inputs, including images.

At a high level:

- A single request can include text input and image input together
- Images can be provided as URLs or as base64‑encoded image data
- The model processes both inputs simultaneously to generate a response

Conceptually, the prompt structure looks like:

- A text instruction (for example, *What objects are visible in this image?*)
- One or more image inputs attached to the same request

This approach allows developers to build applications where users upload images and ask questions about them in real time.
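
To make that concrete, here's a minimal sketch of the multi‑part input structure. The field names match the Responses API content parts shown later in this unit; the URL and the `image_base64` variable are placeholders:

```python
# A single user message carrying both a text part and an image part.
multimodal_input = [
    {
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What objects are visible in this image?"},
            # Option 1: reference a hosted image by URL (placeholder URL)
            {"type": "input_image", "image_url": "https://example.com/photo.jpg"},
            # Option 2: embed the image bytes as a base64 data URL instead:
            # {"type": "input_image", "image_url": f"data:image/jpeg;base64,{image_base64}"},
        ],
    }
]
```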

## Using the Azure OpenAI Python SDK

You can use a Microsoft Foundry resource with the OpenAI API to perform image analysis (including sending images in prompts and getting text responses) by using the Responses API with a vision‑capable model deployment.

The Python SDK can be installed in the Visual Studio Code *terminal* using:

```bash
pip install openai
```

In the code editor, create a Python file to contain your application code. Importantly, you need your **Foundry resource** *key* and *endpoint*, and the *name of your deployed model*.

>[!NOTE]
>When you deploy a model in Foundry, it has a *base* or *original* name, and a **deployment name** that you give it. Foundry hosts the deployed model (for example, GPT‑class models with vision) and provides you with an endpoint.

In the code example, you create the *client*, point it to your endpoint, and pass your *model deployment name* (the name you gave the model) as the `MODEL_NAME`.

```python
import os
from openai import OpenAI

# Configuration values, read from environment variables you set
# locally or in your app service:
#   FOUNDRY_KEY: your Foundry resource key
#   ENDPOINT: for example, "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/"
#   MODEL_NAME: your model deployment name
#     (e.g., "gpt-4.1-mini" deployed as "my-vision-deploy")

client = OpenAI(
    api_key=os.getenv("FOUNDRY_KEY"),
    base_url=os.getenv("ENDPOINT"),
)

image_url = ""  # URL of the image you want the model to analyze

response = client.responses.create(
    model=os.getenv("MODEL_NAME"),  # your deployment name
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is in this image? Provide 3 bullet points."},
                {"type": "input_image", "image_url": image_url},
            ],
        }
    ],
)

print(response.output_text)
```
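
The previous example passes the image by URL. If the image is a local file, you can send it as base64‑encoded data instead. Here's a minimal sketch that builds on the client above; `photo.jpg` is a placeholder file name:

```python
import base64

# Read the local image file and encode its bytes as base64 text.
with open("photo.jpg", "rb") as image_file:
    image_base64 = base64.b64encode(image_file.read()).decode("utf-8")

# Use a data URL in place of the web URL in the input_image part.
image_url = f"data:image/jpeg;base64,{image_base64}"
```

The data URL then takes the place of the web URL in the `input_image` content part; the rest of the request is unchanged.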

#### Client app example

You can build a custom application that uses a vision-enabled model to analyze an image with the OpenAI Python SDK. For example, suppose you want to build an app that can identify animals photographed on safari. You can upload your photos and create a Python file in your code editor.

Then you can write application code that uses the OpenAI API to connect to your model's endpoint in Foundry.

:::image type="content" source="../media/vision-analysis-python.png" alt-text="Screenshot of Visual Studio Code with a python file containing application code for image analysis." lightbox="../media/vision-analysis-python.png":::

The application code needs to load the image data and get a natural language prompt from a user. To submit the input to the model, you need to create a multi-part message that includes both the image and text data. The model can respond with an appropriate output based on both the text and image in the prompt, as in the sketch that follows.
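
As a minimal sketch (assuming the same `client` and environment variables as the earlier example, with a hypothetical photo file name and a prompt typed by the user), the multi-part message might look like this:

```python
import base64
import os

# Get a natural language prompt from the user.
user_prompt = input("Ask a question about the photo: ")

# Load the image data and encode it for the request.
with open("safari-photo.jpg", "rb") as image_file:  # hypothetical file name
    image_base64 = base64.b64encode(image_file.read()).decode("utf-8")

# Submit a multi-part message that includes both the text and image data.
response = client.responses.create(
    model=os.getenv("MODEL_NAME"),  # your deployment name
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": user_prompt},
                {"type": "input_image", "image_url": f"data:image/jpeg;base64,{image_base64}"},
            ],
        }
    ],
)

print(response.output_text)
```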

:::image type="content" source="../media/image-analysis-result-vs-code.png" alt-text="Screenshot of Visual Studio Code with the result of the image analysis." lightbox="../media/image-analysis-result-vs-code.png":::

Next, learn how to use Foundry models and the Azure OpenAI SDK for image generation.

::: zone-end