learn-pr/wwl-data-ai/get-started-vision-azure/includes/2-vision-enabled-models.md (+9 −9)
@@ -6,7 +6,7 @@
::: zone pivot="text"
-Increasingly, new AI models are multimodal. In other words, they support multiple kinds of input data, including images as well as text. **Multimodal models** are AI models that can understand and work with more than one type of data at the same time, such as text, images, audio, or video. For instance, the multimodal model could describe an image in natural language or answer a question about a photo.
+Increasingly, new AI models are multimodal. In other words, they support multiple kinds of input data, including images and text. **Multimodal models** are AI models that can understand and work with more than one type of data at the same time, such as text, images, audio, or video. For instance, the multimodal model could describe an image in natural language or answer a question about a photo.
Multimodal models are commonly used as part of:
@@ -36,9 +36,9 @@ For example, vision‑enabled GPT models in Foundry can:
Foundry's model catalog contains many multimodal models including:
-- **GPT‑4.1 / GPT‑4.1‑mini / GPT‑4.1‑nano**: These general‑purpose multimodal GPT models can process text and images together. They are commonly used for image description and visual question answering, document and screenshot analysis, and chart and diagram interpretation.
+- **GPT‑4.1 / GPT‑4.1‑mini / GPT‑4.1‑nano**: These general‑purpose multimodal GPT models can process text and images together. They're commonly used for image description and visual question answering, document and screenshot analysis, and chart and diagram interpretation.
-- **GPT‑5 series (for example, GPT‑5.1, GPT‑5.2)**: The GPT‑5 family available in Foundry includes advanced multimodal models designed for enterprise and agentic scenarios. These models support multimodal inputs (including text and images), structured outputs and tool use, large‑context reasoning across modalities. The GPT-5 series models are typically used in production‑grade AI agents and complex multimodal applications.
+- **GPT‑5 series (for example, GPT‑5.1, GPT‑5.2)**: The GPT‑5 family available in Foundry includes advanced multimodal models designed for enterprise and agentic scenarios. These models support multimodal inputs (including text and images), structured outputs, tool use, and large‑context reasoning across modalities. The GPT-5 series models are typically used in production‑grade AI agents and complex multimodal applications.
Foundry also hosts partner‑provided multimodal models in its model catalog, including models from providers such as Anthropic and others that support text and image understanding.
@@ -59,7 +59,7 @@ Once validated, the same capabilities can be accessed programmatically using API
## Using the Azure OpenAI API for image analysis
-When moving from the playground to code, images are submitted as part of a *multimodal request* using the **OpenAI Responses API** in Foundry. The OpenAI Responses API is designed for agentic apps and supports native multimodal inputs (including images).
+To develop an application, you need to move from the Foundry playground to code. In a code editor, you can write your application code using the **OpenAI Responses API** in Foundry. The OpenAI Responses API is designed for agentic apps and supports native multimodal inputs (including images).
At a high level:
@@ -69,27 +69,27 @@ At a high level:
Conceptually, the prompt structure looks like:
-- A text instruction (for example, *“What objects are visible in this image?”*)
+- A text instruction (for example, *What objects are visible in this image?*)
- One or more image inputs attached to the same request
This approach allows developers to build applications where users upload images and ask questions about them in real time.
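The prompt structure described above can be sketched in Python. This is a minimal sketch assuming the OpenAI Python SDK's Responses API conventions; the image URL, endpoint, key, and `MODEL_NAME` are hypothetical placeholders, and the actual network call is shown commented so the snippet stays self-contained.

```python
def build_vision_input(prompt_text: str, image_url: str) -> list:
    """Combine a text instruction and an image input in a single user turn,
    the multimodal request shape the Responses API expects."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": prompt_text},
                {"type": "input_image", "image_url": image_url},
            ],
        }
    ]

payload = build_vision_input(
    "What objects are visible in this image?",
    "https://example.com/photo.jpg",  # hypothetical image URL
)

# With a vision-capable deployment, the request would then be sent like:
#
#   from openai import OpenAI
#   client = OpenAI(base_url="https://<resource>.openai.azure.com/openai/v1/",
#                   api_key="<your-key>")
#   response = client.responses.create(model="MODEL_NAME", input=payload)
#   print(response.output_text)
```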
## Using the Azure OpenAI Python SDK
-You can use a Microsoft Foundry (Azure OpenAI) resource with the OpenAI API to perform image analysis—including sending images in prompts and getting text responses—by using the Responses API with a vision‑capable model deployment.
+You can use a Microsoft Foundry resource with the OpenAI API to perform image analysis—including sending images in prompts and getting text responses—by using the Responses API with a vision‑capable model deployment.
The Python SDK can be installed in the Visual Studio Code *terminal* using:
```bash
pip install openai
```
-In the code editor, we can create one Python file which contains application code. Importantly, you need your **Foundry resource** *key* and *endpoint*, and the *name of your deployed model*.
+In the code editor, you can create a Python file that contains your application code. Importantly, you need your **Foundry resource** *key* and *endpoint*, and the *name of your deployed model*.

>[!NOTE]
>When you deploy a model in Foundry, it has a *base* or *original* name, and a **deployment name** you give it. Foundry hosts the deployed model (for example, GPT‑class models with vision) and provides you with an endpoint.
-In the code example below, you create the *client*, point it to your endpoint, and pass your *model deployment name* (the name you gave the model) as the `MODEL_NAME`.
+In the code example, you create the *client*, point it to your endpoint, and pass your *model deployment name* (the name you gave the model) as the `MODEL_NAME`.
```python
import os
```
@@ -134,7 +134,7 @@ Then you can write application code that uses the OpenAI API to connect to your
:::image type="content" source="../media/vision-analysis-python.png" alt-text="Screenshot of Visual Studio Code with a python file containing application code for image analysis." lightbox="../media/vision-analysis-python.png":::
-The application code needs to load the image data and get a natural language prompt from a user. To submit the input to the model, you need to create a multi-part message that includes both the image and text data. The model will then respond with appropriate output based on both the text and image in the prompt.
+The application code needs to load the image data and get a natural language prompt from a user. To submit the input to the model, you need to create a multi-part message that includes both the image and text data. The model can respond with an appropriate output based on both the text and image in the prompt.
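Loading local image data into a multi-part message can be sketched as follows. This is a hedged sketch: the `input_text`/`input_image` part shapes follow the Responses API conventions, and the file path would be supplied by your application.

```python
import base64
import mimetypes

def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL so it can be embedded
    directly in the multi-part message (no public URL required)."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"

def build_multipart_message(user_prompt: str, image_path: str) -> dict:
    """One user message carrying both the text prompt and the image data."""
    return {
        "role": "user",
        "content": [
            {"type": "input_text", "text": user_prompt},
            {"type": "input_image", "image_url": image_to_data_url(image_path)},
        ],
    }
```

Passing this message in the `input` list of a Responses API call sends the text and the image together, so the model can answer based on both.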
:::image type="content" source="../media/image-analysis-result-vs-code.png" alt-text="Screenshot of Visual Studio Code with the result of the image analysis." lightbox="../media/image-analysis-result-vs-code.png":::
learn-pr/wwl-data-ai/get-started-vision-azure/includes/3-image-generation.md (+9 −7)
@@ -16,11 +16,11 @@ For most new projects, Microsoft recommends starting with the **GPT‑Image‑1
Common examples of image generation models in Foundry include:
-- **GPT‑Image‑1.5**: GPT‑Image‑1.5 is the **latest and most advanced image generation model** available in Microsoft Foundry. It is designed for high‑fidelity, enterprise‑grade image creation and editing, with strong prompt alignment and improved consistency across iterations. The model supports **text‑to‑image**, **image‑to‑image**, and **precise image editing (inpainting)**, making it well suited for branding, marketing, and design workflows where visual accuracy matters.
+- **GPT‑Image‑1.5**: GPT‑Image‑1.5 is the latest and most advanced image generation model available in Microsoft Foundry. It's designed for high‑fidelity, enterprise‑grade image creation and editing, with strong prompt alignment and improved consistency across iterations. The model supports *text‑to‑image*, *image‑to‑image*, and precise image editing, making it well suited for branding, marketing, and design workflows where visual accuracy matters.
-- **GPT‑Image‑1**: GPT‑Image‑1 is a powerful, general‑purpose image generation model that builds on the capabilities of earlier DALL·E models. It supports **text‑to‑image generation**, **image variations**, and **inpainting**, and is commonly used for creative applications, prototyping, and visual content generation. GPT‑Image‑1 is widely supported across Foundry tools and APIs, including the Responses API and agent tools.
+- **GPT‑Image‑1**: GPT‑Image‑1 is a powerful, general‑purpose image generation model that builds on the capabilities of earlier DALL-E models. It supports *text‑to‑image generation*, image variations, and precise image editing. It's commonly used for creative applications, prototyping, and visual content generation. GPT‑Image‑1 is widely supported across Foundry tools and APIs, including the Responses API and agent tools.
-- **GPT‑Image‑1‑Mini**: GPT‑Image‑1‑Mini is a **lighter‑weight and more cost‑efficient** version of GPT‑Image‑1. It supports the same core image generation tasks but is optimized for scenarios where **lower latency or reduced cost** is more important than maximum visual fidelity. This model is a good choice for experimentation, internal tools, or high‑volume image generation.
+- **GPT‑Image‑1‑Mini**: GPT‑Image‑1‑Mini is a lighter‑weight and more cost‑efficient version of GPT‑Image‑1. It supports the same core image generation tasks but is optimized for scenarios where lower latency or reduced cost is more important than maximum visual fidelity. This model is a good choice for experimentation, internal tools, or high‑volume image generation.
All of these image generation models can be:
@@ -29,21 +29,23 @@ All of these image generation models can be:
- Accessed programmatically using the **OpenAI Responses API** or image generation APIs
>[!NOTE]
->You can also access third-party image generation models in Foundry. For example, *FLUX* is a family of open‑source image generation models created by Black Forest Labs. They are designed to produce high‑quality, photorealistic, and stylistically flexible images from text prompts.
+>You can also access third-party image generation models in Foundry. For example, *FLUX* is a family of open‑source image generation models created by Black Forest Labs. They're designed to produce high‑quality, photorealistic, and stylistically flexible images from text prompts.
#### Image generation in the Foundry playground
-In this case, I’ve deployed such a model, and in the Foundry portal playground, I can describe the image that I want to create. And after a few minutes, an image matching my description is generated.
+You can deploy an image generation model and test it in the Foundry portal playground. To test the model, describe the image that you want to create. After a few minutes, an image matching your description is generated.

:::image type="content" source="../media/image-generation-playground-code.png" alt-text="Screenshot of code example in the Foundry playground." lightbox="../media/image-generation-playground-code.png":::
## Using the OpenAI Python SDK for image generation
-To build an application that uses an image generation model like this, you can write code that uses the OpenAI API’s images class to submit a prompt and retrieve the generated image. The ability to dynamically generate original images from descriptions can be immensely valuable in scenarios that include media, publishing, and content creation.
+You can build an application that uses an image generation model by writing code that calls the Azure OpenAI API's images class. The OpenAI images class in the **OpenAI Python SDK** lets you generate new images and edit existing images. You can use the OpenAI Python SDK by calling the OpenAI Images API endpoint through a Python interface.
-To generate images with the OpenAI Python SDK you need:
+The ability to dynamically generate original images from descriptions can be immensely valuable in scenarios that include media, publishing, and content creation.
+
+To generate images with the OpenAI Python SDK, you need:
- **A Foundry resource**
- A **vision‑capable model deployed** (the deployment name is what you pass as `MODEL_NAME`)
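With those pieces in place, an image generation request can be sketched as follows. This is a minimal sketch, not the module's own sample: the prompt and parameters are illustrative, `MODEL_NAME`, the endpoint, and the key are placeholders for your own deployment, and the SDK call is shown commented so the snippet stays self-contained.

```python
import base64

# Hypothetical request parameters for a deployed image generation model;
# "MODEL_NAME" stands in for the deployment name you chose in Foundry.
request = {
    "model": "MODEL_NAME",
    "prompt": "A watercolor painting of a lighthouse at sunrise",
    "size": "1024x1024",
    "n": 1,
}

def save_b64_image(b64_data: str, path: str) -> None:
    """GPT-Image models return base64-encoded image bytes; decode and
    write them to disk as an image file."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_data))

# With the OpenAI Python SDK the request would be submitted as:
#
#   from openai import OpenAI
#   client = OpenAI(base_url="https://<resource>.openai.azure.com/openai/v1/",
#                   api_key="<your-key>")
#   result = client.images.generate(**request)
#   save_b64_image(result.data[0].b64_json, "generated.png")
```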
learn-pr/wwl-data-ai/get-started-vision-azure/includes/4-video-generation.md (+4 −4)
@@ -22,7 +22,7 @@ Typical uses:
- Short animations from text descriptions
- Visual prototyping for creative workflows
-**Sora 2 — public preview**: **Sora 2** is the **next‑generation video generation model** in Foundry and represents a significant upgrade over Sora 1. It supports multiple modalities, including: **Text → video**, **Image → video**, **Video → video (remix)**. Sora 2 also introduces **audio generation**, improved realism, and remixing capabilities that allow targeted edits instead of regenerating an entire video. It is available via the Azure OpenAI **v1 API** and the Foundry Video Playground, with built‑in Responsible AI safeguards.
+**Sora 2 (public preview)**: **Sora 2** is the **next‑generation video generation model** in Foundry and represents a significant upgrade over Sora 1. It supports multiple modalities, including: **Text → video**, **Image → video**, **Video → video (remix)**. Sora 2 also introduces **audio generation**, improved realism, and remixing capabilities that allow targeted edits instead of regenerating an entire video. It's available via the Azure OpenAI **v1 API** and the Foundry Video Playground, with built‑in Responsible AI safeguards.
Typical uses:
- Marketing and promotional videos
@@ -34,7 +34,7 @@ Typical uses:
#### Video generation in the Foundry playground
-Once you've deployed an appropriate video generation model, you can test it in the Foundry portal playground. In the playground, you can also specify parameters like video dimensions and duration.
+Once you deploy an appropriate video generation model, you can test it in the Foundry portal playground. In the playground, you can also specify parameters like video dimensions and duration.
Your prompts to the video generation model should include a description of the content in the desired video. After a few minutes, the model produces a video.
@@ -64,7 +64,7 @@ The Sora 2 API provides distinct endpoints for:
#### 1. Create a video job
-In the example below, the script starts an **async render job** and returns a response that includes a **video id** to poll.
+In the example, the script starts an **async render job** and returns a response that includes a **video id** to poll.