
Commit d5807af

Author: Sherry Yang
Commit message: Update for acrolinx.
Parent: 3d7ebc4

3 files changed

Lines changed: 22 additions & 20 deletions


learn-pr/wwl-data-ai/get-started-vision-azure/includes/2-vision-enabled-models.md

Lines changed: 9 additions & 9 deletions
@@ -6,7 +6,7 @@

 ::: zone pivot="text"

-Increasingly, new AI models are multimodal. In other words, they support multiple kinds of input data, including images as well as text. **Multimodal models** are AI models that can understand and work with more than one type of data at the same time, such as text, images, audio, or video. For instance, the multimodal model could describe an image in natural language or answer a question about a photo.
+Increasingly, new AI models are multimodal. In other words, they support multiple kinds of input data, including images and text. **Multimodal models** are AI models that can understand and work with more than one type of data at the same time, such as text, images, audio, or video. For instance, a multimodal model can describe an image in natural language or answer a question about a photo.

 Multimodal models are commonly used as part of:

@@ -36,9 +36,9 @@ For example, vision‑enabled GPT models in Foundry can:

 Foundry's model catalog contains many multimodal models including:

-- **GPT‑4.1 / GPT‑4.1‑mini / GPT‑4.1‑nano**: These general‑purpose multimodal GPT models can process text and images together. They are commonly used for image description and visual question answering, document and screenshot analysis, and chart and diagram interpretation.
+- **GPT‑4.1 / GPT‑4.1‑mini / GPT‑4.1‑nano**: These general‑purpose multimodal GPT models can process text and images together. They're commonly used for image description and visual question answering, document and screenshot analysis, and chart and diagram interpretation.

-- **GPT‑5 series (for example, GPT‑5.1, GPT‑5.2)**: The GPT‑5 family available in Foundry includes advanced multimodal models designed for enterprise and agentic scenarios. These models support multimodal inputs (including text and images), structured outputs and tool use, large‑context reasoning across modalities. The GPT-5 series models are typically used in production‑grade AI agents and complex multimodal applications.
+- **GPT‑5 series (for example, GPT‑5.1, GPT‑5.2)**: The GPT‑5 family available in Foundry includes advanced multimodal models designed for enterprise and agentic scenarios. These models support multimodal inputs (including text and images), structured outputs, tool use, and large‑context reasoning across modalities. The GPT-5 series models are typically used in production‑grade AI agents and complex multimodal applications.

 Foundry also hosts partner‑provided multimodal models in its model catalog, including models from providers such as Anthropic and others that support text and image understanding.

@@ -59,7 +59,7 @@ Once validated, the same capabilities can be accessed programmatically using API

 ## Using the Azure OpenAI API for image analysis

-When moving from the playground to code, images are submitted as part of a *multimodal request* using the **OpenAI Responses API** in Foundry. The OpenAI Responses API is designed for agentic apps and supports native multimodal inputs (including images).
+To develop an application, you move from the Foundry playground to code. In a code editor, you write your application code using the **OpenAI Responses API** in Foundry. The OpenAI Responses API is designed for agentic apps and supports native multimodal inputs (including images).

 At a high level:

@@ -69,27 +69,27 @@ At a high level:

 Conceptually, the prompt structure looks like:

-- A text instruction (for example, *What objects are visible in this image?*)
+- A text instruction (for example, *What objects are visible in this image?*)
 - One or more image inputs attached to the same request

 This approach allows developers to build applications where users upload images and ask questions about them in real time.

 ## Using the Azure OpenAI Python SDK

-You can use a Microsoft Foundry (Azure OpenAI) resource with the OpenAI API to perform image analysis—including sending images in prompts and getting text responses—by using the Responses API with a vision‑capable model deployment.
+You can use a Microsoft Foundry resource with the OpenAI API to perform image analysis—including sending images in prompts and getting text responses—by using the Responses API with a vision‑capable model deployment.

 The Python SDK can be installed in the Visual Studio Code *terminal* using:

 ```bash
 pip install openai
 ```

-In the code editor, we can create one Python file which contains application code. Importantly, you need your **Foundry resource** *key* and *endpoint*, and the *name of your deployed model*.
+In the code editor, you can create a Python file that contains your application code. Importantly, you need your **Foundry resource** *key* and *endpoint*, and the *name of your deployed model*.

 >[!NOTE]
 >When you deploy a model in Foundry, it has a *base* or *original* name, and a **deployment name** that you give it. Foundry hosts the deployed model (for example, GPT‑class models with vision) and provides you with an endpoint.

-In the code example below, you create the *client*, point it to your endpoint, and pass your *model deployment name* (the name you gave the model) as the `MODEL_NAME`.
+In the code example, you create the *client*, point it to your endpoint, and pass your *model deployment name* (the name you gave the model) as the `MODEL_NAME`.

 ```python
 import os
@@ -134,7 +134,7 @@ Then you can write application code that uses the OpenAI API to connect to your

 :::image type="content" source="../media/vision-analysis-python.png" alt-text="Screenshot of Visual Studio Code with a python file containing application code for image analysis." lightbox="../media/vision-analysis-python.png":::

-The application code needs to load the image data and get a natural language prompt from a user. To submit the input to the model, you need to create a multi-part message that includes both the image and text data. The model will then respond with appropriate output based on both the text and image in the prompt.
+The application code needs to load the image data and get a natural language prompt from a user. To submit the input to the model, you need to create a multi-part message that includes both the image and text data. The model can respond with an appropriate output based on both the text and image in the prompt.

 :::image type="content" source="../media/image-analysis-result-vs-code.png" alt-text="Screenshot of Visual Studio Code with the result of the image analysis." lightbox="../media/image-analysis-result-vs-code.png":::

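The multi-part message this file describes can be sketched in Python. This is a minimal illustration, not code from the commit: the helper name and the commented `client.responses.create` call are assumptions based on the OpenAI Python SDK's Responses API with a vision‑capable deployment.

```python
import base64

def build_multimodal_input(question: str, image_bytes: bytes) -> list:
    """Combine a text instruction and an image into one multi-part user message."""
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": question},
                # The image travels as a data URL alongside the text instruction.
                {"type": "input_image", "image_url": f"data:image/jpeg;base64,{image_b64}"},
            ],
        }
    ]

# With an OpenAI client pointed at your Foundry endpoint, the call would look
# roughly like this (MODEL_NAME is your model deployment name):
#   response = client.responses.create(
#       model=MODEL_NAME,
#       input=build_multimodal_input("What objects are visible in this image?", data),
#   )
#   print(response.output_text)
```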
learn-pr/wwl-data-ai/get-started-vision-azure/includes/3-image-generation.md

Lines changed: 9 additions & 7 deletions
@@ -16,11 +16,11 @@ For most new projects, Microsoft recommends starting with the **GPT‑Image‑1

 Common examples of image generation models in Foundry include:

-- **GPT‑Image‑1.5**: GPT‑Image‑1.5 is the **latest and most advanced image generation model** available in Microsoft Foundry. It is designed for high‑fidelity, enterprise‑grade image creation and editing, with strong prompt alignment and improved consistency across iterations. The model supports **text‑to‑image**, **image‑to‑image**, and **precise image editing (inpainting)**, making it well suited for branding, marketing, and design workflows where visual accuracy matters.
+- **GPT‑Image‑1.5**: GPT‑Image‑1.5 is the latest and most advanced image generation model available in Microsoft Foundry. It's designed for high‑fidelity, enterprise‑grade image creation and editing, with strong prompt alignment and improved consistency across iterations. The model supports *text‑to‑image*, *image‑to‑image*, and precise image editing, making it well suited for branding, marketing, and design workflows where visual accuracy matters.

-- **GPT‑Image‑1**: GPT‑Image‑1 is a powerful, general‑purpose image generation model that builds on the capabilities of earlier DALL·E models. It supports **text‑to‑image generation**, **image variations**, and **inpainting**, and is commonly used for creative applications, prototyping, and visual content generation. GPT‑Image‑1 is widely supported across Foundry tools and APIs, including the Responses API and agent tools.
+- **GPT‑Image‑1**: GPT‑Image‑1 is a powerful, general‑purpose image generation model that builds on the capabilities of earlier DALL-E models. It supports *text‑to‑image generation*, image variations, and precise image editing. It's commonly used for creative applications, prototyping, and visual content generation. GPT‑Image‑1 is widely supported across Foundry tools and APIs, including the Responses API and agent tools.

-- **GPT‑Image‑1‑Mini**: GPT‑Image‑1‑Mini is a **lighter‑weight and more cost‑efficient** version of GPT‑Image‑1. It supports the same core image generation tasks but is optimized for scenarios where **lower latency or reduced cost** is more important than maximum visual fidelity. This model is a good choice for experimentation, internal tools, or high‑volume image generation.
+- **GPT‑Image‑1‑Mini**: GPT‑Image‑1‑Mini is a lighter‑weight and more cost‑efficient version of GPT‑Image‑1. It supports the same core image generation tasks but is optimized for scenarios where lower latency or reduced cost is more important than maximum visual fidelity. This model is a good choice for experimentation, internal tools, or high‑volume image generation.

 All of these image generation models can be:

@@ -29,21 +29,23 @@ All of these image generation models can be:
 - Accessed programmatically using the **OpenAI Responses API** or image generation APIs

 >[!NOTE]
->You can also access third-party image generation models in Foundry. For example, *FLUX* is a family of open‑source image generation models created by Black Forest Labs. They are designed to produce high‑quality, photorealistic, and stylistically flexible images from text prompts.
+>You can also access third-party image generation models in Foundry. For example, *FLUX* is a family of open‑source image generation models created by Black Forest Labs. They're designed to produce high‑quality, photorealistic, and stylistically flexible images from text prompts.

 #### Image generation in the Foundry playground

-In this case, I’ve deployed such a model, and in the Foundry portal playground, I can describe the image that I want to create. And after a few minutes, an image matching my description is generated.
+You can deploy an image generation model and test it in the Foundry portal playground. To test the model, describe the image that you want to create. After a few minutes, an image matching your description is generated.

 ![Screenshot of image generation in the Foundry playground.](../media/image-generation-playground.png)

 :::image type="content" source="../media/image-generation-playground-code.png" alt-text="Screenshot of code example in the Foundry playground." lightbox="../media/image-generation-playground-code.png":::

 ## Using the OpenAI Python SDK for image generation

-To build an application that uses an image generation model like this, you can write code that uses the OpenAI APIs images class to submit a prompt and retrieve the generated image. The ability to dynamically generate original images from descriptions can be immensely valuable in scenarios that include media, publishing, and content creation.
+To build an application that uses an image generation model, you can use the images class in the **OpenAI Python SDK**. The images class lets you generate new images and edit existing images by calling the OpenAI Images API endpoint through a Python interface.

-To generate images with the OpenAI Python SDK you need:
+The ability to dynamically generate original images from descriptions can be immensely valuable in scenarios that include media, publishing, and content creation.
+
+To generate images with the OpenAI Python SDK, you need:

 - **A Foundry resource**
 - A **vision‑capable model deployed** (the deployment name is what you pass as `MODEL_NAME`)

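The image-generation requirements above can be sketched as a small helper. This is a hypothetical sketch, not code from the commit: the deployment name, the set of size values, and the commented `client.images.generate` call are assumptions based on the OpenAI Python SDK.

```python
def build_image_request(prompt: str, deployment: str, size: str = "1024x1024") -> dict:
    """Assemble keyword arguments for an images.generate call."""
    # Sizes commonly accepted by GPT-Image-class models (an assumption here).
    allowed = {"1024x1024", "1024x1536", "1536x1024"}
    if size not in allowed:
        raise ValueError(f"unsupported size: {size}")
    return {"model": deployment, "prompt": prompt, "size": size}

# With an OpenAI client pointed at your Foundry endpoint, roughly:
#   result = client.images.generate(
#       **build_image_request("A watercolor lighthouse at dusk", MODEL_NAME)
#   )
#   image_bytes = base64.b64decode(result.data[0].b64_json)
```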
learn-pr/wwl-data-ai/get-started-vision-azure/includes/4-video-generation.md

Lines changed: 4 additions & 4 deletions
@@ -22,7 +22,7 @@ Typical uses:
 - Short animations from text descriptions
 - Visual prototyping for creative workflows

-**Sora 2 public preview**: **Sora 2** is the **next‑generation video generation model** in Foundry and represents a significant upgrade over Sora 1. It supports multiple modalities, including: **Text → video**, **Image → video**, **Video → video (remix)**. Sora 2 also introduces **audio generation**, improved realism, and remixing capabilities that allow targeted edits instead of regenerating an entire video. It is available via the Azure OpenAI **v1 API** and the Foundry Video Playground, with built‑in Responsible AI safeguards.
+**Sora 2 (public preview)**: Sora 2 is the next‑generation video generation model in Foundry and represents a significant upgrade over Sora 1. It supports multiple modalities, including **text → video**, **image → video**, and **video → video (remix)**. Sora 2 also introduces **audio generation**, improved realism, and remixing capabilities that allow targeted edits instead of regenerating an entire video. It's available via the Azure OpenAI **v1 API** and the Foundry Video Playground, with built‑in Responsible AI safeguards.

 Typical uses:
 - Marketing and promotional videos
@@ -34,7 +34,7 @@ Typical uses:

 #### Video generation in the Foundry playground

-Once you've deployed an appropriate video generation model, you can test it in the Foundry portal playground. In the playground, you can also specify parameters like video dimensions and duration.
+Once you deploy an appropriate video generation model, you can test it in the Foundry portal playground. In the playground, you can also specify parameters like video dimensions and duration.

 Your prompts to the video generation model should include a description of the content in the desired video. After a few minutes, the model produces a video.

@@ -64,7 +64,7 @@ The Sora 2 API provides distinct endpoints for:

 #### 1. Create a video job

-In the example below, the script starts an **async render job** and returns a response that includes a **video id** to poll.
+In the example, the script starts an **async render job** and returns a response that includes a **video id** to poll.

 > **Base URL pattern (v1)**:
 > `https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/`
@@ -83,7 +83,7 @@ curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/videos" \

 #### 2. Poll job status until completed

-In the example below, the script polls the endpoint until the job reaches `completed` (or `failed`).
+In the example, the script polls the endpoint until the job reaches `completed` (or `failed`).

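The poll-until-terminal pattern in this step can be sketched in Python. This is a hypothetical helper, not code from the commit; a real implementation would issue the GET request shown in the curl example and read the `status` field of the JSON response.

```python
import time

def poll_until_done(get_status, interval: float = 5.0, timeout: float = 600.0) -> str:
    """Call get_status() repeatedly until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # get_status would GET .../openai/v1/videos/{video_id} and return json["status"]
        status = get_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)  # back off between polls
    raise TimeoutError("video job did not finish before the timeout")
```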
 ```bash
 curl -X GET "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/videos/{video_id}" \
