Commit 2992562

Merge pull request #53743 from buzahid/computer-vision
Added generate ai video module
2 parents 514b3bf + 92c46b2 commit 2992562

15 files changed

Lines changed: 423 additions & 0 deletions
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.generate-video-with-foundry.introduction
3+
title: Introduction
4+
metadata:
5+
title: Introduction
6+
description: "Get started with generating videos in Microsoft Foundry."
7+
ms.date: 03/02/2026
8+
author: buzahid
9+
ms.author: buzahid
10+
ms.topic: unit
11+
durationInMinutes: 1
12+
content: |
13+
[!include[](includes/1-introduction.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.generate-video-with-foundry.deploy-video-model
3+
title: Deploy a video generating model
4+
metadata:
5+
title: Deploy a video generating model
6+
description: "Deploy a video generating model that can create videos from text prompts."
7+
ms.date: 03/02/2026
8+
author: buzahid
9+
ms.author: buzahid
10+
ms.topic: unit
11+
durationInMinutes: 3
12+
content: |
13+
[!include[](includes/2-deploy-video-model.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.generate-video-with-foundry.generate-video-from-prompt
3+
title: Generate video from a prompt
4+
metadata:
5+
title: Generate video from a prompt
6+
description: "Generate videos from text prompts with Sora 2 in Microsoft Foundry."
7+
ms.date: 03/02/2026
8+
author: buzahid
9+
ms.author: buzahid
10+
ms.topic: unit
11+
durationInMinutes: 5
12+
content: |
13+
[!include[](includes/3-generate-video-from-prompt.md)]
14+
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.generate-video-with-foundry.generate-video-in-python
3+
title: Generate video in Python
4+
metadata:
5+
title: Generate video in Python
6+
description: "Use the Azure OpenAI SDK to generate videos in Python."
7+
ms.date: 03/02/2026
8+
author: buzahid
9+
ms.author: buzahid
10+
ms.topic: unit
11+
durationInMinutes: 5
12+
content: |
13+
[!include[](includes/4-generate-video-in-python.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.generate-video-with-foundry.exercise
3+
title: Exercise - Generate video with Sora 2 in Microsoft Foundry
4+
metadata:
5+
title: Exercise - Generate video with Sora 2 in Microsoft Foundry
6+
description: "Get practical experience with generating videos using Sora 2 in Microsoft Foundry."
7+
ms.date: 03/02/2026
8+
author: buzahid
9+
ms.author: buzahid
10+
ms.topic: unit
11+
durationInMinutes: 30
12+
content: |
13+
[!include[](includes/5-exercise.md)]
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.generate-video-with-foundry.knowledge-check
3+
title: Module assessment
4+
metadata:
5+
title: Module assessment
6+
description: "Check your learning on video generation with Sora 2."
7+
ms.date: 03/02/2026
8+
author: buzahid
9+
ms.author: buzahid
10+
ms.topic: unit
11+
durationInMinutes: 3
12+
content: |
13+
quiz:
14+
questions:
15+
- content: "What video durations does Sora 2 support?"
16+
choices:
17+
- content: "1 to 20 seconds in 1-second increments"
18+
isCorrect: false
19+
explanation: "Incorrect. Sora 2 supports specific duration values, not a continuous range."
20+
- content: "4, 8, or 12 seconds"
21+
isCorrect: true
22+
explanation: "Correct. Sora 2 supports video durations of 4, 8, or 12 seconds."
23+
- content: "Any duration up to 60 seconds"
24+
isCorrect: false
25+
explanation: "Incorrect. Sora 2 has specific supported duration values."
26+
- content: "What is required when using a reference image with Sora 2?"
27+
choices:
28+
- content: "The image must be smaller than 1 MB"
29+
isCorrect: false
30+
explanation: "Incorrect. The key requirement is that the image resolution must match the target video size."
31+
- content: "The image resolution must match the target video size"
32+
isCorrect: true
33+
explanation: "Correct. Reference images must match the target video resolution exactly (1280x720 or 720x1280)."
34+
- content: "The image must contain at least one human face"
35+
isCorrect: false
36+
explanation: "Incorrect. Sora 2 currently rejects reference images with human faces."
37+
- content: "What is the remix feature used for in Sora 2?"
38+
choices:
39+
- content: "Combining multiple videos into one"
40+
isCorrect: false
41+
explanation: "Incorrect. Remix modifies an existing video while preserving its structure."
42+
- content: "Making targeted adjustments to an existing video without regenerating from scratch"
43+
isCorrect: true
44+
explanation: "Correct. The remix feature lets you modify specific aspects while preserving scene transitions, visual layout, and overall structure."
45+
- content: "Adding background music to generated videos"
46+
isCorrect: false
47+
explanation: "Incorrect. Remix is for making visual adjustments to existing videos."
48+
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.generate-video-with-foundry.summary
3+
title: Summary
4+
metadata:
5+
title: Summary
6+
description: "Reflect on what you've learned about generating videos with Sora 2 in Microsoft Foundry."
7+
ms.date: 03/02/2026
8+
author: buzahid
9+
ms.author: buzahid
10+
ms.topic: unit
11+
durationInMinutes: 1
12+
content: |
13+
[!include[](includes/7-summary.md)]
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
Generative AI models can do more than respond to text and images. Video generation models create short video clips from natural language prompts, reference images, or existing videos.
2+
3+
In this module, we'll discuss video generation with Sora 2 and explore how you can use Microsoft Foundry to deploy the model, generate videos from text prompts and reference images, and remix existing videos.
4+
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
1+
To generate videos from text prompts, you need to deploy a video generation model. **Sora 2** is an AI model from OpenAI that creates realistic and imaginative video scenes from text instructions, input images, or existing videos. Sora 2 is available in Microsoft Foundry, where you deploy it to a project before you can use it.
2+
3+
## Prerequisites
4+
5+
Before deploying a Sora 2 model, ensure you have:
6+
7+
- An Azure subscription
8+
- Access to the Microsoft Foundry portal
9+
- A Foundry project where you have permissions to deploy models
10+
11+
## Deploy the Sora 2 model
12+
13+
To deploy a Sora 2 video generation model in Microsoft Foundry:
14+
15+
1. Go to the [Microsoft Foundry portal](https://ai.azure.com) and sign in with your credentials.
16+
1. From the Foundry landing page, create or select a project.
17+
1. Select **Build** from the navigation pane on the right.
18+
1. Select **Models** from the left-hand menu to view the model catalog.
19+
1. Use the search bar or filter options to find the **Sora-2** video generation model.
20+
1. Select the **Sora-2** model, select **Deploy**, and then choose the appropriate deployment settings.
21+
22+
> [!TIP]
23+
> To learn more about available models in Microsoft Foundry, see the **[Model catalog and collections in Microsoft Foundry portal](/azure/ai-foundry/how-to/model-catalog-overview)** article in the Microsoft Foundry documentation.
24+
25+
## Sora 2 capabilities
26+
27+
Sora 2 supports the following video generation capabilities:
28+
29+
| Feature | Description |
30+
|---------|-------------|
31+
| **Text to video** | Generate videos from natural language text prompts |
32+
| **Image to video** | Transform existing images into video content |
33+
| **Video remix** | Make targeted adjustments to existing videos without regenerating from scratch |
34+
| **Audio generation** | Include generated audio in output videos |
35+
| **Multiple resolutions** | Supports portrait (720×1280) and landscape (1280×720) formats |
36+
| **Variable duration** | Generate videos of 4, 8, or 12 seconds |
37+
38+
After you deploy Sora 2 through the Foundry portal, you can use it to create video content from text prompts, from reference images, or by remixing existing videos, in portrait or landscape resolution and in durations of 4, 8, or 12 seconds.
Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
1+
Once your Sora 2 model is deployed, you can start generating videos. Video generation is an asynchronous process—you submit a request with your prompt and video settings, then retrieve the completed video when it's ready.
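
The submit-then-retrieve pattern described above can be sketched as a small polling loop. This is a minimal illustration rather than the service's actual client: `fetch_status` is a hypothetical callable that you would wire to whatever status call your SDK or REST client provides, and the status names `queued`, `in_progress`, `completed`, and `failed` are assumptions.

```python
import time
from typing import Callable

# Assumed terminal status names; the real service may use different ones.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_video(
    fetch_status: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job until it reaches a terminal status or the timeout elapses.

    fetch_status is a hypothetical callable that returns the job's current
    status string. The sleep function is injectable so the loop can be
    tested without waiting.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still '{status}' after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

Injecting `fetch_status` keeps the loop independent of any particular client library, so the same function works whether the status comes from an SDK call or a raw HTTP request.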
2+
3+
## Video generation parameters
4+
5+
Before crafting your prompt, understand the API parameters that control your video output:
6+
7+
| Parameter | Description | Supported values |
8+
|-----------|-------------|------------------|
9+
| **prompt** | Natural language description of your video | Text string (required) |
10+
| **model** | The model to use | `sora-2` or `sora-2-pro` |
11+
| **size** | Output resolution | `1280x720` (landscape), `720x1280` (portrait) |
12+
| **seconds** | Video duration | `4`, `8`, or `12` (default: 4) |
13+
| **input_reference** | Reference image for the first frame | JPEG, PNG, or WebP file |
14+
| **remix_video_id** | ID of a previous video to remix | Video ID string |
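
Because the table lists a small, fixed set of supported values, it can help to validate settings before submitting a request. The following sketch assumes a plain dictionary whose keys mirror the parameter names in the table. `build_video_request` is a hypothetical helper, not part of any SDK, and the exact request shape your client expects may differ.

```python
# Supported values taken from the parameter table above.
SUPPORTED_MODELS = {"sora-2", "sora-2-pro"}
SUPPORTED_SIZES = {"1280x720", "720x1280"}
SUPPORTED_SECONDS = {4, 8, 12}


def build_video_request(prompt: str, model: str, size: str, seconds: int = 4) -> dict:
    """Validate video settings and return them as a dictionary.

    Hypothetical helper: the dictionary shape is an assumption that simply
    mirrors the parameter names in the table.
    """
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"model must be one of {sorted(SUPPORTED_MODELS)}")
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"size must be one of {sorted(SUPPORTED_SIZES)}")
    if seconds not in SUPPORTED_SECONDS:
        raise ValueError(f"seconds must be one of {sorted(SUPPORTED_SECONDS)}")
    return {"prompt": prompt.strip(), "model": model, "size": size, "seconds": seconds}


request = build_video_request(
    "Cyclist pedals three times, brakes, and stops at crosswalk",
    model="sora-2",
    size="720x1280",
)
```

Checking values locally gives an immediate, readable error instead of a rejected request after a round trip to the service.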
15+
16+
> [!TIP]
17+
> The model follows instructions more reliably in shorter clips. For best results, consider generating two 4-second clips and stitching them together rather than a single 8-second clip.
18+
19+
## Test video generation in the playground
20+
21+
After deploying the Sora 2 model, you can test it using the Video playground in the Microsoft Foundry portal:
22+
23+
1. Navigate to your deployed Sora 2 model in the Foundry portal.
24+
1. Select the **Playground** tab to access the video generation interface.
25+
1. In the text box, enter a prompt that describes the video you want to generate.
26+
1. Configure video settings such as resolution and duration.
27+
1. Select **Generate** to start video creation.
28+
29+
Video generation typically takes 1 to 5 minutes depending on your settings. When the AI-generated video is ready, it appears on the page.
30+
31+
> [!NOTE]
32+
> The content generation APIs include a content moderation filter. If Azure OpenAI recognizes your prompt as harmful content, it won't return a generated video. For more information, see [Content filtering](/azure/ai-services/openai/concepts/content-filter).
33+
34+
In the video playground, you can also view cURL code samples that are prefilled according to your settings. Select the **View code** button at the top of the playground to access sample code you can use in your applications.
35+
36+
## Writing effective prompts
37+
38+
Think of prompting like briefing a cinematographer. The more specific you are about what the shot should achieve, the more control and consistency you'll get. However, leaving some details open can lead to creative, unexpected results.
39+
40+
### Prompt anatomy
41+
42+
A clear prompt describes a shot as if you were sketching it onto a storyboard:
43+
44+
- **Camera framing**: Specify the shot type (wide, medium, close-up) and angle
45+
- **Subject description**: Anchor your subject with distinctive details
46+
- **Action**: Describe movement in beats—small steps, gestures, or pauses
47+
- **Lighting and palette**: Set the mood with lighting direction and color anchors
48+
- **Style**: Establish the aesthetic early (for example, "1970s film" or "handheld documentary")
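
If you build prompts in code, one way to keep the elements above consistent across shots is to hold them in a small structure and render them in a fixed order. This is only a sketch: `ShotPrompt` and its field names are hypothetical, and putting style first is a choice that follows the advice to establish the aesthetic early.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical container for the prompt elements listed above."""

    style: str
    framing: str
    subject: str
    action: str
    lighting: str

    def render(self) -> str:
        """Join the non-empty elements into one prompt, style first."""
        parts = [self.style, self.framing, self.subject, self.action, self.lighting]
        cleaned = [p.strip().rstrip(".") for p in parts if p.strip()]
        return ". ".join(cleaned) + "."


shot = ShotPrompt(
    style="Handheld documentary",
    framing="Medium shot at eye level",
    subject="A cyclist in a yellow rain jacket",
    action="Pedals three times, brakes, and stops at the crosswalk",
    lighting="Neon signs reflecting in puddles on wet asphalt",
)
print(shot.render())
```

Reusing the same `subject` text across several `ShotPrompt` instances is a simple way to keep character descriptions identical from shot to shot.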
49+
50+
### Weak vs. strong prompts
51+
52+
| Weak prompt | Strong prompt |
53+
|-------------|---------------|
54+
| "A beautiful street at night" | "Wet asphalt, zebra crosswalk, neon signs reflecting in puddles" |
55+
| "Person moves quickly" | "Cyclist pedals three times, brakes, and stops at crosswalk" |
56+
| "Cinematic look" | "Anamorphic 2.0x lens, shallow DOF, volumetric light" |
57+
58+
### Example prompt
59+
60+
Here's an example of a well-structured prompt:
61+
62+
```text
63+
In a 90s documentary-style interview, an old Swedish man sits in a study
64+
and says, "I still remember when I was young."
65+
```
66+
67+
This prompt works because:
68+
69+
- "90s documentary" sets the style, so the model chooses appropriate camera, lighting, and color
70+
- "old Swedish man sits in a study" describes subject and setting while allowing creative interpretation
71+
- The dialogue gives the model specific words to sync with the character
72+
73+
## Using reference images
74+
75+
For more control over composition and style, use the `input_reference` parameter to provide a visual reference. The model uses the image as an anchor for the first frame, while your prompt defines what happens next.
76+
77+
Requirements for reference images:
78+
79+
- The image resolution must match the target video size (`1280x720` or `720x1280`)
80+
- Supported formats: JPEG, PNG, WebP
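
The resolution requirement can be checked before you upload a reference image. The sketch below handles PNG files only, reading the width and height from the IHDR chunk with the standard library. JPEG and WebP store dimensions differently and would need separate parsing or an imaging library. The function names are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
SUPPORTED_SIZES = {"1280x720", "720x1280"}


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def matches_target_size(data: bytes, size: str) -> bool:
    """Check that a PNG's resolution equals the target video size exactly."""
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"size must be one of {sorted(SUPPORTED_SIZES)}")
    target_width, target_height = (int(value) for value in size.split("x"))
    return png_dimensions(data) == (target_width, target_height)
```

To use it, read the first bytes of the file (for example, `open(path, "rb").read(24)`) and pass them with the `size` value you plan to request.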
81+
82+
## Remixing existing videos
83+
84+
The remix feature lets you modify specific aspects of an existing video while preserving its core elements—scene transitions, visual layout, and overall structure. This is useful for making targeted adjustments without regenerating from scratch.
85+
86+
To remix a video:
87+
88+
1. Generate a video and note its video ID from the completed job
89+
2. Call the remix endpoint with the original video ID and an updated prompt
90+
3. Describe only the changes you want—keep modifications focused
91+
92+
For best results:
93+
94+
- Limit changes to one clearly articulated adjustment
95+
- Be specific about what to change: "same shot, switch to 85mm lens" or "same lighting, new palette: teal, sand, rust"
96+
- Narrow, precise edits retain greater fidelity to the source material
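
The "same X, change Y" phrasing above lends itself to a small helper that keeps each remix prompt to a single adjustment. This is a sketch under assumptions: `build_remix_request` is hypothetical, the dictionary keys mirror the `remix_video_id` and `prompt` parameter names, and `video_123` is a placeholder ID rather than a real one.

```python
def build_remix_request(video_id: str, keep: str, change: str) -> dict:
    """Build remix settings that describe one focused change.

    Hypothetical helper: keep names what stays the same (for example,
    "shot"), and change names the single adjustment to make.
    """
    if not video_id.strip():
        raise ValueError("video_id is required")
    if not keep.strip() or not change.strip():
        raise ValueError("both keep and change are required")
    prompt = f"same {keep.strip()}, {change.strip()}"
    return {"remix_video_id": video_id.strip(), "prompt": prompt}


# Placeholder ID; use the ID from your completed generation job.
remix = build_remix_request("video_123", keep="shot", change="switch to 85mm lens")
```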
97+
98+
## Tips for better results
99+
100+
- **Keep it simple**: Each shot should have one clear camera move and one clear subject action
101+
- **Use beats for timing**: Instead of "actor walks across the room," try "actor takes four steps to the window, pauses, and pulls the curtain"
102+
- **Be consistent**: Reuse phrasing for characters across shots to maintain continuity
103+
- **Iterate**: Small changes to camera, lighting, or action can shift outcomes dramatically—treat each generation as a creative variation
104+
105+
Video generation with Sora 2 is a collaborative process. You provide direction, and the model delivers creative variations. Be prepared to experiment—sometimes the second or third generation is the best one.