
Commit 22e5996

Merge pull request #53760 from MicrosoftDocs/NEW-image-content-understanding

New module - image content understanding

2 parents: 4d97782 + 7759ff2

13 files changed

Lines changed: 382 additions & 0 deletions
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-images-with-content-understanding.introduction
title: Introduction
metadata:
  title: Introduction
  description: "Get started with Content Understanding in Microsoft Foundry."
  ms.date: 03/04/2026
  author: buzahid
  ms.author: buzahid
  ms.topic: unit
durationInMinutes: 1
content: |
  [!include[](includes/1-introduction.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-images-with-content-understanding.what-is-content-understanding
title: What is Content Understanding?
metadata:
  title: What is Content Understanding?
  description: "Learn about Azure Content Understanding and how it can analyze images to extract structured data."
  ms.date: 03/04/2026
  author: buzahid
  ms.author: buzahid
  ms.topic: unit
durationInMinutes: 3
content: |
  [!include[](includes/2-what-is-content-understanding.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-images-with-content-understanding.analyze-images-with-content-understanding
title: Analyze images with Content Understanding
metadata:
  title: Analyze images with Content Understanding
  description: "Learn how to analyze images with Azure Content Understanding."
  ms.date: 03/04/2026
  author: buzahid
  ms.author: buzahid
  ms.topic: unit
durationInMinutes: 5
content: |
  [!include[](includes/3-analyze-images-with-content-understanding.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-images-with-content-understanding.exercise
title: Exercise - Analyze images with Content Understanding
metadata:
  title: Exercise - Analyze images with Content Understanding
  description: "Get practical experience with analyzing images using Azure Content Understanding."
  ms.date: 03/04/2026
  author: buzahid
  ms.author: buzahid
  ms.topic: unit
durationInMinutes: 30
content: |
  [!include[](includes/4-exercise.md)]
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-images-with-content-understanding.knowledge-check
title: Module assessment
metadata:
  title: Module assessment
  description: "Check your learning on analyzing images with Azure Content Understanding."
  ms.date: 03/04/2026
  author: buzahid
  ms.author: buzahid
  ms.topic: unit
durationInMinutes: 3
content: |
quiz:
  questions:
  - content: "What is the purpose of grounding in Content Understanding?"
    choices:
    - content: "To connect Content Understanding to Azure storage"
      isCorrect: false
      explanation: "Incorrect. Grounding identifies where in the content each extracted value was found."
    - content: "To identify the specific regions in content where each value was extracted"
      isCorrect: true
      explanation: "Correct. Grounding allows users to trace extracted values back to their origin in the source content for verification."
    - content: "To filter out harmful content from images"
      isCorrect: false
      explanation: "Incorrect. Content filtering is handled separately by Azure AI Content Safety, not by grounding."
  - content: "What does a confidence score of 0.95 indicate for an extracted field?"
    choices:
    - content: "The extraction failed and needs manual review"
      isCorrect: false
      explanation: "Incorrect. A score of 0.95 is high confidence, indicating the value can be trusted."
    - content: "The value can be trusted for automated processing"
      isCorrect: true
      explanation: "Correct. High confidence scores (0.9+) indicate accurate data extraction that can be used in automated workflows."
    - content: "The field was classified rather than extracted"
      isCorrect: false
      explanation: "Incorrect. Confidence scores indicate reliability, not the extraction method used."
  - content: "Which prebuilt analyzer would you use to extract vendor names and item totals from a purchase receipt?"
    choices:
    - content: "prebuilt-image"
      isCorrect: false
      explanation: "Incorrect. While prebuilt-image provides general analysis, prebuilt-receipt is optimized for receipt extraction."
    - content: "prebuilt-invoice"
      isCorrect: false
      explanation: "Incorrect. prebuilt-invoice is designed for invoices, not receipts."
    - content: "prebuilt-receipt"
      isCorrect: true
      explanation: "Correct. The prebuilt-receipt analyzer is optimized to extract vendor names, items, totals, and dates from receipt images."
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.analyze-images-with-content-understanding.summary
title: Summary
metadata:
  title: Summary
  description: "Reflect on what you've learned about analyzing images with Azure Content Understanding."
  ms.date: 03/04/2026
  author: buzahid
  ms.author: buzahid
  ms.topic: unit
durationInMinutes: 1
content: |
  [!include[](includes/6-summary.md)]
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
Images, documents, and other unstructured content often contain valuable information that's hard to extract automatically. Azure Content Understanding solves this problem by using generative AI to analyze content and return structured data.

With Content Understanding, you define a schema describing the data you want, and the service extracts it from your images and documents. The output is ready to use in automation workflows, analytics, and search applications.

In this module, you'll learn how to analyze images with Content Understanding using both prebuilt and custom analyzers.
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
Azure Content Understanding is a Foundry Tool that uses generative AI to process and extract insights from many types of content, including documents, images, videos, and audio. It transforms unstructured data into structured, actionable output that you can integrate into automation and analytical workflows.

## Why use Content Understanding?

Content Understanding accelerates time to value by enabling straight-through processing of unstructured data. Key benefits include:

- **Simplified workflows**: Standardizes extraction and classification across various content types into a unified process
- **Easy field extraction**: Define a schema to extract, classify, or generate field values without complex prompt engineering
- **Enhanced accuracy**: Uses multiple AI models to analyze and cross-validate information simultaneously
- **Confidence scores and grounding**: Ensure accuracy of extracted values while minimizing the cost of human review
- **Content classification**: Categorize document types to streamline processing and route content to appropriate analyzers

## Content Understanding components

The Content Understanding framework processes unstructured content through multiple stages:

| Component | Description |
|-----------|-------------|
| **Inputs** | Source content including documents, images, video, and audio |
| **Analyzer** | Defines how content is processed, including extraction settings and field schema |
| **Content extraction** | Transforms unstructured input into normalized text and metadata using OCR, speech transcription, and layout detection |
| **Field extraction** | Generates structured key-value pairs based on your defined schema |
| **Confidence scores** | Provides reliability estimates from 0 to 1 for each extracted field value |
| **Grounding** | Identifies the specific regions in content where each value was extracted |
| **Structured output** | Final result as Markdown for search scenarios or JSON for automation workflows |

## Analyzers

Analyzers are the core component that defines how your content is processed. Content Understanding offers two types:

- **Prebuilt analyzers**: Ready-to-use analyzers designed for common scenarios like invoice processing, receipt extraction, and call center analytics
- **Custom analyzers**: Tailored analyzers you create with your own field schema for specific business needs

When you create an analyzer, you configure:

- The base analyzer type (document, image, audio, or video)
- The AI models to use for processing
- The field schema that defines what data to extract
- Options like confidence scoring and content segmentation

## Use cases

Content Understanding supports many business scenarios:

| Use case | Description |
|----------|-------------|
| **Intelligent document processing** | Convert unstructured documents into structured data for invoice processing, contract analysis, and claims management |
| **Search and RAG** | Ingest multimodal content into search indexes with figure descriptions and layout analysis |
| **Agentic applications** | Transform messy file inputs into predictable, standardized inputs for AI agents |
| **Analytics and reporting** | Extract field outputs to gain insights and make informed decisions |

## Content restrictions

Content Understanding includes built-in Responsible AI protections. The service integrates Azure AI Content Safety to detect and prevent harmful content. When processing content, be aware of these guidelines:

- Content is filtered for harmful material, including violence, hate speech, and exploitation
- Face description capabilities can identify facial attributes in video and image content
- Biometric data processing requires appropriate notice and consent from data subjects

With Content Understanding, you can build solutions that extract meaningful insights from diverse content types while maintaining data quality and compliance.
Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
Content Understanding can analyze images to extract structured data, identify visual elements, and generate descriptions. You can use prebuilt analyzers for common scenarios or create custom analyzers tailored to your specific needs.

## Supported image formats

Content Understanding supports the following image input types:

| Format | Description |
|--------|-------------|
| **JPEG** | Standard photographic images |
| **PNG** | Images with transparency support |
| **BMP** | Bitmap images |
| **TIFF** | High-quality scanned documents |
| **HEIF** | High-efficiency image format |
| **PDF** | Single or multi-page documents with embedded images |

## Prebuilt image analyzers

Content Understanding includes prebuilt analyzers optimized for common image analysis scenarios:

- **prebuilt-image**: General-purpose image analysis with content extraction and figure description
- **prebuilt-receipt**: Extract vendor names, items, totals, and dates from receipt images
- **prebuilt-invoice**: Extract invoice details including line items, amounts, and vendor information
- **prebuilt-idDocument**: Extract information from identity documents like driver's licenses and passports

## Define a field schema for images

To extract specific information from images, define a field schema that describes the data you want. Each field can use one of three extraction methods:

| Method | Description | Example |
|--------|-------------|---------|
| **extract** | Pull values directly as they appear in the image | Extract text from a label or sign |
| **classify** | Categorize content from predefined options | Classify an image as "damaged" or "undamaged" |
| **generate** | Create values based on image analysis | Generate a description of the scene |

Here's an example schema for analyzing product images:

```json
{
  "description": "Product image analyzer",
  "baseAnalyzerId": "prebuilt-image",
  "fieldSchema": {
    "fields": {
      "ProductName": {
        "type": "string",
        "method": "extract",
        "description": "Name of the product visible in the image"
      },
      "Condition": {
        "type": "string",
        "method": "classify",
        "description": "Condition of the product",
        "enum": ["new", "used", "damaged"]
      },
      "Description": {
        "type": "string",
        "method": "generate",
        "description": "Brief description of what the image shows"
      }
    }
  }
}
```

## Analyze an image

To analyze an image using Content Understanding, submit a POST request to the analyze endpoint with your analyzer ID and the image URL or file:

```bash
curl -X POST "{endpoint}/contentunderstanding/analyzers/{analyzerId}:analyze?api-version=2025-11-01" \
  -H "Ocp-Apim-Subscription-Key: {key}" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      {
        "url": "https://example.url/product-image.jpg"
      }
    ]
  }'
```

The response includes a result ID that you use to retrieve the analysis results once processing completes.
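The same request can be assembled in Python. This sketch only builds the URL, headers, and JSON body shown in the curl example above; sending it with your HTTP client of choice is left out, and the endpoint value and analyzer ID in the usage example are hypothetical placeholders.

```python
import json

def build_analyze_request(endpoint, analyzer_id, key, image_url,
                          api_version="2025-11-01"):
    """Assemble the POST request for the Content Understanding analyze call.

    Mirrors the curl example above: the endpoint path, api-version, and
    Ocp-Apim-Subscription-Key header come directly from that example.
    """
    url = (f"{endpoint}/contentunderstanding/analyzers/"
           f"{analyzer_id}:analyze?api-version={api_version}")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    body = json.dumps({"inputs": [{"url": image_url}]})
    return url, headers, body

# Hypothetical endpoint and analyzer ID, for illustration only.
url, headers, body = build_analyze_request(
    "https://example.cognitiveservices.azure.com",
    "my-product-analyzer",
    "<your-key>",
    "https://example.url/product-image.jpg",
)
```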

## Understand the response

When analysis completes, the response includes:

- **markdown**: A text representation of the image content, useful for search and RAG scenarios
- **fields**: Extracted field values matching your schema, each with a confidence score
- **source**: Grounding information showing where in the image each value was found

Example response for a product image:

```json
{
  "contents": [
    {
      "markdown": "Product label showing 'Contoso Widget Pro' with serial number...",
      "fields": {
        "ProductName": {
          "type": "string",
          "valueString": "Contoso Widget Pro",
          "confidence": 0.95,
          "source": "D(1,100,50,300,50,300,80,100,80)"
        },
        "Condition": {
          "type": "string",
          "valueString": "new",
          "confidence": 0.89
        },
        "Description": {
          "type": "string",
          "valueString": "A silver electronic device in retail packaging with product label visible"
        }
      }
    }
  ]
}
```
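In an application, you typically flatten this structure into the field names, values, and confidence scores. A minimal sketch, using an abbreviated copy of the sample response above (the `extracted_fields` helper name is ours, not part of the API):

```python
import json

# Abbreviated copy of the example response shown above.
response = json.loads("""
{
  "contents": [
    {
      "markdown": "Product label showing 'Contoso Widget Pro'...",
      "fields": {
        "ProductName": {
          "type": "string",
          "valueString": "Contoso Widget Pro",
          "confidence": 0.95,
          "source": "D(1,100,50,300,50,300,80,100,80)"
        },
        "Condition": {
          "type": "string",
          "valueString": "new",
          "confidence": 0.89
        }
      }
    }
  ]
}
""")

def extracted_fields(response):
    """Flatten each content item's fields into (name, value, confidence)."""
    rows = []
    for content in response["contents"]:
        for name, field in content.get("fields", {}).items():
            rows.append((name, field.get("valueString"),
                         field.get("confidence")))
    return rows

for name, value, confidence in extracted_fields(response):
    print(f"{name}: {value!r} (confidence: {confidence})")
```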

## Use confidence scores

Each extracted field includes a confidence score from 0 to 1:

- **High confidence (0.9+)**: The value can be trusted for automated processing
- **Medium confidence (0.7-0.9)**: Consider human review for critical applications
- **Low confidence (<0.7)**: Manual verification is recommended

Use confidence scores to build automation workflows that route low-confidence extractions to human reviewers while processing high-confidence results automatically.
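The three bands translate directly into a routing helper. A minimal sketch: the 0.9 and 0.7 thresholds come from the list above, while the queue names are made up for illustration.

```python
def route_field(name, confidence, auto_threshold=0.9, review_threshold=0.7):
    """Pick a processing queue for an extracted field by confidence band."""
    if confidence is None or confidence < review_threshold:
        return "manual-verification"   # low confidence (<0.7) or missing
    if confidence < auto_threshold:
        return "human-review"          # medium confidence (0.7-0.9)
    return "automated"                 # high confidence (0.9+)

# The ProductName (0.95) and Condition (0.89) scores from the sample
# response land in different queues.
assert route_field("ProductName", 0.95) == "automated"
assert route_field("Condition", 0.89) == "human-review"
assert route_field("SerialNumber", 0.42) == "manual-verification"
```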

## Tips for better image analysis

- **Image quality matters**: Higher resolution images produce more accurate extractions
- **Lighting and contrast**: Ensure text and visual elements are clearly visible
- **Single focus**: Images with one clear subject yield better results than cluttered scenes
- **Consistent orientation**: Upright images are processed more reliably than rotated ones

Content Understanding's image analysis capabilities enable you to transform visual content into structured, actionable data for document processing, inventory management, quality inspection, and many other business scenarios.
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
In this exercise, you'll use Azure Content Understanding to analyze images. You start by exploring the prebuilt image analyzer in the Microsoft Foundry portal to see how Content Understanding extracts information from images. Then, you create a Python application that uses the Content Understanding API to analyze images programmatically and extract structured data.

> [!NOTE]
> To complete this exercise, you need an Azure subscription. If you don't have one, you can [sign up for a free account](https://azure.microsoft.com/pricing/purchase-options/azure-account?cid=msft_learn), which includes credits for the first 30 days.

Launch the exercise and follow the instructions.

[![Button to launch exercise.](../media/launch-exercise.png)](https://go.microsoft.com/fwlink/?linkid=2356120&azure-portal=true)
