
Commit 7f8908f

Merge pull request #53599 from sherzyang/NEW-get-started-information-extraction
Add new module.
2 parents a218925 + f723cbb commit 7f8908f

24 files changed

Lines changed: 582 additions & 0 deletions
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.introduction
title: Introduction
metadata:
  title: Introduction
  description: Introduction to AI-powered information extraction with Azure Content Understanding.
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
zone_pivot_groups: video-or-text
durationInMinutes: 4
content: |
  [!include[](includes/1-introduction.md)]
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.documents
title: Extract information from documents
metadata:
  title: Extract information from documents
  description: Use Azure Content Understanding to extract information from documents.
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
zone_pivot_groups: video-or-text
durationInMinutes: 8
content: |
  [!include[](includes/2-documents.md)]
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.audio-video
title: Extract information from audio and video
metadata:
  title: Extract information from audio and video
  description: Use Azure Content Understanding to extract information from audio and video.
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
zone_pivot_groups: video-or-text
durationInMinutes: 4
content: |
  [!include[](includes/3-audio-video.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.exercise
title: Exercise - Get started with Content Understanding in Microsoft Foundry
metadata:
  title: Exercise - Get started with Content Understanding in Microsoft Foundry
  description: Use Microsoft Foundry to extract information from content.
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
durationInMinutes: 30
content: |
  [!include[](includes/4-exercise.md)]
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.knowledge-check
title: Module assessment
metadata:
  title: Module assessment
  description: Knowledge check
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
durationInMinutes: 3
quiz:
  title: "Check your knowledge"
  questions:
  - content: "What is the key advantage of using Azure Content Understanding over basic Optical Character Recognition (OCR)?"
    choices:
    - content: "Azure Content Understanding extracts text faster by skipping image preprocessing."
      isCorrect: false
      explanation: "Azure Content Understanding doesn't skip preprocessing; it combines OCR with other AI techniques like natural language understanding and multimodal models."
    - content: "Azure Content Understanding understands document structure and maps extracted data to a defined schema."
      isCorrect: true
      explanation: "Azure Content Understanding goes beyond basic OCR by using schema-based extraction, allowing it to identify fields (such as invoice number or total) and map values even when labels vary or are missing."
    - content: "Azure Content Understanding extracts structured data, while OCR extracts the relationship between words in text."
      isCorrect: false
      explanation: "OCR doesn't extract the relationship between words; it simply converts images of text into machine-readable text."
  - content: "What is the primary role of an analyzer in Azure Content Understanding?"
    choices:
    - content: "It defines how content is processed and what structured data is returned."
      isCorrect: true
      explanation: "Analyzers are the core components that define how content is processed, including extraction settings, schemas, and model deployments."
    - content: "It stores extracted data in a database."
      isCorrect: false
      explanation: "Analyzers don't store data; they only define extraction and processing behavior."
    - content: "It converts JSON output into human-readable text."
      isCorrect: false
      explanation: "Analyzers generate structured output (such as JSON), not human-readable conversions."
  - content: "When you use the Azure Content Understanding Python SDK, what happens after you submit content for analysis?"
    choices:
    - content: "The results are returned immediately in the same request."
      isCorrect: false
      explanation: "Results aren't returned immediately during asynchronous analysis."
    - content: "The analyzer retrains itself on the submitted content."
      isCorrect: false
      explanation: "Analyzers are predefined or custom-configured and reused; they aren't retrained per request."
    - content: "You must poll a URL until the analysis job completes."
      isCorrect: true
      explanation: "Content analysis is handled as a long-running asynchronous operation. After submitting content, you poll the Operation-Location URL (or use the SDK poller) until the job completes and returns results."
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.summary
title: Summary
metadata:
  title: Summary
  description: Summarize what you've learned about information extraction.
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
zone_pivot_groups: video-or-text
durationInMinutes: 1
content: |
  [!include[](includes/6-summary.md)]
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
::: zone pivot="video"

>[!VIDEO https://learn-video.azurefd.net/vod/player?id=68657ae4-3276-439f-b46a-bccc06c2651c]

::: zone-end

::: zone pivot="text"

Anyone who has manually processed invoices or forms knows how challenging it is. The great news is that we can use AI to eliminate manual effort and build the information systems of the future. AI-powered information extraction and analysis enable organizations to gain actionable insights from data that might otherwise be locked up in documents, images, audio, video, or other assets.

Examples of information extraction scenarios include:
- **Expense claim processing**: A company needs to extract expense descriptions and amounts from scanned receipts.
- **Customer support**: An agency needs to analyze recorded support calls to identify common problems and resolutions.
- **Capacity planning**: A tourist organization needs to estimate visitor volumes by analyzing video footage and images.

**Microsoft Azure Content Understanding in Foundry Tools** uses AI to extract structured information from unstructured content. Azure Content Understanding helps applications understand *what* is in content by identifying the entities, fields, relationships, and meaning it contains.

Azure Content Understanding extracts structured data from multiple content types, including:

- **Documents & images**: such as PDFs, forms, invoices, receipts, and contracts
- **Audio**: such as recordings or calls
- **Video**: such as recordings of meetings or other media files

Azure Content Understanding's AI-powered information extraction automates the process of turning unstructured content into machine-readable data that can be searched and analyzed. Next, learn how to extract structured data from unstructured documents and forms.

::: zone-end

> [!NOTE]
> We recognize that different people like to learn in different ways. You can choose to complete this module in video-based format or you can read the content as text and images. The text contains greater detail than the videos, so in some cases you might want to refer to it as supplemental material to the video presentation.
Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
::: zone pivot="video"

>[!VIDEO https://learn-video.azurefd.net/vod/player?id=b86ec81a-5db0-4f23-922c-419ccac09425]

::: zone-end

::: zone pivot="text"

Today's business processes depend heavily on data contained in documents like forms, receipts, and invoices. Manual processing can introduce delays and errors, making automated data extraction more important than ever.

## How Azure Content Understanding works

Azure Content Understanding follows a model-driven extraction workflow in which unstructured content is ingested, analyzed, and returned as structured data.

1. **Ingest content**: You submit content to Azure Content Understanding.

2. **AI-powered analysis**: The service uses a combination of Optical Character Recognition (OCR), speech recognition, natural language understanding, and multimodal AI models to analyze the content.

3. **Structured output**: The service returns structured results (for example, in JSON) that match your model, making the data easy to store, search, or integrate into downstream systems.

> [!NOTE]
> JSON (JavaScript Object Notation) is a text-based data format used to store and exchange structured data between systems. It's easy for humans to read and write, and easy for machines to parse and generate.

#### Understand schemas

OCR (optical character recognition) allows a computer to 'read' text from pictures, such as scanned documents, photos of receipts, or images of printed pages, and turn that text into editable and searchable digital text. Basic OCR recognizes printed text and focuses on text extraction; it *doesn't* understand meaning, context, or relationships between words.

Azure Content Understanding's document analysis capabilities go beyond simple OCR-based text extraction to include **schema-based** extraction of fields and their values. This schema-driven approach is what differentiates Azure Content Understanding from basic OCR or transcription services.

A schema describes *what information you want to extract* and *how that information should be structured*. When you define a schema, you list the specific fields or entities you care about.

For example, suppose you define a schema that includes the common fields typically found in an invoice, such as:

- Vendor name
- Invoice number
- Invoice date
- Customer name
- Customer address
- Items - the items ordered, each of which includes:
  - Item description
  - Unit price
  - Quantity ordered
  - Line item total
- Invoice subtotal
- Tax
- Shipping Charge
- Invoice total
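To make the field list concrete, here's a minimal sketch of how such a schema could be written down in code. The dictionary layout, field names, and type labels below are illustrative assumptions for this module, not the exact schema format the service uses:

```python
# Illustrative sketch only: a hypothetical invoice schema as a Python dictionary.
# The structure and type names here are assumptions for explanation; they aren't
# the exact field schema format that Azure Content Understanding defines.
invoice_schema = {
    "VendorName": {"type": "string"},
    "InvoiceNumber": {"type": "string"},
    "InvoiceDate": {"type": "date"},
    "CustomerName": {"type": "string"},
    "CustomerAddress": {"type": "string"},
    "Items": {  # a collection of nested objects, one per line item
        "type": "array",
        "items": {
            "Description": {"type": "string"},
            "UnitPrice": {"type": "number"},
            "Quantity": {"type": "number"},
            "LineTotal": {"type": "number"},
        },
    },
    "InvoiceSubtotal": {"type": "number"},
    "Tax": {"type": "number"},
    "ShippingCharge": {"type": "number"},
    "InvoiceTotal": {"type": "number"},
}
```

Notice that the schema captures both the individual fields and how they nest; that structure carries through to the extracted results.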
Now suppose you need to extract this information from the following invoice:

![Photograph of an invoice.](../media/invoice.png)

Azure Content Understanding can apply the invoice schema to your invoice and identify the corresponding fields, even when they're labeled with different names (or not labeled at all). The analysis produces a result like this:

![Photograph of an analyzed invoice with detected fields highlighted.](../media/analyzed-invoice.png)

The schema also defines the field structure. Schemas support *structured and nested fields*, not just flat text. For example:

- `Items` is a collection
- Each item has `description`, `unit price`, `quantity`, and `line total`

Identifying structured fields allows Azure Content Understanding to understand relationships between values, something OCR alone cannot do.

In the invoice example, the detected fields and their nested values look like this:

- **Vendor name**: Adventure Works Cycles
- **Invoice number**: 1234
- **Invoice date**: 03/07/2025
- **Customer name**: John Smith
- **Customer address**: 123 River Street, Marshtown, England, GL1 234
- **Items**:
  - Item 1:
    - **Item description**: 38" Racing Bike (Red)
    - **Unit price**: 1299.00
    - **Quantity ordered**: 1
    - **Line item total**: 1299.00
  - Item 2:
    - **Item description**: Cycling helmet (Black)
    - **Unit price**: 25.99
    - **Quantity ordered**: 1
    - **Line item total**: 25.99
  - Item 3:
    - **Item description**: Cycling shirt (L)
    - **Unit price**: 42.50
    - **Quantity ordered**: 2
    - **Line item total**: 85.00
- **Invoice subtotal**: 1409.99
- **Tax**: 140.99
- **Shipping Charge**: 35.00
- **Invoice total**: 1585.98
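Because the results are structured, downstream code can work with the nested values directly rather than re-parsing text. The snippet below is purely illustrative (it hard-codes the values above as a plain Python dictionary, which isn't the service's exact output format) and shows the kind of check that nested fields make possible:

```python
# Illustrative only: the extracted invoice represented as a plain Python dictionary.
# Real service output is JSON and includes extra metadata such as confidence scores.
extracted = {
    "InvoiceSubtotal": 1409.99,
    "InvoiceTotal": 1585.98,
    "Items": [
        {"Description": '38" Racing Bike (Red)', "UnitPrice": 1299.00, "Quantity": 1, "LineTotal": 1299.00},
        {"Description": "Cycling helmet (Black)", "UnitPrice": 25.99, "Quantity": 1, "LineTotal": 25.99},
        {"Description": "Cycling shirt (L)", "UnitPrice": 42.50, "Quantity": 2, "LineTotal": 85.00},
    ],
}

# Nested fields capture relationships that raw OCR text can't express, such as
# checking that the line item totals add up to the stated invoice subtotal.
calculated_subtotal = sum(item["LineTotal"] for item in extracted["Items"])
assert round(calculated_subtotal, 2) == extracted["InvoiceSubtotal"]
print(f"Line items sum to {calculated_subtotal:.2f}, matching the invoice subtotal.")
```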
Azure Content Understanding extracts expected meaning, not just labels. Schemas are applied *semantically*, meaning:
- Fields can be extracted even if labels differ
- Fields can be extracted even if labels are missing

For example, *Invoice No.*, *Invoice #*, or an unlabeled number can all map to `InvoiceNumber` if the analyzer determines they represent the same concept.

#### Understand analyzers

An **analyzer** is the component in Azure Content Understanding that takes input, applies AI analysis, and produces structured results. An analyzer applies the same extraction logic to all incoming content: once it's configured, it reuses its schema consistently for every analysis request and returns predictable JSON results, which makes downstream processing (storage, search, automation) easier.

Azure Content Understanding offers prebuilt analyzers for common scenarios and supports custom analyzers tailored to your needs. At a high level:

1. You choose or create an analyzer.
2. The analyzer includes a schema defining fields and structure.
3. You submit content for analysis.
4. The service applies the schema.
5. You receive structured JSON results matching the schema.

## Using Azure Content Understanding in the Foundry portal

> [!NOTE]
> Foundry portal has a *classic* user interface (UI) and a *new* user interface.

After you create a *Microsoft Foundry resource*, you can use the ***classic* Foundry portal interface** to test out Azure Content Understanding. The Foundry portal provides content examples and allows you to upload your own material for analysis.

You can use the visual interface to select a source document and extract default fields of information. For example, when you try out Azure Content Understanding on an image of a document, the service returns the document text and text layout information.

:::image type="content" source="../media/document-analysis-playground.png" alt-text="Screenshot of the classic Foundry portal with a document analyzed with Azure Content Understanding." lightbox="../media/document-analysis-playground.png":::

Azure Content Understanding's analyzers identify text values in documents and map them to specific fields. For example, given an invoice, the service returns the fields (such as Vendor address) and the values they contain (such as 123 456th Street).

:::image type="content" source="../media/invoice-playground.png" alt-text="Screenshot of the classic Foundry portal with an invoice analyzed with Azure Content Understanding." lightbox="../media/invoice-playground.png":::

In the Foundry portal, you can also view the JSON results of the processing.

:::image type="content" source="../media/invoice-json-result-playground.png" alt-text="Screenshot of the classic Foundry portal with the JSON result of an invoice analyzed with Azure Content Understanding." lightbox="../media/invoice-json-result-playground.png":::

## Building a client application with Azure Content Understanding

You can use the **Content Understanding API** to build a lightweight client application that extracts data programmatically.

> [!NOTE]
> A client application is a software program that runs on a user's device and requests services or data from another system, typically a server, over a network. The *client* is the part of an application that users interact with, while the *server* does the heavy work behind the scenes. A client can request data or actions from a service and receive a structured response through an API.

When you use the Content Understanding API, you can choose a prebuilt analyzer or create a custom analyzer. Prebuilt analyzers include `prebuilt-invoice`, `prebuilt-imageSearch`, `prebuilt-audioSearch`, and `prebuilt-videoSearch`. When you submit content to an analyzer, the analysis is **asynchronous**, which means the result isn't returned in the same request. Instead, you *poll* the Operation-Location URL (or the `analyzerResults` endpoint) until the job succeeds.
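If you call the REST API directly instead of using an SDK, the same asynchronous pattern applies. The sketch below illustrates the polling loop; the request path, header names, and query parameters are assumptions for illustration, so check the Content Understanding REST API reference for the exact values to use with your resource:

```python
# Minimal polling sketch using the requests library.
# ASSUMPTIONS: the :analyze path, api-version value, and key header shown here are
# illustrative placeholders; consult the Content Understanding REST API reference.
import time
import requests

endpoint = "https://<your-resource-name>.services.ai.azure.com"
key = "<your-api-key>"
analyzer_id = "prebuilt-invoice"

# Submit content for analysis. The response doesn't contain results yet; it points
# to an operation that you poll.
response = requests.post(
    f"{endpoint}/contentunderstanding/analyzers/{analyzer_id}:analyze",
    params={"api-version": "2025-05-01-preview"},  # placeholder API version
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"url": "https://example.com/sample-invoice.pdf"},
)
response.raise_for_status()

# Poll the Operation-Location URL until the job finishes.
operation_url = response.headers["Operation-Location"]
while True:
    status_response = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": key})
    status_response.raise_for_status()
    body = status_response.json()
    if body.get("status") in ("Succeeded", "Failed"):
        break
    time.sleep(2)  # wait briefly between polls

print(body.get("status"))
```

The SDK poller shown in the next section wraps this loop for you.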
#### Using the Azure Content Understanding Python SDK

Let's take a look at the process of using the Python SDK to analyze an invoice from a URL.

1. Install the Azure Content Understanding Python SDK.

    ```bash
    python -m pip install azure-ai-contentunderstanding
    ```

2. Identify your Foundry resource endpoint and an API key or Microsoft Entra ID credential. Your endpoint typically looks like `https://<your-resource-name>.services.ai.azure.com/`.
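    If you prefer Microsoft Entra ID over an API key, you can use a token credential instead. This is a minimal sketch that assumes the client accepts an `azure.identity` credential; confirm the supported credential types in the SDK documentation for the version you install.

    ```python
    # Sketch: key-less authentication with Microsoft Entra ID.
    # ASSUMPTION: ContentUnderstandingClient accepts a token credential such as
    # DefaultAzureCredential; verify this in the SDK reference for your version.
    from azure.ai.contentunderstanding import ContentUnderstandingClient
    from azure.identity import DefaultAzureCredential

    client = ContentUnderstandingClient(
        endpoint="https://<your-resource-name>.services.ai.azure.com/",
        credential=DefaultAzureCredential(),
    )
    ```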
3. Create and run the client application code. The `analyzer_id` is the ID of the prebuilt analyzer. You can find a list of [prebuilt analyzer ID values](/azure/ai-services/content-understanding/concepts/prebuilt-analyzers).

    ```python
    import os
    from azure.ai.contentunderstanding import ContentUnderstandingClient
    from azure.core.credentials import AzureKeyCredential

    endpoint = os.environ["FOUNDRY_ENDPOINT"]
    key = os.environ["FOUNDRY_KEY"]

    client = ContentUnderstandingClient(endpoint=endpoint, credential=AzureKeyCredential(key))

    # 1) Start analysis with an analyzer ID and the content inputs
    analyzer_id = "prebuilt-invoice"
    inputs = [
        {"url": "https://github.com/Azure-Samples/azure-ai-content-understanding-python/raw/refs/heads/main/data/invoice.pdf"}
    ]

    # 2) Wait for the Long Running Operation (LRO) to complete
    poller = client.begin_analyze(analyzer_id=analyzer_id, inputs=inputs)  # starts the LRO
    result = poller.result()  # waits for completion (polling handled by the SDK)

    # 3) Read the structured fields and markdown
    # The result typically includes extracted "fields" and "markdown" per input content item.
    for content in result.contents:
        print(content.markdown)
        print(content.fields)
    ```

The resulting output is JSON that shows the extracted markdown, the fields, the values in those fields, and a confidence score for each. For example:

```json
{
  "status": "Succeeded",
  "result": {
    "analyzerId": "prebuilt-invoice",
    "apiVersion": "2025-05-01-preview",
    "contents": [
      {
        "markdown": "# INVOICE\n\nCONTOSO LTD.\n\nContoso Headquarters\n123 456th St\nNew York, NY, 10001\n\nINVOICE: INV-100\n\nINVOICE DATE: 11/15/2019\n\nDUE DATE: 12/15/2019\n\nCUSTOMER NAME: MICROSOFT CORPORATION\n",
        "fields": {
          "CustomerName": {
            "type": "string",
            "valueString": "MICROSOFT CORPORATION",
            "confidence": 0.95
          },
          "InvoiceDate": {
            "type": "date",
            "valueDate": "2019-11-15",
            "confidence": 0.994
          }
        }
      }
    ]
  }
}
```
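In a real application, you'd usually iterate over those fields and decide what to do with each value based on its confidence score. The sketch below assumes the response has been loaded into plain Python dictionaries shaped like the JSON above (the variable name `analysis` and the threshold are illustrative); SDK model objects may expose the same data as attributes instead:

```python
# Sketch: reading extracted fields from a result shaped like the JSON above.
# ASSUMPTION: `analysis` is the response parsed into plain dictionaries (for example
# via json.loads); adjust the access pattern if your SDK returns model objects.
CONFIDENCE_THRESHOLD = 0.8

for content in analysis["result"]["contents"]:
    for name, field in content["fields"].items():
        # Each field carries a type, a typed value (valueString, valueDate, ...),
        # and a confidence score.
        value_key = next(k for k in field if k.startswith("value"))
        if field["confidence"] >= CONFIDENCE_THRESHOLD:
            print(f"{name}: {field[value_key]} (confidence {field['confidence']:.2f})")
        else:
            print(f"{name}: confidence below threshold, flag for human review")
```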
Next, learn how to use Azure Content Understanding analyzers to extract structured data from audio and video.

::: zone-end
