learn-pr/wwl-data-ai/get-started-information-extraction/5-knowledge-check.yml
6 additions & 6 deletions
@@ -16,32 +16,32 @@ quiz:
     choices:
     - content: "Azure Content Understanding extracts text faster by skipping image preprocessing."
       isCorrect: false
-      explanation: "Azure Content Understanding does not skip preprocessing; it actually combines OCR with additional AI techniques like natural language understanding and multimodal models."
+      explanation: "Azure Content Understanding doesn't skip preprocessing; it actually combines OCR with other AI techniques like natural language understanding and multimodal models."
     - content: "Azure Content Understanding understands document structure and maps extracted data to a defined schema."
       isCorrect: true
       explanation: "Azure Content Understanding goes beyond basic OCR by using schema‑based extraction, allowing it to identify fields (such as invoice number or total) and map values even when labels vary or are missing."
     - content: "Azure Content Understanding extracts structured data, while OCR extracts the relationship between words in text."
       isCorrect: false
-      explanation: "OCR does not extract the relationship between words; it simply converts images of text into machine‑readable text."
+      explanation: "OCR doesn't extract the relationship between words; it simply converts images of text into machine‑readable text."
   - content: "What is the primary role of an analyzer in Azure Content Understanding?"
     choices:
     - content: "It defines how content is processed and what structured data is returned."
       isCorrect: true
       explanation: "Analyzers are the core components that define how content is processed, including extraction settings, schemas, and model deployments."
     - content: "It stores extracted data in a database."
       isCorrect: false
-      explanation: "Analyzers do not store data; they only define extraction and processing behavior."
+      explanation: "Analyzers don't store data; they only define extraction and processing behavior."
     - content: "It converts JSON output into human‑readable text."
       isCorrect: false
       explanation: "Analyzers generate structured output (such as JSON), not human‑readable conversions."
-  - content: "When using the Azure Content Understanding Python SDK, what happens after you submit content for analysis?"
+  - content: "When you use the Azure Content Understanding Python SDK, what happens after you submit content for analysis?"
     choices:
     - content: "The results are returned immediately in the same request."
       isCorrect: false
-      explanation: "Results are not returned immediately during asynchronous analysis."
+      explanation: "Results aren't returned immediately during asynchronous analysis."
     - content: "The analyzer retrains itself on the submitted content."
       isCorrect: false
-      explanation: "Analyzers are predefined or custom‑configured and reused; they are not retrained per request."
+      explanation: "Analyzers are predefined or custom‑configured and reused; they aren't retrained per request."
     - content: "You must poll a URL until the analysis job completes."
       isCorrect: true
       explanation: "Content analysis is handled as a long‑running asynchronous operation. After submitting content, you poll the Operation-Location (or use the SDK poller) until the job completes and returns results."
learn-pr/wwl-data-ai/get-started-information-extraction/includes/1-introduction.md
4 additions & 4 deletions
@@ -6,7 +6,7 @@

 ::: zone pivot="text"

-Anyone who has had to manually process invoices or forms knows how challenging it is. The great news is that we can use AI to eliminate manual effort and build the information systems of the future. AI-powered information extraction and analysis enables organizations to gain actionable insights from data that might otherwise be locked up in documents, images, audio files, or other assets.
+Anyone who has manually processed invoices or forms knows how challenging it is. The great news is that we can use AI to eliminate manual effort and build the information systems of the future. AI-powered information extraction and analysis enable organizations to gain actionable insights from data that might otherwise be locked up in documents, images, audio, video, or other assets.

 Examples of information extraction scenarios include:
 - **Expense claim processing**: A company needs to extract expense descriptions and amounts from scanned receipts.
@@ -17,9 +17,9 @@ Examples of information extraction scenarios include:

 Azure Content Understanding extracts structured data from multiple content types including:
 - **Documents & images**: such as PDFs, forms, invoices, receipts, contracts
+- **Audio**: such as recordings or calls
+- **Video**: such as video of meetings or other media files

 Azure Content Understanding's AI-powered information extraction automates the process of turning unstructured content into machine‑readable data that can be searched and analyzed. Next, learn how to extract structured data from unstructured documents and forms.
learn-pr/wwl-data-ai/get-started-information-extraction/includes/2-documents.md
11 additions & 12 deletions
@@ -14,22 +14,21 @@ Azure Content Understanding follows a model‑driven extraction workflow in whic

 1. **Ingest content**: You submit content to Azure Content Understanding.

-2. **AI-powered analysis**: The service uses a combination of OCR, speech recognition, natural language understanding, and multimodal AI models to analyze the content.
+2. **AI-powered analysis**: The service uses a combination of Optical Character Recognition (OCR), speech recognition, natural language understanding, and multimodal AI models to analyze the content.

 3. **Structured output**: The service returns structured results (for example, in JSON) that match your model, making the data easy to store, search, or integrate into downstream systems.

 > [!NOTE]
-> JSON (JavaScript Object Notation) is a text‑based data format used to store and exchange structured data between systems. It is easy for humans to read and write, and easy for machines to parse and generate.
+> JSON (JavaScript Object Notation) is a text‑based data format used to store and exchange structured data between systems. It's easy for humans to read and write, and easy for machines to parse and generate.

 #### Understand schemas

-**Optical Character Recognition (OCR)** allows a computer to 'read' text from pictures, such as scanned documents, photos of receipts, or images of printed pages, and turn that text into editable and searchable digital text. Basic OCR helps recognize printed text, focuses on text extraction, and *does not* understand meaning, context, or relationships between words.
+OCR (optical character recognition) allows a computer to 'read' text from pictures, such as scanned documents, photos of receipts, or images of printed pages, and turn that text into editable and searchable digital text. Basic OCR helps recognize printed text, focuses on text extraction, and *doesn't* understand meaning, context, or relationships between words.

 Azure Content Understanding's document analysis capabilities go beyond simple OCR-based text extraction to include **schema-based** extraction of fields and their values. The schema-driven approach is what differentiates Azure Content Understanding from basic OCR or transcription services.

-In Azure Content Understanding, a schema describes *what information you want to extract* and *how that information should be structured* when unstructured content (documents, audio, images, or video) is analyzed.
+A schema describes *what information you want to extract* and *how that information should be structured*. When you define a schema, you specify fields to extract. A schema lists the specific fields or entities you care about.

-When you define a schema, you specify fields to extract. A schema lists the specific fields or entities you care about.
 For example, suppose you define a schema that includes the common fields typically found in an invoice, such as:

 - Vendor name
@@ -55,12 +54,12 @@ Azure Content Understanding can apply the invoice schema to your invoice and ide

 

-A schema also defines the field structure. Schemas support *structured and nested fields*, not just flat text. For example:
+The schema also defines the field structure. Schemas support *structured and nested fields*, not just flat text. For example:

 - `Items` is a collection
 - Each item has `description`, `unit price`, `quantity`, and `line total`

-Identifying structured fields allows Azure Content Understanding to understand relationships between values, such as which price belongs to which item, something OCR alone cannot do.
+Identifying structured fields allows Azure Content Understanding to understand relationships between values, something OCR alone cannot do.

 In the invoice example, for each detected *field*, you can extract nested values:

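The nested structure described in the hunk above can be pictured as a small data model. The following is only an illustrative sketch: the class names (`LineItem`, `Invoice`), the helper `unit_price_of`, and the sample values are hypothetical and don't come from the module or from the Content Understanding SDK. It shows why keeping `description`, `unit price`, `quantity`, and `line total` together in one item preserves which price belongs to which item.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LineItem:
    # One entry in the `Items` collection (hypothetical names and types).
    description: str
    unit_price: float
    quantity: int
    line_total: float


@dataclass
class Invoice:
    # A top-level field plus a nested collection of line items.
    vendor_name: str
    items: List[LineItem] = field(default_factory=list)


def unit_price_of(invoice: Invoice, description: str) -> Optional[float]:
    """Return the unit price of the item with the given description, if any."""
    for item in invoice.items:
        if item.description == description:
            return item.unit_price
    return None


# Made-up sample data, for illustration only.
invoice = Invoice(
    vendor_name="Contoso",
    items=[
        LineItem("Widget", unit_price=2.5, quantity=4, line_total=10.0),
        LineItem("Gadget", unit_price=10.0, quantity=1, line_total=10.0),
    ],
)
```

Because each price lives inside its own item, a lookup such as `unit_price_of(invoice, "Gadget")` is unambiguous, which a flat list of OCR text tokens could not guarantee.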
@@ -94,11 +93,11 @@ Azure Content Understanding extracts expected meaning, not just labels. Schemas
 - Fields can be extracted even if labels differ
 - Fields can be extracted even if labels are missing

-For example, “Invoice No.”, “Invoice #”, or an unlabeled number can all map to `InvoiceNumber` if the analyzer determines they represent the same concept.
+For example, *Invoice No.*, *Invoice #*, or an unlabeled number can all map to `InvoiceNumber` if the analyzer determines they represent the same concept.

 #### Understand analyzers

-An **analyzer** is a unit in Azure Content Understanding that takes input, applies AI analysis, and produces structured results. Analyzers consistently apply the same extraction logic to all incoming content. Once it is configured, an analyzer ensures a schema is reused consistently for every analysis request. Analyzers also produce predictable JSON results which makes downstream processing (storage, search, automation) easier.
+An **analyzer** is a unit in Azure Content Understanding that takes input, applies AI analysis, and produces structured results. Analyzers consistently apply the same extraction logic to all incoming content. Once it's configured, an analyzer ensures a schema is reused consistently for every analysis request. Analyzers also produce predictable JSON results. The structured results make downstream processing (storage, search, automation) easier.

 Azure Content Understanding offers prebuilt analyzers for common scenarios and supports custom analyzers tailored to your needs. At a high level:

@@ -133,7 +132,7 @@ In Foundry portal, you can also view the JSON results of the processing.
 You can use the **Content Understanding API** to build a lightweight client application that extracts data programmatically.

 > [!NOTE]
-> A client application is a software program that runs on a user's device and requests services or data from another system, typically a server, over a network. The *client* is the part of an application that users interact with, while the *server* does the heavy work behind the scenes. An API lets applications request data or actions from a service and receive a structured response, without needing to know how the service is built internally.
+> A client application is a software program that runs on a user's device and requests services or data from another system, typically a server, over a network. The *client* is the part of an application that users interact with, while the *server* does the heavy work behind the scenes. Applications can request data or actions from a service and receive a structured response using an API.

 When you use the Content Understanding API, you can choose a prebuilt analyzer or create a custom analyzer. Prebuilt analyzers include: `prebuilt-invoice`, `prebuilt-imageSearch`, `prebuilt-audioSearch`, and `prebuilt-videoSearch`. When you submit content for analysis to the analyzer, the analysis is **asynchronous**, which means you get the result later when it's ready. Because the analysis is asynchronous, you need to *poll* the Operation-Location URL (or `analyzerResults`) until the job succeeds.
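The polling step described above follows a generic pattern that can be sketched without any network access. This is a minimal sketch under stated assumptions, not the SDK's poller: the function name `poll_until_done`, the status strings (`"Running"`, `"Succeeded"`, `"Failed"`), and the response shape are assumptions made for illustration, and the `get_status` callable stands in for an HTTP GET against the Operation-Location URL.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    get_status: Callable[[], Dict],
    interval_seconds: float = 1.0,
    max_attempts: int = 60,
) -> Dict:
    """Call get_status repeatedly until the job reports a terminal status."""
    for _ in range(max_attempts):
        response = get_status()
        # Status names are assumed; compare case-insensitively to be lenient.
        status = str(response.get("status", "")).lower()
        if status == "succeeded":
            return response
        if status == "failed":
            raise RuntimeError("analysis failed")
        time.sleep(interval_seconds)
    raise TimeoutError("analysis did not finish in time")


# Stand-in for repeated GET requests to the Operation-Location URL.
_fake_responses = iter([
    {"status": "Running"},
    {"status": "Running"},
    {"status": "Succeeded", "result": {"fields": {}}},
])

final = poll_until_done(lambda: next(_fake_responses), interval_seconds=0)
```

Capping the number of attempts and sleeping between calls keeps the client from waiting forever or flooding the service; in a real client, the SDK poller mentioned in the knowledge check would typically handle this loop for you.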
 2. Identify your Foundry resource endpoint and API key or Microsoft Entra ID. Your endpoint typically looks like: `https://<your-resource-name>.services.ai.azure.com/`

-3. Create and run the client application code. The `analyzer_id` is the ID of the prebuilt analyzer. You can find a list of prebuilt analyzer IDs [here](/azure/ai-services/content-understanding/concepts/prebuilt-analyzers).
+3. Create and run the client application code. The `analyzer_id` is the ID of the prebuilt analyzer. You can find a list of prebuilt analyzer ID values [here](/azure/ai-services/content-understanding/concepts/prebuilt-analyzers).

 ```python
 import os
@@ -207,6 +206,6 @@ The resulting output is JSON that shows the extracted markdown, fields, data in
 }
 ```

-Content Understanding includes multiple predefined analyzers for common document types, as well as virtual tools to help you create custom analyzers for your particular data processing scenarios. Next, learn how to use Azure Content Understanding analyzers to extract structured data from audio and video.
+Next, learn how to use Azure Content Understanding analyzers to extract structured data from audio and video.
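Because the analyzer returns JSON, downstream code can consume the extracted fields with an ordinary JSON parser. The sketch below assumes a deliberately simplified, made-up result shape (a top-level `fields` object with `VendorName` and `Items`); the real service response is richer, and its exact property names should be taken from the service documentation rather than from this example. The `summarize` helper is hypothetical.

```python
import json

# Simplified, made-up result payload; not the actual service response format.
raw_result = """
{
  "fields": {
    "VendorName": "Contoso",
    "Items": [
      {"Description": "Widget", "Quantity": 4, "LineTotal": 10.0},
      {"Description": "Gadget", "Quantity": 1, "LineTotal": 10.0}
    ]
  }
}
"""


def summarize(result_json: str) -> dict:
    """Pull a few values out of the (assumed) structured result."""
    fields = json.loads(result_json)["fields"]
    items = fields.get("Items", [])
    return {
        "vendor": fields.get("VendorName"),
        "item_count": len(items),
        "total": sum(item["LineTotal"] for item in items),
    }


summary = summarize(raw_result)
```

The same pattern applies to storage, search indexing, or automation: once the result is parsed, each field can be read by name instead of being located in raw text.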
learn-pr/wwl-data-ai/get-started-information-extraction/includes/3-audio-video.md
5 additions & 7 deletions
@@ -6,9 +6,7 @@

 ::: zone pivot="text"

-As well as traditional documents and forms, business information is increasingly found in multimedia formats such as audio and video files. For example, businesses often record calls in order to analyze them later. The growth of video conferencing means that a lot of useful information is captured in recorded meetings.
-
-Azure Content Understanding includes analyzers for audio and video content that you can use to extract insights that would otherwise be locked up in media files.
+Business information is increasingly found in multimedia formats such as audio and video files. For example, businesses often record calls in order to analyze them later. The growth of video conferencing means that useful information is often captured in recorded meetings. Azure Content Understanding supports both audio and video data extraction and analysis.

 ## Extracting structured data from audio

@@ -39,7 +37,7 @@ Thanks, bye!
 Using Azure Content Understanding to analyze the audio recording and apply your schema produces the following results:

 - **Caller**: Ava from Contoso
-- **Message summary**: Ava from Contoso called to follow up on a meeting and mentioned that they can meet the price expectations. She requested a callback or an email to discuss next steps.
+- **Message summary**: Ava from Contoso called to follow up on a meeting and mentioned that they can meet the price expectations. They requested a callback or an email to discuss next steps.
 - **Requested actions**: Call back or send an email to discuss next steps.
@@ -58,7 +56,7 @@ Let's take a look at how we can use content understanding to analyze a call reco

 :::image type="content" source="../media/audio-extraction-playground.png" alt-text="Screenshot of the classic Foundry portal with audio analyzed with Azure Content Understanding." lightbox="../media/audio-extraction-playground.png":::

-In the returned results, you can see specific information that has been identified in the call. As with other analyzers in content understanding, the results are in JSON format for further processing.
+In the returned results, you can see specific information from the call. As with other analyzers in content understanding, the results are in JSON format for further processing.

 :::image type="content" source="../media/audio-json-result.png" alt-text="Screenshot of the classic Foundry portal where audio is analyzed and JSON is returned." lightbox="../media/audio-json-result.png":::

@@ -73,11 +71,11 @@ Let's first look at one image from the conference room camera. Suppose you defin
 - Remote attendees
 - Total attendees

-You can use Azure Content Understanding to analyze the following from the conference room camera:
+You could use Azure Content Understanding to analyze an image from the conference room camera:

 

-When applying the preceding schema to this image, Azure Content Understanding produces the following results:
+After applying the schema to the image, Azure Content Understanding returned structured data:
learn-pr/wwl-data-ai/get-started-information-extraction/index.yml
2 additions & 2 deletions
@@ -2,7 +2,7 @@
 uid: learn.wwl.get-started-information-extraction
 metadata:
   title: Get started with AI-powered information extraction in Azure
-  description: AI gives you the power to unlock insights from your data. In this module, you'll learn how to use Foundry Tools to extract information from content.
+  description: AI gives you the power to unlock insights from your data. Learn how to use Azure Content Understanding in Foundry Tools to extract information from content.
   author: sherzyang
   ms.author: sheryang
   ms.date: 02/21/2026
@@ -12,7 +12,7 @@ metadata:
   ms.collection:
   - wwl-ai-copilot
 title: Get started with AI-powered information extraction in Azure
-summary: AI gives you the power to unlock insights from your data. In this module, you'll learn how to use Foundry Tools to extract information from content.
+summary: AI gives you the power to unlock insights from your data. Learn how to use Azure Content Understanding in Foundry Tools to extract information from content.
 abstract: |
   After completing this module, you'll be able to:
   - Identify Foundry Tools for information extraction