Commit fb37a1e

Author: Sherry Yang (committed)
Fix for acrolinx.
1 parent a271817

5 files changed: 28 additions & 31 deletions


learn-pr/wwl-data-ai/get-started-information-extraction/5-knowledge-check.yml

Lines changed: 6 additions & 6 deletions
@@ -16,32 +16,32 @@ quiz:
 choices:
 - content: "Azure Content Understanding extracts text faster by skipping image preprocessing."
 isCorrect: false
-explanation: "Azure Content Understanding does not skip preprocessing; it actually combines OCR with additional AI techniques like natural language understanding and multimodal models."
+explanation: "Azure Content Understanding doesn't skip preprocessing; it actually combines OCR with other AI techniques like natural language understanding and multimodal models."
 - content: "Azure Content Understanding understands document structure and maps extracted data to a defined schema."
 isCorrect: true
 explanation: "Azure Content Understanding goes beyond basic OCR by using schema‑based extraction, allowing it to identify fields (such as invoice number or total) and map values even when labels vary or are missing."
 - content: "Azure Content Understanding extracts structured data, while OCR extracts the relationship between words in text."
 isCorrect: false
-explanation: "OCR does not extract the relationship between words; it simply converts images of text into machine‑readable text."
+explanation: "OCR doesn't extract the relationship between words; it simply converts images of text into machine‑readable text."
 - content: "What is the primary role of an analyzer in Azure Content Understanding?"
 choices:
 - content: "It defines how content is processed and what structured data is returned."
 isCorrect: true
 explanation: "Analyzers are the core components that define how content is processed, including extraction settings, schemas, and model deployments."
 - content: "It stores extracted data in a database."
 isCorrect: false
-explanation: "Analyzers do not store data; they only define extraction and processing behavior"
+explanation: "Analyzers don't store data; they only define extraction and processing behavior."
 - content: "It converts JSON output into human‑readable text."
 isCorrect: false
 explanation: "Analyzers generate structured output (such as JSON), not human‑readable conversions."
-- content: "When using the Azure Content Understanding Python SDK, what happens after you submit content for analysis?"
+- content: "When you use the Azure Content Understanding Python SDK, what happens after you submit content for analysis?"
 choices:
 - content: "The results are returned immediately in the same request."
 isCorrect: false
-explanation: "Results are not returned immediately during asynchronous analysis."
+explanation: "Results aren't returned immediately during asynchronous analysis."
 - content: "The analyzer retrains itself on the submitted content."
 isCorrect: false
-explanation: "Analyzers are predefined or custom‑configured and reused; they are not retrained per request"
+explanation: "Analyzers are predefined or custom‑configured and reused; they aren't retrained per request."
 - content: "You must poll a URL until the analysis job completes."
 isCorrect: true
 explanation: "Content analysis is handled as a long‑running asynchronous operation. After submitting content, you poll the Operation-Location (or use the SDK poller) until the job completes and returns results."

learn-pr/wwl-data-ai/get-started-information-extraction/includes/1-introduction.md

Lines changed: 4 additions & 4 deletions
@@ -6,7 +6,7 @@
 
 ::: zone pivot="text"
 
-Anyone who has had to manually process invoices or forms knows how challenging it is. The great news is that we can use AI to eliminate manual effort and build the information systems of the future. AI-powered information extraction and analysis enables organizations to gain actionable insights from data that might otherwise be locked up in documents, images, audio files, or other assets.
+Anyone who has manually processed invoices or forms knows how challenging it is. The great news is that we can use AI to eliminate manual effort and build the information systems of the future. AI-powered information extraction and analysis enable organizations to gain actionable insights from data that might otherwise be locked up in documents, images, audio, video, or other assets.
 
 Examples of information extraction scenarios include:
 - **Expense claim processing**: A company needs to extract expense descriptions and amounts from scanned receipts.
@@ -17,9 +17,9 @@ Examples of information extraction scenarios include:
 
 Azure Content Understanding extracts structured data from multiple content types including:
 
-- **Documents & images:**: PDFs, forms, invoices, receipts, contracts
-- **Audio:**: recordings, calls
-- **Video:**: meetings, media files
+- **Documents & images**: such as PDFs, forms, invoices, receipts, contracts
+- **Audio**: such as recordings or calls
+- **Video**: such as video of meetings or other media files
 
 Azure Content Understanding's AI-powered information extraction automates the process of turning unstructured content into machine‑readable data that can be searched and analyzed. Next, learn how to extract structured data from unstructured documents and forms.

learn-pr/wwl-data-ai/get-started-information-extraction/includes/2-documents.md

Lines changed: 11 additions & 12 deletions
@@ -14,22 +14,21 @@ Azure Content Understanding follows a model‑driven extraction workflow in whic
 
 1. **Ingest content**: You submit content to Azure Content Understanding.
 
-2. **AI-powered analysis**: The service uses a combination of: OCR, speech recognition, natural language understanding, and multimodal AI models to analyze the content.
+2. **AI-powered analysis**: The service uses a combination of Optical Character Recognition (OCR), speech recognition, natural language understanding, and multimodal AI models to analyze the content.
 
 3. **Structured output**: The service returns structured results (for example, in JSON) that match your model—making the data easy to store, search, or integrate into downstream systems.
 
 >[!NOTE]
-> JSON (JavaScript Object Notation) is a text‑based data format used to store and exchange structured data between systems. It is easy for humans to read and write, and easy for machines to parse and generate.
+> JSON (JavaScript Object Notation) is a text‑based data format used to store and exchange structured data between systems. It's easy for humans to read and write, and easy for machines to parse and generate.
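To make the note concrete, here's a minimal sketch of JSON round-tripping between text and native structures. The field names and values are invented for illustration; they aren't actual service output:

```python
import json

# A hypothetical structured result serialized as JSON text.
raw = '{"vendor": "Contoso", "invoice_number": "INV-0042", "total": 125.5}'

# Machines parse the text into native structures...
invoice = json.loads(raw)
print(invoice["vendor"])

# ...and serialize structures back to text for storage or exchange.
print(json.dumps(invoice, indent=2))
```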
 #### Understand schemas
 
-**Optical Character Recognition (OCR)** allows a computer to 'read' text from pictures, such as scanned documents, photos of receipts, or images of printed pages, and turn that text into editable and searchable digital text. Basic OCR helps recognize printed text, focuses on text extraction, and *does not* understand meaning, context, or relationships between words.
+OCR (optical character recognition) allows a computer to 'read' text from pictures, such as scanned documents, photos of receipts, or images of printed pages, and turn that text into editable and searchable digital text. Basic OCR helps recognize printed text, focuses on text extraction, and *doesn't* understand meaning, context, or relationships between words.
 
 Azure Content Understanding's document analysis capabilities go beyond simple OCR-based text extraction to include **schema-based** extraction of fields and their values. The schema-driven approach is what differentiates Azure Content Understanding from basic OCR or transcription services.
 
-In Azure Content Understanding, a schema describes *what information you want to extract* and *how that information should be structured* when unstructured content (documents, audio, images, or video) is analyzed.
+A schema describes *what information you want to extract* and *how that information should be structured*. When you define a schema, you specify the fields or entities you care about.
-When you define a schema, you specify fields to extract. A schema lists the specific fields or entities you care about.
 For example, suppose you define a schema that includes the common fields typically found in an invoice, such as:
 
 - Vendor name
@@ -55,12 +54,12 @@ Azure Content Understanding can apply the invoice schema to your invoice and ide
 
 ![Photograph of an analyzed invoice with detected fields highlighted.](../media/analyzed-invoice.png)
 
-A schema also defines the field structure. Schemas support *structured and nested fields*, not just flat text. For example:
+The schema also defines the field structure. Schemas support *structured and nested fields*, not just flat text. For example:
 
 - `Items` is a collection
 - Each item has `description`, `unit price`, `quantity`, and `line total`
 
-Identifying structured fields allows Azure Content Understanding to understand relationships between values, such as which price belongs to which item—something OCR alone cannot do.
+Identifying structured fields allows Azure Content Understanding to understand relationships between values, something OCR alone cannot do.
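As an illustration of the nested `Items` structure described above, a schema-shaped result might look like the following sketch. The field names and values are hypothetical, not actual Azure Content Understanding output:

```python
# Hypothetical structured result for an invoice schema with nested fields.
# The nesting preserves relationships: each unit price stays attached to its
# own item, which a flat stream of OCR text cannot express.
invoice_result = {
    "VendorName": "Contoso Ltd.",
    "InvoiceNumber": "INV-0042",
    "Items": [  # a collection of structured, nested fields
        {"description": "Widget", "unit_price": 4.0, "quantity": 3, "line_total": 12.0},
        {"description": "Gadget", "unit_price": 10.0, "quantity": 1, "line_total": 10.0},
    ],
}

# Because structure is preserved, downstream code can validate relationships:
for item in invoice_result["Items"]:
    assert item["unit_price"] * item["quantity"] == item["line_total"]
```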
 
 In the invoice example, for each detected *field*, you can extract nested values:

@@ -94,11 +93,11 @@ Azure Content Understanding extracts expected meaning, not just labels. Schemas
 - Fields can be extracted even if labels differ
 - Fields can be extracted even if labels are missing
 
-For example, Invoice No.”, “Invoice #, or an unlabeled number can all map to `InvoiceNumber` if the analyzer determines they represent the same concept.
+For example, *Invoice No.*, *Invoice #*, or an unlabeled number can all map to `InvoiceNumber` if the analyzer determines they represent the same concept.
 
 #### Understand analyzers
 
-An **analyzer** is a unit in Azure Content Understanding that takes input, applies AI analysis, and produces structured results. Analyzers consistently apply the same extraction logic to all incoming content. Once it is configured, an analyzer ensures a schema is reused consistently for every analysis request. Analyzers also produce predictable JSON results which makes downstream processing (storage, search, automation) easier.
+An **analyzer** is a unit in Azure Content Understanding that takes input, applies AI analysis, and produces structured results. Analyzers consistently apply the same extraction logic to all incoming content. Once it's configured, an analyzer ensures a schema is reused consistently for every analysis request. Analyzers also produce predictable JSON results. The structured results make downstream processing (storage, search, automation) easier.
 
 Azure Content Understanding offers prebuilt analyzers for common scenarios and supports custom analyzers tailored to your needs. At a high level:
 
@@ -133,7 +132,7 @@ In Foundry portal, you can also view the JSON results of the processing.
 You can use the **Content Understanding API** to build a lightweight client application that extracts data programmatically.
 
 >[!NOTE]
-> A client application is a software program that runs on a user's device and requests services or data from another system, typically a server, over a network. The *client* is the part of an application that users interact with, while the *server* does the heavy work behind the scenes. An API lets applications request data or actions from a service and receive a structured response, without needing to know how the service is built internally.
+> A client application is a software program that runs on a user's device and requests services or data from another system, typically a server, over a network. The *client* is the part of an application that users interact with, while the *server* does the heavy work behind the scenes. Applications can use an API to request data or actions from a service and receive a structured response.
 
 When you use the Content Understanding API, you can choose a prebuilt analyzer or create a custom analyzer. Prebuilt analyzers include: `prebuilt-invoice`, `prebuilt-imageSearch`, `prebuilt-audioSearch`, and `prebuilt-videoSearch`. When you submit content for analysis to the analyzer, the analysis is **asynchronous**, which means you get the result later when it's ready. Because the analysis is asynchronous, you need to *poll* the Operation-Location URL (or `analyzerResults`) until the job succeeds.
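The polling pattern can be sketched generically. This is not the SDK's actual API; it's a minimal illustration of polling a long-running operation until it reaches a terminal status, with the status fetch injected as a callable (standing in for an HTTP GET of the Operation-Location URL) so the loop itself can be exercised without a network:

```python
import time
from typing import Callable

def poll_until_done(fetch_status: Callable[[], dict],
                    interval_seconds: float = 1.0,
                    max_attempts: int = 60) -> dict:
    """Poll a long-running operation until it reports a terminal status.

    `fetch_status` stands in for an HTTP GET of the Operation-Location URL.
    """
    for _ in range(max_attempts):
        result = fetch_status()
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(interval_seconds)
    raise TimeoutError("Analysis did not complete in time")

# Simulated service responses: still running twice, then succeeded.
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "succeeded", "result": {"fields": {}}},
])
final = poll_until_done(lambda: next(responses), interval_seconds=0.0)
print(final["status"])  # succeeded
```

In practice the SDK's poller wraps this loop for you; the sketch only shows why a second request (or several) is needed before results are available.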

@@ -149,7 +148,7 @@ python -m pip install azure-ai-contentunderstanding
 
 2. Identify your Foundry resource endpoint and API key or Microsoft Entra ID. Your endpoint typically looks like: `https://<your-resource-name>.services.ai.azure.com/`
 
-3. Create and run the client application code. The `analzyer_id` is the ID of the prebuilt analyzer. You can find a list of prebuilt analyzer IDs [here](/azure/ai-services/content-understanding/concepts/prebuilt-analyzers).
+3. Create and run the client application code. The `analzyer_id` is the ID of the prebuilt analyzer. You can find a list of prebuilt analyzer ID values [here](/azure/ai-services/content-understanding/concepts/prebuilt-analyzers).
 
 ```python
 import os
@@ -207,6 +206,6 @@ The resulting output is JSON that shows the extracted markdown, fields, data in
 }
 ```
 
-Content Understanding includes multiple predefined analyzers for common document types, as well as virtual tools to help you create custom analyzers for your particular data processing scenarios. Next, learn how to use Azure Content Understanding analyzers to extract structured data from audio and video.
+Next, learn how to use Azure Content Understanding analyzers to extract structured data from audio and video.
 
 ::: zone-end

learn-pr/wwl-data-ai/get-started-information-extraction/includes/3-audio-video.md

Lines changed: 5 additions & 7 deletions
@@ -6,9 +6,7 @@
 
 ::: zone pivot="text"
 
-As well as traditional documents and forms, business information is increasingly found in multimedia formats such as audio and video files. For example, businesses often record calls in order to analyze them later. The growth of video conferencing means that a lot of useful information is captured in recorded meetings.
-
-Azure Content Understanding includes analyzers for audio and video content that you can use to extract insights that would otherwise be locked up in media files.
+Business information is increasingly found in multimedia formats such as audio and video files. For example, businesses often record calls in order to analyze them later. The growth of video conferencing means that useful information is often captured in recorded meetings. Azure Content Understanding supports both audio and video data extraction and analysis.
 
 ## Extracting structured data from audio
 
@@ -39,7 +37,7 @@ Thanks, bye!
 Using Azure Content Understanding to analyze the audio recording and apply your schema produces the following results:
 
 - **Caller**: Ava from Contoso
-- **Message summary**: Ava from Contoso called to follow up on a meeting and mentioned that they can meet the price expectations. She requested a callback or an email to discuss next steps.
+- **Message summary**: Ava from Contoso called to follow up on a meeting and mentioned that they can meet the price expectations. They requested a callback or an email to discuss next steps.
 - **Requested actions**: Call back or send an email to discuss next steps.
 - **Callback number**: 555-12345
 - **Alternative contact details**: [email protected]
@@ -58,7 +56,7 @@ Let's take a look at how we can use content understanding to analyze a call reco
 
 :::image type="content" source="../media/audio-extraction-playground.png" alt-text="Screenshot of the classic Foundry portal with audio analyzed with Azure Content Understanding." lightbox="../media/audio-extraction-playground.png":::
 
-In the returned results, you can see specific information that has been identified in the call. As with other analyzers in content understanding, the results are in JSON format for further processing.
+In the returned results, you can see specific information from the call. As with other analyzers in Content Understanding, the results are in JSON format for further processing.
 
 :::image type="content" source="../media/audio-json-result.png" alt-text="Screenshot of the classic Foundry portal where audio is analyzed and JSON is returned." lightbox="../media/audio-json-result.png":::
 
@@ -73,11 +71,11 @@ Let's first look at one image from the conference room camera. Suppose you defin
 - Remote attendees
 - Total attendees
 
-You can use Azure Content Understanding to analyze the following from the conference room camera:
+You could use Azure Content Understanding to analyze an image from the conference room camera:
 
 ![Photograph of a person in a conference room on a call with three remote attendees.](../media/conference-call.jpg)
 
-When applying the preceding schema to this image, Azure Content Understanding produces the following results:
+After applying the schema to the image, Azure Content Understanding returns structured data:
 
 - **Location**: Conference room
 - **In-person attendees**: 1

learn-pr/wwl-data-ai/get-started-information-extraction/index.yml

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 uid: learn.wwl.get-started-information-extraction
 metadata:
 title: Get started with AI-powered information extraction in Azure
-description: AI gives you the power to unlock insights from your data. In this module, you'll learn how to use Foundry Tools to extract information from content.
+description: AI gives you the power to unlock insights from your data. Learn how to use Azure Content Understanding in Foundry Tools to extract information from content.
 author: sherzyang
 ms.author: sheryang
 ms.date: 02/21/2026
@@ -12,7 +12,7 @@ metadata:
 ms.collection:
 - wwl-ai-copilot
 title: Get started with AI-powered information extraction in Azure
-summary: AI gives you the power to unlock insights from your data. In this module, you'll learn how to use Foundry Tools to extract information from content.
+summary: AI gives you the power to unlock insights from your data. Learn how to use Azure Content Understanding in Foundry Tools to extract information from content.
 abstract: |
 After completing this module, you'll be able to:
 - Identify Foundry Tools for information extraction
