
Commit 7f8908f

Merge pull request #53599 from sherzyang/NEW-get-started-information-extraction
Add new module.
2 parents a218925 + f723cbb commit 7f8908f

24 files changed

Lines changed: 582 additions & 0 deletions
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.introduction
title: Introduction
metadata:
  title: Introduction
  description: Introduction to AI-powered information extraction with Azure Content Understanding.
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
zone_pivot_groups: video-or-text
durationInMinutes: 4
content: |
  [!include[](includes/1-introduction.md)]
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.documents
title: Extract information from documents
metadata:
  title: Extract information from documents
  description: Use Azure Content Understanding to extract information from documents.
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
zone_pivot_groups: video-or-text
durationInMinutes: 8
content: |
  [!include[](includes/2-documents.md)]
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.audio-video
title: Extract information from audio and video
metadata:
  title: Extract information from audio and video
  description: Use Azure Content Understanding to extract information from audio and video.
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
zone_pivot_groups: video-or-text
durationInMinutes: 4
content: |
  [!include[](includes/3-audio-video.md)]
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.exercise
title: Exercise - Get started with Content Understanding in Microsoft Foundry
metadata:
  title: Exercise - Get started with Content Understanding in Microsoft Foundry
  description: Use Microsoft Foundry to extract information from content.
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
durationInMinutes: 30
content: |
  [!include[](includes/4-exercise.md)]
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.knowledge-check
title: Module assessment
metadata:
  title: Module assessment
  description: Knowledge check
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
durationInMinutes: 3
quiz:
  title: "Check your knowledge"
  questions:
  - content: "What is the key advantage of using Azure Content Understanding over basic Optical Character Recognition (OCR)?"
    choices:
    - content: "Azure Content Understanding extracts text faster by skipping image preprocessing."
      isCorrect: false
      explanation: "Azure Content Understanding doesn't skip preprocessing; it combines OCR with other AI techniques like natural language understanding and multimodal models."
    - content: "Azure Content Understanding understands document structure and maps extracted data to a defined schema."
      isCorrect: true
      explanation: "Azure Content Understanding goes beyond basic OCR by using schema-based extraction, allowing it to identify fields (such as invoice number or total) and map values even when labels vary or are missing."
    - content: "Azure Content Understanding extracts structured data, while OCR extracts the relationship between words in text."
      isCorrect: false
      explanation: "OCR doesn't extract the relationship between words; it simply converts images of text into machine-readable text."
  - content: "What is the primary role of an analyzer in Azure Content Understanding?"
    choices:
    - content: "It defines how content is processed and what structured data is returned."
      isCorrect: true
      explanation: "Analyzers are the core components that define how content is processed, including extraction settings, schemas, and model deployments."
    - content: "It stores extracted data in a database."
      isCorrect: false
      explanation: "Analyzers don't store data; they only define extraction and processing behavior."
    - content: "It converts JSON output into human-readable text."
      isCorrect: false
      explanation: "Analyzers generate structured output (such as JSON), not human-readable conversions."
  - content: "When you use the Azure Content Understanding Python SDK, what happens after you submit content for analysis?"
    choices:
    - content: "The results are returned immediately in the same request."
      isCorrect: false
      explanation: "Results aren't returned immediately during asynchronous analysis."
    - content: "The analyzer retrains itself on the submitted content."
      isCorrect: false
      explanation: "Analyzers are predefined or custom-configured and reused; they aren't retrained per request."
    - content: "You must poll a URL until the analysis job completes."
      isCorrect: true
      explanation: "Content analysis is handled as a long-running asynchronous operation. After submitting content, you poll the Operation-Location URL (or use the SDK poller) until the job completes and returns results."
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
### YamlMime:ModuleUnit
uid: learn.wwl.get-started-information-extraction.summary
title: Summary
metadata:
  title: Summary
  description: Summarize what you've learned about information extraction.
  author: sherzyang
  ms.author: sheryang
  ms.date: 02/21/2026
  ms.topic: unit
zone_pivot_groups: video-or-text
durationInMinutes: 1
content: |
  [!include[](includes/6-summary.md)]
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
::: zone pivot="video"

>[!VIDEO https://learn-video.azurefd.net/vod/player?id=68657ae4-3276-439f-b46a-bccc06c2651c]

::: zone-end

::: zone pivot="text"

Anyone who has manually processed invoices or forms knows how challenging it is. The great news is that we can use AI to eliminate manual effort and build the information systems of the future. AI-powered information extraction and analysis enable organizations to gain actionable insights from data that might otherwise be locked up in documents, images, audio, video, or other assets.

Examples of information extraction scenarios include:
- **Expense claim processing**: A company needs to extract expense descriptions and amounts from scanned receipts.
- **Customer support**: An agency needs to analyze recorded support calls to identify common problems and resolutions.
- **Capacity planning**: A tourist organization needs to estimate visitor volumes by analyzing video footage and images.

**Microsoft Azure Content Understanding in Foundry Tools** uses AI to extract structured information from unstructured content. Azure Content Understanding helps applications understand *what* is in content by identifying the entities, fields, relationships, and meaning it contains.

Azure Content Understanding extracts structured data from multiple content types, including:

- **Documents & images**: such as PDFs, forms, invoices, receipts, and contracts
- **Audio**: such as recordings or calls
- **Video**: such as recordings of meetings or other media files

Azure Content Understanding's AI-powered information extraction automates the process of turning unstructured content into machine-readable data that can be searched and analyzed. Next, learn how to extract structured data from unstructured documents and forms.

::: zone-end

> [!NOTE]
> We recognize that different people like to learn in different ways. You can choose to complete this module in video-based format or you can read the content as text and images. The text contains greater detail than the videos, so in some cases you might want to refer to it as supplemental material to the video presentation.
Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
::: zone pivot="video"

>[!VIDEO https://learn-video.azurefd.net/vod/player?id=b86ec81a-5db0-4f23-922c-419ccac09425]

::: zone-end

::: zone pivot="text"

Today's business processes depend heavily on data contained in documents like forms, receipts, and invoices. Manual processing can introduce delays and errors, making automated data extraction more important than ever.

## How Azure Content Understanding works

Azure Content Understanding follows a model-driven extraction workflow in which unstructured content is ingested, analyzed, and returned as structured data.

1. **Ingest content**: You submit content to Azure Content Understanding.

2. **AI-powered analysis**: The service uses a combination of Optical Character Recognition (OCR), speech recognition, natural language understanding, and multimodal AI models to analyze the content.

3. **Structured output**: The service returns structured results (for example, in JSON) that match your model, making the data easy to store, search, or integrate into downstream systems.

> [!NOTE]
> JSON (JavaScript Object Notation) is a text-based data format used to store and exchange structured data between systems. It's easy for humans to read and write, and easy for machines to parse and generate.

#### Understand schemas

OCR (optical character recognition) allows a computer to 'read' text from pictures, such as scanned documents, photos of receipts, or images of printed pages, and turn that text into editable and searchable digital text. Basic OCR recognizes printed text and focuses on text extraction; it *doesn't* understand meaning, context, or relationships between words.

Azure Content Understanding's document analysis capabilities go beyond simple OCR-based text extraction to include **schema-based** extraction of fields and their values. This schema-driven approach is what differentiates Azure Content Understanding from basic OCR or transcription services.

A schema describes *what information you want to extract* and *how that information should be structured*. When you define a schema, you list the specific fields or entities you care about.

For example, suppose you define a schema that includes the common fields typically found in an invoice, such as:

- Vendor name
- Invoice number
- Invoice date
- Customer name
- Customer address
- Items - the items ordered, each of which includes:
  - Item description
  - Unit price
  - Quantity ordered
  - Line item total
- Invoice subtotal
- Tax
- Shipping Charge
- Invoice total
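To make the field list concrete, here's a minimal sketch of how such a schema could be written down in code. The dictionary layout, field names, and type labels below are illustrative assumptions for this module, not the exact schema format the service uses:

```python
# Illustrative sketch only: a hypothetical invoice schema as a Python dictionary.
# The structure and type names here are assumptions for explanation; they aren't
# the exact field schema format that Azure Content Understanding defines.
invoice_schema = {
    "VendorName": {"type": "string"},
    "InvoiceNumber": {"type": "string"},
    "InvoiceDate": {"type": "date"},
    "CustomerName": {"type": "string"},
    "CustomerAddress": {"type": "string"},
    "Items": {  # a collection of nested objects, one per line item
        "type": "array",
        "items": {
            "Description": {"type": "string"},
            "UnitPrice": {"type": "number"},
            "Quantity": {"type": "number"},
            "LineTotal": {"type": "number"},
        },
    },
    "InvoiceSubtotal": {"type": "number"},
    "Tax": {"type": "number"},
    "ShippingCharge": {"type": "number"},
    "InvoiceTotal": {"type": "number"},
}
```

Notice that the schema captures both the individual fields and how they nest; that structure carries through to the extracted results.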
Now suppose you need to extract this information from the following invoice:

![Photograph of an invoice.](../media/invoice.png)

Azure Content Understanding can apply the invoice schema to your invoice and identify the corresponding fields, even when they're labeled with different names (or not labeled at all). The analysis produces a result like this:

![Photograph of an analyzed invoice with detected fields highlighted.](../media/analyzed-invoice.png)

The schema also defines the field structure. Schemas support *structured and nested fields*, not just flat text. For example:

- `Items` is a collection
- Each item has `description`, `unit price`, `quantity`, and `line total`

Identifying structured fields allows Azure Content Understanding to understand relationships between values, something OCR alone cannot do.

In the invoice example, the detected fields and their nested values look like this:

- **Vendor name**: Adventure Works Cycles
- **Invoice number**: 1234
- **Invoice date**: 03/07/2025
- **Customer name**: John Smith
- **Customer address**: 123 River Street, Marshtown, England, GL1 234
- **Items**:
  - Item 1:
    - **Item description**: 38" Racing Bike (Red)
    - **Unit price**: 1299.00
    - **Quantity ordered**: 1
    - **Line item total**: 1299.00
  - Item 2:
    - **Item description**: Cycling helmet (Black)
    - **Unit price**: 25.99
    - **Quantity ordered**: 1
    - **Line item total**: 25.99
  - Item 3:
    - **Item description**: Cycling shirt (L)
    - **Unit price**: 42.50
    - **Quantity ordered**: 2
    - **Line item total**: 85.00
- **Invoice subtotal**: 1409.99
- **Tax**: 140.99
- **Shipping Charge**: 35.00
- **Invoice total**: 1585.98
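Because the results are structured, downstream code can work with the nested values directly rather than re-parsing text. The snippet below is purely illustrative (it hard-codes the values above as a plain Python dictionary, which isn't the service's exact output format) and shows the kind of check that nested fields make possible:

```python
# Illustrative only: the extracted invoice represented as a plain Python dictionary.
# Real service output is JSON and includes extra metadata such as confidence scores.
extracted = {
    "InvoiceSubtotal": 1409.99,
    "InvoiceTotal": 1585.98,
    "Items": [
        {"Description": '38" Racing Bike (Red)', "UnitPrice": 1299.00, "Quantity": 1, "LineTotal": 1299.00},
        {"Description": "Cycling helmet (Black)", "UnitPrice": 25.99, "Quantity": 1, "LineTotal": 25.99},
        {"Description": "Cycling shirt (L)", "UnitPrice": 42.50, "Quantity": 2, "LineTotal": 85.00},
    ],
}

# Nested fields capture relationships that raw OCR text can't express, such as
# checking that the line item totals add up to the stated invoice subtotal.
calculated_subtotal = sum(item["LineTotal"] for item in extracted["Items"])
assert round(calculated_subtotal, 2) == extracted["InvoiceSubtotal"]
print(f"Line items sum to {calculated_subtotal:.2f}, matching the invoice subtotal.")
```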
Azure Content Understanding extracts expected meaning, not just labels. Schemas are applied *semantically*, meaning:
- Fields can be extracted even if labels differ
- Fields can be extracted even if labels are missing

For example, *Invoice No.*, *Invoice #*, or an unlabeled number can all map to `InvoiceNumber` if the analyzer determines they represent the same concept.

#### Understand analyzers

An **analyzer** is the component in Azure Content Understanding that takes input, applies AI analysis, and produces structured results. An analyzer applies the same extraction logic to all incoming content: once it's configured, it reuses its schema consistently for every analysis request and returns predictable JSON results, which makes downstream processing (storage, search, automation) easier.

Azure Content Understanding offers prebuilt analyzers for common scenarios and supports custom analyzers tailored to your needs. At a high level:

1. You choose or create an analyzer.
2. The analyzer includes a schema defining fields and structure.
3. You submit content for analysis.
4. The service applies the schema.
5. You receive structured JSON results matching the schema.

## Using Azure Content Understanding in the Foundry portal

> [!NOTE]
> Foundry portal has a *classic* user interface (UI) and a *new* user interface.

After you create a *Microsoft Foundry resource*, you can use the ***classic* Foundry portal interface** to test out Azure Content Understanding. The Foundry portal provides content examples and allows you to upload your own material for analysis.

You can use the visual interface to select a source document and extract default fields of information. For example, when you try out Azure Content Understanding on an image of a document, the service returns the document text and text layout information.

:::image type="content" source="../media/document-analysis-playground.png" alt-text="Screenshot of the classic Foundry portal with a document analyzed with Azure Content Understanding." lightbox="../media/document-analysis-playground.png":::

Azure Content Understanding's analyzers identify text values in documents and map them to specific fields. For example, given an invoice, the service returns the fields (such as Vendor address) and the values they contain (such as 123 456th Street).

:::image type="content" source="../media/invoice-playground.png" alt-text="Screenshot of the classic Foundry portal with an invoice analyzed with Azure Content Understanding." lightbox="../media/invoice-playground.png":::

In the Foundry portal, you can also view the JSON results of the processing.

:::image type="content" source="../media/invoice-json-result-playground.png" alt-text="Screenshot of the classic Foundry portal with the JSON result of an invoice analyzed with Azure Content Understanding." lightbox="../media/invoice-json-result-playground.png":::

## Building a client application with Azure Content Understanding

You can use the **Content Understanding API** to build a lightweight client application that extracts data programmatically.

> [!NOTE]
> A client application is a software program that runs on a user's device and requests services or data from another system, typically a server, over a network. The *client* is the part of an application that users interact with, while the *server* does the heavy work behind the scenes. A client can request data or actions from a service and receive a structured response through an API.

When you use the Content Understanding API, you can choose a prebuilt analyzer or create a custom analyzer. Prebuilt analyzers include `prebuilt-invoice`, `prebuilt-imageSearch`, `prebuilt-audioSearch`, and `prebuilt-videoSearch`. When you submit content to an analyzer, the analysis is **asynchronous**, which means the result isn't returned in the same request. Instead, you *poll* the Operation-Location URL (or the `analyzerResults` endpoint) until the job succeeds.
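If you call the REST API directly instead of using an SDK, the same asynchronous pattern applies. The sketch below illustrates the polling loop; the request path, header names, and query parameters are assumptions for illustration, so check the Content Understanding REST API reference for the exact values to use with your resource:

```python
# Minimal polling sketch using the requests library.
# ASSUMPTIONS: the :analyze path, api-version value, and key header shown here are
# illustrative placeholders; consult the Content Understanding REST API reference.
import time
import requests

endpoint = "https://<your-resource-name>.services.ai.azure.com"
key = "<your-api-key>"
analyzer_id = "prebuilt-invoice"

# Submit content for analysis. The response doesn't contain results yet; it points
# to an operation that you poll.
response = requests.post(
    f"{endpoint}/contentunderstanding/analyzers/{analyzer_id}:analyze",
    params={"api-version": "2025-05-01-preview"},  # placeholder API version
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"url": "https://example.com/sample-invoice.pdf"},
)
response.raise_for_status()

# Poll the Operation-Location URL until the job finishes.
operation_url = response.headers["Operation-Location"]
while True:
    status_response = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": key})
    status_response.raise_for_status()
    body = status_response.json()
    if body.get("status") in ("Succeeded", "Failed"):
        break
    time.sleep(2)  # wait briefly between polls

print(body.get("status"))
```

The SDK poller shown in the next section wraps this loop for you.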
#### Using the Azure Content Understanding Python SDK

Let's take a look at the process of using the Python SDK to analyze an invoice from a URL.

1. Install the Azure Content Understanding Python SDK.

    ```bash
    python -m pip install azure-ai-contentunderstanding
    ```

2. Identify your Foundry resource endpoint and an API key or Microsoft Entra ID credential. Your endpoint typically looks like `https://<your-resource-name>.services.ai.azure.com/`.
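    If you prefer Microsoft Entra ID over an API key, you can use a token credential instead. This is a minimal sketch that assumes the client accepts an `azure.identity` credential; confirm the supported credential types in the SDK documentation for the version you install.

    ```python
    # Sketch: key-less authentication with Microsoft Entra ID.
    # ASSUMPTION: ContentUnderstandingClient accepts a token credential such as
    # DefaultAzureCredential; verify this in the SDK reference for your version.
    from azure.ai.contentunderstanding import ContentUnderstandingClient
    from azure.identity import DefaultAzureCredential

    client = ContentUnderstandingClient(
        endpoint="https://<your-resource-name>.services.ai.azure.com/",
        credential=DefaultAzureCredential(),
    )
    ```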
3. Create and run the client application code. The `analyzer_id` is the ID of the prebuilt analyzer. You can find a list of [prebuilt analyzer ID values](/azure/ai-services/content-understanding/concepts/prebuilt-analyzers).

    ```python
    import os
    from azure.ai.contentunderstanding import ContentUnderstandingClient
    from azure.core.credentials import AzureKeyCredential

    endpoint = os.environ["FOUNDRY_ENDPOINT"]
    key = os.environ["FOUNDRY_KEY"]

    client = ContentUnderstandingClient(endpoint=endpoint, credential=AzureKeyCredential(key))

    # 1) Start analysis with an analyzer ID and the content inputs
    analyzer_id = "prebuilt-invoice"
    inputs = [
        {"url": "https://github.com/Azure-Samples/azure-ai-content-understanding-python/raw/refs/heads/main/data/invoice.pdf"}
    ]

    # 2) Wait for the Long Running Operation (LRO) to complete
    poller = client.begin_analyze(analyzer_id=analyzer_id, inputs=inputs)  # starts the LRO
    result = poller.result()  # waits for completion (polling handled by the SDK)

    # 3) Read the structured fields and markdown
    # The result typically includes extracted "fields" and "markdown" per input content item.
    for content in result.contents:
        print(content.markdown)
        print(content.fields)
    ```

The resulting output is JSON that shows the extracted markdown, the fields, the values in those fields, and a confidence score for each. For example:

```json
{
  "status": "Succeeded",
  "result": {
    "analyzerId": "prebuilt-invoice",
    "apiVersion": "2025-05-01-preview",
    "contents": [
      {
        "markdown": "# INVOICE\n\nCONTOSO LTD.\n\nContoso Headquarters\n123 456th St\nNew York, NY, 10001\n\nINVOICE: INV-100\n\nINVOICE DATE: 11/15/2019\n\nDUE DATE: 12/15/2019\n\nCUSTOMER NAME: MICROSOFT CORPORATION\n",
        "fields": {
          "CustomerName": {
            "type": "string",
            "valueString": "MICROSOFT CORPORATION",
            "confidence": 0.95
          },
          "InvoiceDate": {
            "type": "date",
            "valueDate": "2019-11-15",
            "confidence": 0.994
          }
        }
      }
    ]
  }
}
```
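In a real application, you'd usually iterate over those fields and decide what to do with each value based on its confidence score. The sketch below assumes the response has been loaded into plain Python dictionaries shaped like the JSON above (the variable name `analysis` and the threshold are illustrative); SDK model objects may expose the same data as attributes instead:

```python
# Sketch: reading extracted fields from a result shaped like the JSON above.
# ASSUMPTION: `analysis` is the response parsed into plain dictionaries (for example
# via json.loads); adjust the access pattern if your SDK returns model objects.
CONFIDENCE_THRESHOLD = 0.8

for content in analysis["result"]["contents"]:
    for name, field in content["fields"].items():
        # Each field carries a type, a typed value (valueString, valueDate, ...),
        # and a confidence score.
        value_key = next(k for k in field if k.startswith("value"))
        if field["confidence"] >= CONFIDENCE_THRESHOLD:
            print(f"{name}: {field[value_key]} (confidence {field['confidence']:.2f})")
        else:
            print(f"{name}: confidence below threshold, flag for human review")
```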
Next, learn how to use Azure Content Understanding analyzers to extract structured data from audio and video.

::: zone-end
