Commit 6f0fb55

refreshed module
1 parent 7d0859f commit 6f0fb55

12 files changed

Lines changed: 27 additions & 27 deletions

learn-pr/wwl-data-ai/introduction-to-data-engineering-azure/1-introduction.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: Introduction
 metadata:
 title: Introduction
 description: "Introduction"
-ms.date: 08/21/2025
+ms.date: 04/03/2026
 author: weslbo
 ms.author: wedebols
 ms.topic: unit

learn-pr/wwl-data-ai/introduction-to-data-engineering-azure/2-what-data-engineering.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: What is data engineering
 metadata:
 title: What is Data Engineering
 description: "What is data engineering"
-ms.date: 08/21/2025
+ms.date: 04/03/2026
 author: weslbo
 ms.author: wedebols
 ms.topic: unit

learn-pr/wwl-data-ai/introduction-to-data-engineering-azure/4-common-patterns-azure-data-engineering.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: Important data engineering concepts
 metadata:
 title: Important Data Engineering Concepts
 description: "Important data engineering concepts"
-ms.date: 08/21/2025
+ms.date: 04/03/2026
 author: weslbo
 ms.author: wedebols
 ms.topic: unit

learn-pr/wwl-data-ai/introduction-to-data-engineering-azure/5-common-tooling-azure-data-engineering.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: Data engineering in Microsoft Azure
 metadata:
 title: Data Engineering in Microsoft Azure
 description: "Data engineering in Microsoft Azure"
-ms.date: 08/21/2025
+ms.date: 04/03/2026
 author: weslbo
 ms.author: wedebols
 ms.topic: unit

learn-pr/wwl-data-ai/introduction-to-data-engineering-azure/6-knowledge-check.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: Module assessment
 metadata:
 title: Module Assessment
 description: "Knowledge check"
-ms.date: 08/21/2025
+ms.date: 04/03/2026
 author: weslbo
 ms.author: wedebols
 ms.topic: unit

learn-pr/wwl-data-ai/introduction-to-data-engineering-azure/7-summary.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: Summary
 metadata:
 title: Summary
 description: "Summary"
-ms.date: 08/21/2025
+ms.date: 04/03/2026
 author: weslbo
 ms.author: wedebols
 ms.topic: unit
Lines changed: 1 addition & 3 deletions
@@ -1,3 +1 @@
-
-
-In most organizations, a data engineer is the primary role responsible for integrating, transforming, and consolidating data from various structured and unstructured data systems into structures that are suitable for building analytics solutions. An Azure data engineer also helps ensure that data pipelines and data stores are high-performing, efficient, organized, and reliable, given a specific set of business requirements and constraints.
+In most organizations, a data engineer is the primary role responsible for integrating, transforming, and consolidating data from various structured and unstructured data systems into structures that are suitable for building analytics solutions. An Azure data engineer also helps ensure that data pipelines and data stores are high-performing, efficient, organized, and reliable, given a specific set of business requirements and constraints.

learn-pr/wwl-data-ai/introduction-to-data-engineering-azure/includes/2-what-data-engineering.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,17 @@
1-
2-
31
The data engineer will often work with multiple types of data to perform many operations using many scripting or coding languages that are appropriate to their individual organization.
42

53
## Types of data
64

7-
There are three primary types of data that a data engineer will work with.
5+
There are three primary types of data that a data engineer works with.
86

97
| Structured | Semi-structured | Unstructured |
10-
| -- | -- | -- |
8+
|--|--|--|
119
| ![Diagram of Structured data type.](../media/2-structured-data.png) | ![Diagram of Semi-structured data type.](../media/2-semi-structured-data.png) | ![Diagram of Unstructured data type.](../media/2-unstructured-data.png) |
1210
| Structured data primarily comes from table-based source systems such as a relational database or from a flat file such as a comma separated (CSV) file. The primary element of a structured file is that the rows and columns are aligned consistently throughout the file. | Semi-structured data is data such as JavaScript object notation (JSON) files, which may require flattening prior to loading into your source system. When flattened, this data doesn't have to fit neatly into a table structure. | Unstructured data includes data stored as key-value pairs that don't adhere to standard relational models and Other types of unstructured data that are commonly used include portable data format (PDF), word processor documents, and images. |
1311

1412
## Data operations
1513

16-
As a data engineer some of the main tasks that you'll perform in Azure include *data integration*, *data transformation*, and *data consolidation*.
14+
As a data engineer some of the main tasks that you perform in Azure include *data integration*, *data transformation*, and *data consolidation*.
1715

1816
### Data integration
1917

@@ -39,6 +37,8 @@ Data Engineers must be proficient with a range of tools and scripting languages
3937

4038
- **SQL** - One of the most common languages data engineers use is SQL, or Structured Query Language, which is a relatively easy language to learn. SQL uses queries that include SELECT, INSERT, UPDATE, and DELETE statements to directly work with the data stored in tables.
4139

42-
- **Python** - Python is one of the most popular and fastest growing programming languages in the world. It's used for all sorts of tasks, including web programming and data analysis. It has emerged as the language to learn for machine learning, and is increasing in popularity in data engineering with the use of notebooks.
40+
- **Python** - Python is one of the most popular and fastest growing programming languages in the world. It's used for all sorts of tasks, including web programming and data analysis. It has emerged as the language to learn for machine learning, and is increasing in popularity in data engineering with the use of notebooks. In large-scale data engineering workloads, data engineers typically use PySpark—the Python API for Apache Spark—to write transformation logic that runs across distributed Spark clusters.
41+
42+
- **KQL** - Kusto Query Language (KQL) is a query language for analyzing streaming and log data in real-time analytics scenarios. Data engineers use KQL in Microsoft Fabric's Real-Time Intelligence workload and Azure Data Explorer to query high-velocity data streams.
4343

4444
- **Others** - Depending upon the needs of the organization and your individual skill set, you may also use other popular languages within or outside of notebooks including R, Java, Scala, C#, and more. The use of notebooks is growing in popularity, and allows collaboration using different languages within the same notebook.
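
The semi-structured column in the table above says that JSON data may require flattening before it is loaded. As a rough illustration only, the sketch below flattens a nested record into dotted column names using the Python standard library. The `flatten` helper, the sample record, and the dotted-key naming are assumptions made for this example and are not part of the module or of any Azure service.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining key paths with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects so each leaf becomes its own column.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as-is; lists are not expanded here.
            flat[full_key] = value
    return flat


# Hypothetical semi-structured input, for illustration only.
raw = '{"id": 1, "customer": {"name": "Ana", "address": {"city": "Lisbon"}}, "tags": ["new", "vip"]}'
row = flatten(json.loads(raw))
print(row)
```

Arrays are left untouched in this sketch; in practice they are often expanded into separate rows or a child table, depending on the target schema.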

learn-pr/wwl-data-ai/introduction-to-data-engineering-azure/includes/4-common-patterns-azure-data-engineering.md

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,10 @@
1-
2-
31
There are some core concepts with which data engineers should be familiar. These concepts underpin many of the workloads that data engineers must implement and support.
42

53
## Operational and analytical data
64

75
![Diagram representing operational and analytical data.](../media/4-operational-analytical-data.png)
86

9-
*Operational* data is usually transactional data that is generated and stored by applications, often in a relational or non-relational database. *Analytical* data is data that has been optimized for analysis and reporting, often in a data warehouse.
7+
*Operational* data is usually transactional data that is generated and stored by applications, often in a relational, or non-relational database. *Analytical* data is data that has been optimized for analysis and reporting, often in a data warehouse.
108

119
One of the core responsibilities of a data engineer is to design, implement, and manage solutions that integrate operational and analytical data sources or extract operational data from multiple systems, transform it into appropriate structures for analytics, and load it into an analytical data store (usually referred to as ETL solutions).
1210

@@ -40,6 +38,12 @@ A data warehouse is a centralized repository of integrated data from one or more
4038

4139
Data engineers are responsible for designing and implementing relational data warehouses, and managing regular data loads into tables.
4240

41+
## Lakehouses
42+
43+
A lakehouse combines the scalability of a data lake with the querying capabilities of a data warehouse. Rather than maintaining two separate systems—one for raw file storage and one for structured analytics—a lakehouse stores all data in a single location using the Delta Lake format, which provides ACID transactions, schema enforcement, and support for both structured and unstructured data.
44+
45+
This means data engineers can use Apache Spark and notebooks to ingest and transform raw data, while analysts can query the same data using familiar SQL tools—without moving data between systems. The lakehouse architecture has become central to modern data engineering on Microsoft Azure, and is the primary data store pattern used in Microsoft Fabric.
46+
4347
## Apache Spark
4448

4549
![Diagram representing an Apache Spark cluster.](../media/4-apache-spark.png)
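
The unit text in this file describes ETL as extracting operational data, transforming it into structures suited to analytics, and loading it into an analytical store. The sketch below walks through those three steps on a toy scale, using two in-memory SQLite databases as stand-ins for the operational and analytical stores. The table names, columns, and sample rows are hypothetical; a real Azure workload would more likely use services such as Azure Data Factory or Spark rather than this code.

```python
import sqlite3

# Hypothetical operational store: one row per order, as an application might write it.
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
operational.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 20.0), (2, "EU", 30.0), (3, "US", 15.0)],
)

# Extract: read the transactional rows out of the operational store.
rows = operational.execute("SELECT region, amount FROM orders").fetchall()

# Transform: aggregate individual orders into a total per region.
totals = {}
for region, amount in rows:
    totals[region] = totals.get(region, 0.0) + amount

# Load: write the aggregated shape into a separate analytical store.
analytical = sqlite3.connect(":memory:")
analytical.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")
analytical.executemany("INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items()))
analytical.commit()

result = analytical.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(result)
```

Keeping the aggregated table in a separate store mirrors the operational/analytical split described in the text: reporting queries read `sales_by_region` without touching the transactional `orders` table.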
Lines changed: 4 additions & 4 deletions
@@ -1,17 +1,17 @@
-
-
 ![Diagram of the flow of a typical enterprise data analytics solution.](../media/3-data-engineering-azure.png)
 
 Microsoft Azure includes many services that can be used to implement and manage data engineering workloads.
 
 The diagram displays the flow from left to right of a typical enterprise data analytics solution, including some of the key Azure services that may be used. Operational data is generated by applications and devices and stored in Azure data storage services such as Azure SQL Database, Azure Cosmos DB, and Microsoft Dataverse. Streaming data is captured in event broker services such as Azure Event Hubs.
 
-This operational data must be captured, ingested, and consolidated into analytical stores; from where it can be modeled and visualized in reports and dashboards. These tasks represent the core area of responsibility for the data engineer. The core Azure technologies used to implement data engineering workloads include:
+This operational data must be captured, ingested, and consolidated into analytical stores; from where it can be modeled and visualized in reports and dashboards. These tasks represent the core area of responsibility for the data engineer. The core Microsoft technologies used to implement data engineering workloads include:
 
-- Azure Synapse Analytics
+- Microsoft Fabric
 - Azure Data Lake Storage Gen2
 - Azure Stream Analytics
 - Azure Data Factory
 - Azure Databricks
 
+Microsoft Fabric is a unified, end-to-end SaaS analytics platform built on OneLake that brings together data engineering, data factory, data science, data warehousing, real-time intelligence, databases, business intelligence, and IQ in a single integrated environment. IQ is a workload for unifying business semantics across data, models, and systems to power intelligent agents and decisions grounded in a live, holistic view of the business.
+
 The analytical data stores that are populated with data produced by data engineering workloads support data modeling and visualization for reporting and analysis, often using sophisticated visualization tools such as Microsoft Power BI.
