
Commit f16e555

Merge pull request #54406 from weslbo/freshness-update
Freshness update: databricks modules
2 parents 51d03c5 + a1c46c3

57 files changed

Lines changed: 129 additions & 81 deletions


learn-pr/wwl-data-ai/build-data-pipeline-with-delta-live-tables/1-introduction.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: Introduction
 metadata:
   title: Introduction
   description: "Introduction"
-  ms.date: 09/12/2025
+  ms.date: 04/27/2026
   author: weslbo
   ms.author: wedebols
   ms.topic: unit

learn-pr/wwl-data-ai/build-data-pipeline-with-delta-live-tables/2-explore-delta-live-tables.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: Explore Lakeflow Declarative Pipelines
 metadata:
   title: Explore Lakeflow Declarative Pipelines
   description: "Explore Lakeflow Declarative Pipelines"
-  ms.date: 09/12/2025
+  ms.date: 04/27/2026
   author: weslbo
   ms.author: wedebols
   ms.topic: unit

learn-pr/wwl-data-ai/build-data-pipeline-with-delta-live-tables/3-data-ingestion-and-integration.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: Data ingestion and integration
 metadata:
   title: Data Ingestion and Integration
   description: "Data ingestion and integration"
-  ms.date: 09/12/2025
+  ms.date: 04/27/2026
   author: weslbo
   ms.author: wedebols
   ms.topic: unit

learn-pr/wwl-data-ai/build-data-pipeline-with-delta-live-tables/4-real-time-processing.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: Real-time processing
 metadata:
   title: Real-time Processing
   description: "Real-time processing"
-  ms.date: 09/12/2025
+  ms.date: 04/27/2026
   author: weslbo
   ms.author: wedebols
   ms.topic: unit

learn-pr/wwl-data-ai/build-data-pipeline-with-delta-live-tables/5-exercise-data-pipeline.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: Exercise - Create a Lakeflow Declarative Pipeline
 metadata:
   title: Exercise - Create a Lakeflow Declarative Pipeline
   description: "Exercise - Create a Lakeflow Declarative Pipeline"
-  ms.date: 09/12/2025
+  ms.date: 04/27/2026
   author: weslbo
   ms.author: wedebols
   ms.topic: unit

learn-pr/wwl-data-ai/build-data-pipeline-with-delta-live-tables/6-knowledge-check.yml

Lines changed: 10 additions & 10 deletions
@@ -4,7 +4,7 @@ title: Module assessment
 metadata:
   title: Module Assessment
   description: "Knowledge check"
-  ms.date: 09/12/2025
+  ms.date: 04/27/2026
   author: weslbo
   ms.author: wedebols
   ms.topic: unit
@@ -18,24 +18,24 @@ quiz:
     choices:
     - content: "Reducing the cost of storage"
       isCorrect: false
-      explanation: "Incorrect. Reducing the cost of storage isn't the primary benefit of using DLT in Azure Databricks for real-time processing."
+      explanation: "Incorrect. Reducing the cost of storage isn't the primary benefit of using Lakeflow Declarative Pipelines in Azure Databricks for real-time processing."
     - content: "Automating data pipeline management"
       isCorrect: true
       explanation: "Correct. Lakeflow Declarative Pipelines is a framework that simplifies the management of data pipelines by automating complex tasks such as error handling, monitoring, and data pipeline lineage. This automation is valuable in real-time data processing, where managing data flows efficiently and reliably is critical."
     - content: "Increasing the latency of data processing."
       isCorrect: false
-      explanation: "Incorrect. Increasing the latency of data processing isn't the primary benefit of using DLT in Azure Databricks for real-time processing."
+      explanation: "Incorrect. Increasing the latency of data processing isn't the primary benefit of using Lakeflow Declarative Pipelines in Azure Databricks for real-time processing."
   - content: "Which feature of Lakeflow Declarative Pipelines ensures data reliability and quality in real-time processing environments?"
     choices:
-    - content: "Live data"
+    - content: "Manual schema validation"
       isCorrect: false
-      explanation: "Incorrect. You can receive live data with real-time data processing with Lakeflow Declarative Pipelines but this feature doesn't ensure data reliability and quality."
-    - content: "ACID Transactions"
+      explanation: "Incorrect. Manual schema validation isn't a built-in feature of Lakeflow Declarative Pipelines and doesn't automatically ensure data reliability at runtime."
+    - content: "Data quality expectations"
       isCorrect: true
-      explanation: "Correct. Lakeflow Declarative Pipelines support ACID transactions, which are crucial for ensuring data integrity by making all operations atomic, consistent, isolated, and durable. This is important in real-time processing environments where concurrent data modifications can lead to inconsistencies without proper transaction controls."
+      explanation: "Correct. Lakeflow Declarative Pipelines provide built-in data quality expectations that validate records as data flows through the pipeline. You can configure expectations to log violations, drop invalid records, or fail the pipeline update, ensuring that only data meeting your quality rules reaches downstream tables."
     - content: "Data Lake"
       isCorrect: false
-      explanation: "Incorrect. A Data Lake is a centralized repository that houses vast volumes of structured and unstructured data from various sources but it doesn't ensure data reliability and quality."
+      explanation: "Incorrect. A Data Lake is a centralized storage repository and isn't a Lakeflow Declarative Pipelines feature that enforces data quality constraints."
   - content: "Which component of Azure Databricks enhances performance and scalability of data operations on Delta Lake?"
     choices:
     - content: "Azure Blob Storage"
@@ -44,6 +44,6 @@ quiz:
     - content: "Azure Synapse Analytics"
       isCorrect: false
       explanation: "Incorrect. Azure Synapse Analytics is an enterprise analytics service that accelerates time to insight across data warehouses and big data systems. It isn't used to enhance performance and scalability of data operations on Delta Lake."
-    - content: "Delta Engine"
+    - content: "Photon"
       isCorrect: true
-      explanation: "Correct. Delta Engine significantly enhances the performance and scalability of operations on Delta Lake in Azure Databricks. It optimizes the execution of queries by utilizing a high-performance, in-memory execution environment, which is crucial for processing large datasets efficiently."
+      explanation: "Correct. Photon is the vectorized query engine built into Databricks Runtime that significantly enhances the performance and scalability of operations on Delta Lake in Azure Databricks. It accelerates data processing by executing queries using a native, vectorized execution model, which is crucial for processing large datasets efficiently."

learn-pr/wwl-data-ai/build-data-pipeline-with-delta-live-tables/7-summary.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: Summary
 metadata:
   title: Summary
   description: "Summary"
-  ms.date: 09/12/2025
+  ms.date: 04/27/2026
   author: weslbo
   ms.author: wedebols
   ms.topic: unit

learn-pr/wwl-data-ai/build-data-pipeline-with-delta-live-tables/includes/3-data-ingestion-and-integration.md

Lines changed: 17 additions & 3 deletions
@@ -90,6 +90,9 @@ SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');
 
 Optionally, you can use expectations to apply quality constraints that validate data as it flows through ETL pipelines. Expectations provide greater insight into data quality metrics and allow you to fail updates or drop records when detecting invalid records.
 
+> [!IMPORTANT]
+> The **Advanced** product edition of Lakeflow Declarative Pipelines is required to use expectations. If your pipeline includes expectations with the Core or Pro editions, you receive an error.
+
 ![Diagram showing Lakeflow Declarative Pipelines expectations.](../media/expectations.png)
 
 Here's an example of a materialized view that defines a constraint clause. In this case, the constraint contains the actual logic for what is being validated: the Country_Region shouldn't be empty. When a record fails this condition, the expectation is triggered.
@@ -106,7 +109,7 @@ SELECT
   Confirmed,
   Deaths,
   Recovered
-FROM live.raw_covid_data;
+FROM raw_covid_data;
 ```
 
 Examples of constraints:
@@ -160,7 +163,7 @@ SELECT
   sum(Confirmed) as Total_Confirmed,
   sum(Deaths) as Total_Deaths,
   sum(Recovered) as Total_Recovered
-FROM live.processed_covid_data
+FROM processed_covid_data
 GROUP BY Report_Date;
 ```
 
@@ -177,4 +180,15 @@ Lakeflow Declarative Pipelines support tasks such as:
 - Observing the progress and status of pipeline updates.
 - Alerting on pipeline events such as the success or failure of pipeline updates.
 - Viewing metrics for streaming sources like Apache Kafka and Auto Loader.
-- Receiving email notifications when a pipeline update fails or completes successfully.
+- Receiving email notifications when a pipeline update fails or completes successfully.
+
+## Develop with the Lakeflow Pipelines Editor
+
+The **Lakeflow Pipelines Editor** is the integrated development environment for creating and iterating on pipeline source code. When you create a new pipeline, the editor provides a default folder structure with a `transformations/` directory for source code and an `explorations/` directory for ad hoc analysis notebooks. Store your pipeline source code in a Git folder to enable version control.
+
+The editor supports iterative development through several features:
+
+- **Dry run**: Validates your pipeline code without processing data, allowing you to catch syntax errors and missing dependencies before execution.
+- **Selective execution**: Run individual files or single table definitions rather than the entire pipeline, enabling faster iteration during development.
+- **Interactive DAG**: Visualize the dependency graph between your tables, select specific tables for targeted refreshes, and inspect execution metrics.
+- **Data preview**: Sample data from streaming tables and materialized views directly in the editor to verify transformation logic.
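
The hunks above show only the tail of the materialized view that the prose in this file describes (the `live.` prefix removal and the Country_Region expectation). For context, a minimal sketch of what the full statement could look like after this change follows. It isn't taken from the source file: the constraint name, the exact column list, and the omission of an `ON VIOLATION` action (the default, which only logs violations in pipeline metrics) are assumptions; the actual file may use a different violation action.

```sql
CREATE OR REFRESH MATERIALIZED VIEW processed_covid_data (
  -- Expectation: Country_Region must not be empty; without an ON VIOLATION clause,
  -- violating records are still written and the violation is logged in quality metrics
  CONSTRAINT valid_country_region EXPECT (Country_Region IS NOT NULL)
)
AS
SELECT
  Report_Date,
  Country_Region,
  Confirmed,
  Deaths,
  Recovered
FROM raw_covid_data;
```

A downstream materialized view can then aggregate directly `FROM processed_covid_data`, without the `live.` prefix, as shown in the second hunk.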

learn-pr/wwl-data-ai/build-data-pipeline-with-delta-live-tables/includes/7-summary.md

Lines changed: 1 addition & 1 deletion
@@ -9,4 +9,4 @@ In this module, you learned how to:
 ## Learn more
 
 - [Lakeflow Declarative Pipelines](/azure/databricks/ldp/concepts)
-- [Load data with Lakeflow Declarative Pipelines](https://docs.databricks.com/aws/en/dlt/load)
+- [Load data with Lakeflow Declarative Pipelines](/azure/databricks/dlt/load)

learn-pr/wwl-data-ai/build-data-pipeline-with-delta-live-tables/index.yml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ uid: learn.wwl.build-data-pipeline-with-delta-live-tables
 metadata:
   title: Build Lakeflow Declarative Pipelines
   description: "Learn how to build Lakeflow Declarative Pipelines in Azure Databricks"
-  ms.date: 09/12/2025
+  ms.date: 04/27/2026
   author: weslbo
   ms.author: wedebols
   ms.topic: module-standard-task-based
