learn-pr/wwl-databricks/create-and-organize-objects-in-unity-catalog/includes/2-apply-naming-conventions.md (1 addition, 1 deletion)
@@ -90,7 +90,7 @@ Name clusters according to their purpose and environment to make resource alloca
Structure job names using the pattern `job_{layer}_{purpose}` to align with your data transformation pipeline. Examples include `job_bronze_orders_ingestion`, `job_silver_orders_transformation`, and `job_gold_sales_aggregation`. This naming pattern makes dependencies between jobs immediately visible and helps you trace data lineage across the medallion architecture.
-For Lakeflow Declarative Pipelines pipelines, use the prefix `pipe_` followed by the data domain or purpose: `pipe_orders_processing`, `pipe_customer_data_cleaning`.
+For Lakeflow Spark Declarative Pipelines, use the prefix `pipe_` followed by the data domain or purpose: `pipe_orders_processing`, `pipe_customer_data_cleaning`.
Name streaming pipelines to include both source and target, following patterns like `stream_{source}_to_{target}`. Examples such as `stream_kafka_to_bronze` and `stream_iot_sensor_data` make data flow explicit without requiring pipeline documentation. This convention is especially valuable when managing multiple concurrent streaming workloads.
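To illustrate how these conventions compose (a minimal sketch; the helper functions and example values are hypothetical and not part of the changed module), a small Python utility could assemble names from the documented patterns:

```python
# Hypothetical helpers (not part of the module) that assemble resource names
# from the documented patterns: job_{layer}_{purpose}, pipe_{purpose},
# and stream_{source}_to_{target}.

def job_name(layer: str, purpose: str) -> str:
    return f"job_{layer}_{purpose}"

def pipeline_name(purpose: str) -> str:
    return f"pipe_{purpose}"

def stream_name(source: str, target: str) -> str:
    return f"stream_{source}_to_{target}"

print(job_name("silver", "orders_transformation"))  # job_silver_orders_transformation
print(pipeline_name("orders_processing"))           # pipe_orders_processing
print(stream_name("kafka", "bronze"))                # stream_kafka_to_bronze
```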
-description: Learn how to choose between notebooks and Lakeflow Declarative Pipelines for building data pipelines in Azure Databricks, comparing flexibility, maintainability, and use cases.
+description: Learn how to choose between notebooks and Lakeflow Spark Declarative Pipelines for building data pipelines in Azure Databricks, comparing flexibility, maintainability, and use cases.
-title: Create pipeline with Lakeflow Declarative Pipelines
+title: Create pipeline with Lakeflow Spark Declarative Pipelines
metadata:
-title: Create Pipeline with Lakeflow Declarative Pipelines
-description: Learn how to create data pipelines using Lakeflow Declarative Pipelines in Azure Databricks, including streaming tables, materialized views, and data quality expectations.
+title: Create Pipeline with Lakeflow Spark Declarative Pipelines
+description: Learn how to create data pipelines using Lakeflow Spark Declarative Pipelines in Azure Databricks, including streaming tables, materialized views, and data quality expectations.
learn-pr/wwl-databricks/design-implement-data-pipelines/8-knowledge-check.yml (3 additions, 3 deletions)
@@ -27,14 +27,14 @@ quiz:
- content: "Bronze layer"
isCorrect: true
explanation: "Correct. The bronze layer stores ingested data with minimal transformation, preserving the raw state for auditing and potential reprocessing."
-- content: "What is the primary advantage of using Lakeflow Declarative Pipelines over notebooks for production data pipelines?"
+- content: "What is the primary advantage of using Lakeflow Spark Declarative Pipelines over notebooks for production data pipelines?"
choices:
- content: "Declarative pipelines allow rapid prototyping and cell-by-cell inspection"
isCorrect: false
explanation: "Incorrect. Notebooks are better suited for rapid prototyping and interactive exploration."
learn-pr/wwl-databricks/design-implement-data-pipelines/includes/1-introduction.md (1 addition, 1 deletion)
@@ -1,4 +1,4 @@
-Building reliable data pipelines requires more than connecting data sources to destinations. You need to design workflows that handle failures gracefully, scale with growing data volumes, and remain maintainable as business requirements evolve. Azure Databricks provides multiple approaches for creating data pipelines—from flexible **notebooks** with procedural code to **Lakeflow Declarative Pipelines** that automate orchestration and data quality enforcement.
+Building reliable data pipelines requires more than connecting data sources to destinations. You need to design workflows that handle failures gracefully, scale with growing data volumes, and remain maintainable as business requirements evolve. Azure Databricks provides multiple approaches for creating data pipelines—from flexible **notebooks** with procedural code to **Lakeflow Spark Declarative Pipelines** that automate orchestration and data quality enforcement.
When you design data pipelines, you make decisions that affect every downstream consumer of your data. The order of operations determines whether transformations build on validated, well-structured data. Your choice between notebooks and declarative pipelines influences how much orchestration code you write versus how much the platform manages for you. **Task dependencies** in **Lakeflow Jobs** control execution flow and enable parallel processing that reduces pipeline runtime.
learn-pr/wwl-databricks/design-implement-data-pipelines/includes/3-choose-notebook-lakeflow-pipelines.md (9 additions, 9 deletions)
@@ -1,14 +1,14 @@
-When you build data pipelines in Azure Databricks, you have two primary approaches: **notebooks** with procedural code and **Lakeflow Declarative Pipelines**. Each approach serves different needs, and understanding when to use each helps you deliver maintainable, efficient data solutions.
+When you build data pipelines in Azure Databricks, you have two primary approaches: **notebooks** with procedural code and **Lakeflow Spark Declarative Pipelines**. Each approach serves different needs, and understanding when to use each helps you deliver maintainable, efficient data solutions.
## Understand the two approaches
Notebooks execute code **step by step**. You control every aspect of data processing—from reading sources to writing outputs. This **procedural approach** gives you full control over execution flow, error handling, and optimization decisions.
-Lakeflow Declarative Pipelines work differently. Instead of specifying **how** to process data, you define **what** you want as the end result. You declare your **streaming tables** and **materialized views**, and the pipeline engine handles **orchestration**, **parallelization**, and **error recovery** automatically.
+Lakeflow Spark Declarative Pipelines work differently. Instead of specifying **how** to process data, you define **what** you want as the end result. You declare your **streaming tables** and **materialized views**, and the pipeline engine handles **orchestration**, **parallelization**, and **error recovery** automatically.
-:::image type="content" source="../media/3-understand-notebook-pipeline-approach.png" alt-text="Diagram explaining the two approaches when it comes to choosing notebooks or Lakeflow Declarative Pipelines." border="false" lightbox="../media/3-understand-notebook-pipeline-approach.png":::
+:::image type="content" source="../media/3-understand-notebook-pipeline-approach.png" alt-text="Diagram explaining the two approaches when it comes to choosing notebooks or Lakeflow Spark Declarative Pipelines." border="false" lightbox="../media/3-understand-notebook-pipeline-approach.png":::
-Consider a scenario where you need to ingest sales data, join it with product information, and calculate regional aggregates. With a notebook, you write explicit read, join, and aggregation commands in sequence. With Lakeflow Declarative Pipelines, you define the final tables and their relationships—the system determines the most efficient execution plan.
+Consider a scenario where you need to ingest sales data, join it with product information, and calculate regional aggregates. With a notebook, you write explicit read, join, and aggregation commands in sequence. With Lakeflow Spark Declarative Pipelines, you define the final tables and their relationships—the system determines the most efficient execution plan.
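As a rough sketch of the declarative side of this scenario (assuming the Python `dlt` module and a `spark` session provided inside a pipeline; paths, table names, and columns are placeholders, not part of the module), the final tables might be declared like this:

```python
import dlt
from pyspark.sql import functions as F

# Illustrative declarative version of the scenario. Paths, table names, and
# columns are placeholders; assumes this runs inside a pipeline where the
# `dlt` module and a `spark` session are provided.

@dlt.table(comment="Sales records ingested from cloud storage.")
def sales_raw():
    return spark.read.format("json").load("/mnt/raw/sales/")

@dlt.table(comment="Product reference data.")
def products():
    return spark.read.format("json").load("/mnt/raw/products/")

@dlt.table(comment="Sales joined with products and aggregated by region.")
def regional_sales():
    return (
        dlt.read("sales_raw")
           .join(dlt.read("products"), "product_id")
           .groupBy("region")
           .agg(F.sum("amount").alias("total_sales"))
    )
```

The engine infers that `regional_sales` depends on the two upstream tables and schedules their refreshes in the right order, which is the behavior the paragraph above describes.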
## When notebooks fit best
14
14
@@ -22,17 +22,17 @@ Notebooks excel in scenarios requiring **flexibility** and **detailed control**.
22
22
23
23
**Fine-grained performance tuning**. When you need to manually control **partitioning**, **caching strategies**, or specific **Spark configurations**, notebooks give you direct access to these optimizations.
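A hedged notebook-style example of that kind of manual tuning (the table name, partition column, and setting values are placeholders rather than recommendations):

```python
from pyspark.sql import SparkSession

# Notebook-style manual tuning; names and values are illustrative only.
# In a Databricks notebook the `spark` session already exists, so the
# builder call is just a fallback for running the snippet elsewhere.
spark = SparkSession.builder.getOrCreate()

spark.conf.set("spark.sql.shuffle.partitions", "64")  # explicit shuffle parallelism

orders = spark.read.table("sales.orders")
orders = orders.repartition("region")  # control physical partitioning before a join
orders.cache()                         # keep a frequently reused dataset in memory
orders.count()                         # trigger an action so the cache is populated
```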
-## When Lakeflow Declarative Pipelines fit best
+## When Lakeflow Spark Declarative Pipelines fit best
-Lakeflow Declarative Pipelines simplify **production data pipelines** by handling operational complexity automatically. Choose this approach when your pipeline needs:
+Lakeflow Spark Declarative Pipelines simplify **production data pipelines** by handling operational complexity automatically. Choose this approach when your pipeline needs:
**Standardized ETL patterns**. For common ingestion and transformation workflows—reading from cloud storage, applying **schema evolution**, maintaining **slowly changing dimensions**—the declarative approach reduces thousands of lines of code to a few statements.
**Built-in data quality enforcement**. Declarative pipelines include **expectations** that validate data as it flows through. You define **quality rules** directly in your pipeline definition, and the system tracks violations and can halt processing when data quality degrades.
**Automatic dependency management**. The pipeline engine analyzes relationships between your tables and determines the correct **execution order**. When source data updates, the engine refreshes only the **affected downstream tables**.
-**Operational visibility**. Lakeflow Declarative Pipelines provide **lineage tracking**, **execution graphs**, and **monitoring dashboards** without additional configuration. Operations teams can trace data from source to target and troubleshoot issues faster.
+**Operational visibility**. Lakeflow Spark Declarative Pipelines provide **lineage tracking**, **execution graphs**, and **monitoring dashboards** without additional configuration. Operations teams can trace data from source to target and troubleshoot issues faster.
## Compare the approaches
@@ -57,8 +57,8 @@ Start by evaluating your specific requirements. Ask these questions:
- What level of **operational monitoring** does your team need?
- Who maintains this pipeline—**seasoned developers** or a broader team with varied skills?
-For production pipelines with standard ingestion and transformation patterns, Lakeflow Declarative Pipelines **reduce operational burden** and **improve maintainability**. You spend less time writing orchestration code and more time defining business logic.
+For production pipelines with standard ingestion and transformation patterns, Lakeflow Spark Declarative Pipelines **reduce operational burden** and **improve maintainability**. You spend less time writing orchestration code and more time defining business logic.
For exploratory work, complex integrations, or pipelines requiring extensive customization, notebooks provide the **flexibility** you need. You can always refactor successful notebook prototypes into declarative pipelines once the logic stabilizes.
-Many teams use **both approaches together**. Notebooks handle custom preprocessing or machine learning model training, while Lakeflow Declarative Pipelines manage the core ETL workflow. This **hybrid approach** lets you use each tool where it performs best.
+Many teams use **both approaches together**. Notebooks handle custom preprocessing or machine learning model training, while Lakeflow Spark Declarative Pipelines manage the core ETL workflow. This **hybrid approach** lets you use each tool where it performs best.
Tasks within the same job can use different compute resources. A common pattern assigns SQL tasks to a SQL warehouse while notebook-based transformations run on jobs compute.
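A hedged sketch of what such a multi-task job could look like, written as a Python dict shaped like a Databricks Jobs API payload (the warehouse ID, query ID, cluster key, paths, and runtime version are placeholders, and field names should be verified against the current API reference):

```python
# Sketch of a multi-task job definition. One task runs a notebook on jobs
# compute while a dependent SQL task runs on a SQL warehouse. All identifiers
# are placeholders; treat the field names as an approximation of the Jobs API.
job_definition = {
    "name": "job_gold_sales_aggregation",
    "tasks": [
        {
            "task_key": "transform_orders",
            "notebook_task": {"notebook_path": "/Repos/etl/transform_orders"},
            "job_cluster_key": "etl_cluster",  # notebook task on jobs compute
        },
        {
            "task_key": "refresh_gold_views",
            "depends_on": [{"task_key": "transform_orders"}],
            "sql_task": {
                "warehouse_id": "1234567890abcdef",  # SQL task on a SQL warehouse
                "query": {"query_id": "placeholder-query-id"},
            },
        },
    ],
    "job_clusters": [
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {"spark_version": "15.4.x-scala2.12", "num_workers": 2},
        }
    ],
}
```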
learn-pr/wwl-databricks/design-implement-data-pipelines/includes/5-design-error-handling-pipelines.md (1 addition, 1 deletion)
@@ -19,7 +19,7 @@ Each scenario requires a different response. Some errors warrant immediate pipel
## Define data quality expectations in declarative pipelines
-Lakeflow Declarative Pipelines provides built-in data quality constraints called **expectations**. These constraints validate records as data flows through your pipeline, giving you control over how to handle invalid data.
+Lakeflow Spark Declarative Pipelines provides built-in data quality constraints called **expectations**. These constraints validate records as data flows through your pipeline, giving you control over how to handle invalid data.
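A minimal sketch of expectations in the Python `dlt` interface (the table, rule names, and conditions are illustrative only, and `orders_raw` is assumed to be defined elsewhere in the same pipeline):

```python
import dlt

# Illustrative expectations on a declarative dataset. expect() records
# violations but keeps the rows, expect_or_drop() removes violating rows,
# and expect_or_fail() stops the update when a violation appears.

@dlt.table(comment="Orders with basic quality rules applied.")
@dlt.expect("non_negative_amount", "amount >= 0")
@dlt.expect_or_drop("valid_customer", "customer_id IS NOT NULL")
@dlt.expect_or_fail("valid_order_id", "order_id IS NOT NULL")
def orders_clean():
    return dlt.read("orders_raw")
```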
:::image type="content" source="../media/5-define-data-quality-expectations.png" alt-text="Screenshot of the declarative pipeline editor, highlighting expectations." border="false" lightbox="../media/5-define-data-quality-expectations.png":::
learn-pr/wwl-databricks/design-implement-data-pipelines/includes/7-create-pipeline-lakeflow-declarative.md (2 additions, 2 deletions)
@@ -1,10 +1,10 @@
-Production data pipelines require reliability, maintainability, and clear data quality enforcement. As a data engineer, you likely spend significant time writing code to handle incremental processing, orchestrate dependencies, and validate data quality. **Lakeflow Declarative Pipelines** in Azure Databricks addresses these challenges by letting you define *what* your data should look like rather than *how* to process it step by step.
+Production data pipelines require reliability, maintainability, and clear data quality enforcement. As a data engineer, you likely spend significant time writing code to handle incremental processing, orchestrate dependencies, and validate data quality. **Lakeflow Spark Declarative Pipelines** in Azure Databricks addresses these challenges by letting you define *what* your data should look like rather than *how* to process it step by step.
In this unit, you learn how to create data pipelines using the declarative approach, define streaming tables and materialized views, and apply data quality expectations to enforce constraints on your data.
4
4
5
5
## Understand the declarative approach
-Traditional data pipelines require you to write imperative code that specifies every processing step. You handle incremental processing logic, manage checkpoint recovery, and orchestrate dependencies between tables. With Lakeflow Declarative Pipelines, you instead declare the **desired end state**, and the framework handles the execution details.
+Traditional data pipelines require you to write imperative code that specifies every processing step. You handle incremental processing logic, manage checkpoint recovery, and orchestrate dependencies between tables. With Lakeflow Spark Declarative Pipelines, you instead declare the **desired end state**, and the framework handles the execution details.
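As a hedged illustration of declaring the desired end state (assuming the Python `dlt` module and a `spark` session provided inside a pipeline; paths, table names, and columns are placeholders), a streaming table plus a downstream aggregate might look like this:

```python
import dlt
from pyspark.sql import functions as F

# Sketch of declaring the desired end state: an incrementally loaded streaming
# table and a downstream aggregate. Names, paths, and columns are placeholders.

@dlt.table(comment="Streaming table: new order files are picked up incrementally.")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")   # Auto Loader for incremental file ingestion
             .option("cloudFiles.format", "json")
             .load("/mnt/landing/orders/")
    )

@dlt.table(comment="Aggregate kept up to date by the pipeline engine.")
def daily_order_totals():
    return (
        dlt.read("orders_bronze")
           .groupBy(F.to_date("order_ts").alias("order_date"))
           .agg(F.count("*").alias("order_count"), F.sum("amount").alias("revenue"))
    )
```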
The declarative approach provides three key benefits for production pipelines: