Commit c33d2ca: Merge pull request #52970 from weslbo/final-edits ("final edits")
Parents: 8b62dd4, 5891058

19 files changed (15 additions, 1,430 deletions)


learn-pr/wwl-databricks/design-implement-data-modeling-unity-catalog/index.yml

Lines changed: 2 additions & 2 deletions
@@ -1,15 +1,15 @@
 ### YamlMime:Module
 uid: learn.wwl.design-implement-data-modeling-unity-catalog
 metadata:
-  title: Design and Implement Data Modeling
+  title: Design and Implement Data Modeling with Azure Databricks
   description: Learn how to design and implement data modeling strategies in Azure Databricks with Unity Catalog, including ingestion patterns, table formats, partitioning, slowly changing dimensions, and clustering strategies.
   ms.date: 12/07/2025
   author: weslbo
   ms.author: wedebols
   ms.topic: module
   ms.service: azure-databricks
   ai-usage: ai-generated
-title: Design and implement data modeling
+title: Design and implement data modeling with Azure Databricks
 summary: Effective data modeling forms the foundation of a performant and maintainable data platform. This module explores how to design ingestion logic, select appropriate tools and table formats, implement partitioning schemes, manage slowly changing dimensions, choose appropriate data granularity, and optimize table performance through clustering strategies in Azure Databricks with Unity Catalog.
 abstract: |
   By the end of this module, you'll be able to:

learn-pr/wwl-databricks/design-implement-data-pipelines/index.yml

Lines changed: 2 additions & 2 deletions
@@ -1,15 +1,15 @@
 ### YamlMime:Module
 uid: learn.wwl.design-implement-data-pipelines
 metadata:
-  title: Design and Implement Data Pipelines
+  title: Design and Implement Data Pipelines with Azure Databricks
   description: Learn how to design and implement robust data pipelines using Lakehouse architecture principles, medallion architecture, and Lakeflow Declarative Pipelines in Azure Databricks.
   ms.date: 12/07/2025
   author: weslbo
   ms.author: wedebols
   ms.topic: module
   ms.service: azure-databricks
   ai-usage: ai-generated
-title: Design and implement data pipelines
+title: Design and implement data pipelines with Azure Databricks
 summary: Learn to design and implement robust data pipelines in Azure Databricks using notebooks and Lakeflow Declarative Pipelines, covering orchestration, error handling, and task logic.
 abstract: |
   By the end of this module, you'll be able to:

learn-pr/wwl-databricks/implement-lakeflow-jobs/index.yml

Lines changed: 2 additions & 2 deletions
@@ -1,15 +1,15 @@
 ### YamlMime:Module
 uid: learn.wwl.implement-lakeflow-jobs
 metadata:
-  title: Implement Lakeflow Jobs
+  title: Implement Lakeflow Jobs with Azure Databricks
   description: Learn how to create, configure, schedule, and monitor Lakeflow Jobs in Azure Databricks to automate your data pipelines.
   ms.date: 12/07/2025
   author: weslbo
   ms.author: wedebols
   ms.topic: module
   ms.service: azure-databricks
   ai-usage: ai-generated
-title: Implement Lakeflow Jobs
+title: Implement Lakeflow Jobs with Azure Databricks
 summary: This module guides you through the process of implementing Lakeflow Jobs in Azure Databricks. You will learn how to create jobs, configure triggers and schedules, set up alerts, and manage automatic restarts to ensure reliable data pipeline execution.
 abstract: |
   By the end of this module, you'll be able to:

learn-pr/wwl-databricks/implement-manage-data-quality-constraints-unity-catalog/index.yml

Lines changed: 2 additions & 2 deletions
@@ -1,15 +1,15 @@
 ### YamlMime:Module
 uid: learn.wwl.implement-manage-data-quality-constraints-unity-catalog
 metadata:
-  title: Implement and Manage Data Quality Constraints
+  title: Implement and Manage Data Quality Constraints with Azure Databricks
   description: Learn how to implement and manage data quality constraints in Azure Databricks using Unity Catalog, including validation checks, schema enforcement, and pipeline expectations.
   ms.date: 12/07/2025
   author: weslbo
   ms.author: wedebols
   ms.topic: module
   ms.service: azure-databricks
   ai-usage: ai-generated
-title: Implement and manage data quality constraints
+title: Implement and manage data quality constraints with Azure Databricks
 summary: This module explores strategies for maintaining high data quality in Azure Databricks. You will learn how to implement validation checks, enforce schemas, manage schema drift, and use pipeline expectations to ensure data integrity throughout your data pipelines.
 abstract: |
   By the end of this module, you'll be able to:

learn-pr/wwl-databricks/select-and-configure-compute/includes/3-configure-compute-performance.md

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ Configuring compute resources involves balancing performance requirements with c
 
 Compute performance depends on three key factors working together. Each factor influences how efficiently your workload runs and how much it costs.
 
-![Diagram showing the relationship between cores, memory and storage.](../media/cores-memory-storage.svg)
+![Diagram showing the relationship between cores, memory and storage.](../media/cores-memory-storage.png)
 
 **Total executor cores** determine the maximum parallelism available for processing data. More cores allow Spark to process more tasks simultaneously. A cluster with 8 workers, each having 4 cores, provides 32 cores total for parallel processing.
 
@@ -64,7 +64,7 @@ Enable **decommissioning** when using spot instances to reduce task failures. Wh
 
 Instance pools maintain a set of idle instances ready for immediate use, reducing cluster startup time from minutes to seconds.
 
-![Diagram explaining Azure Databricks Instance Pools usage.](../media/instance-pool-management.svg)
+![Diagram explaining Azure Databricks Instance Pools usage.](../media/instance-pool-management.png)
 
 Configure the **minimum idle instances** to match your typical concurrent cluster needs. If you regularly run three notebooks simultaneously, maintain at least three idle instances. These instances remain available even when not in use, providing instant cluster startup.
 
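The executor-core arithmetic in the diff context above ("8 workers, each having 4 cores, provides 32 cores total") can be sketched in a few lines of Python. This is a minimal illustration; the worker and core counts are the example values from the docs text, not a query against a real cluster:

```python
# Total parallelism available to Spark is workers multiplied by cores per worker.
# Values mirror the example in the docs: 8 workers with 4 cores each.
workers = 8
cores_per_worker = 4

# Maximum number of Spark tasks that can run concurrently on this cluster.
total_cores = workers * cores_per_worker
print(total_cores)  # 32
```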

learn-pr/wwl-databricks/select-and-configure-compute/includes/4-configure-compute-feature-settings.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ Compute features determine the functional capabilities available to your workloa
 
 **Photon** is a query execution engine that replaces traditional Spark components with optimized native code. When you enable Photon, your compute resource uses this accelerated engine for SQL queries and DataFrame operations.
 
-![Diagram showing a decision tree when to use Photon acceleration.](../media/spark-photon-decision.svg)
+![Diagram showing a decision tree when to use Photon acceleration.](../media/spark-photon-decision.png)
 
 With Photon enabled, queries that involve **complex transformations** run faster. Operations like **joins**, **aggregations**, and **scans** across large tables benefit most from Photon's optimization. Workloads that frequently access disk, process wide tables, or repeatedly transform data also see significant performance gains. For example, a data analyst running hourly aggregation queries across millions of rows will experience faster query completion times with Photon enabled.
 
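As context for the Photon setting this page discusses: on classic compute, Photon is selected through the cluster specification's `runtime_engine` field in the Databricks Clusters API. A hedged sketch of such a spec as a Python dict; the cluster name, runtime version, and node type are placeholder example values, not recommendations:

```python
# Minimal cluster spec illustrating how Photon acceleration is requested.
# Only "runtime_engine" is the point here; the other values are examples.
cluster_spec = {
    "cluster_name": "photon-demo",        # hypothetical name
    "spark_version": "15.4.x-scala2.12",  # example runtime; use one your workspace offers
    "node_type_id": "Standard_DS3_v2",    # example Azure VM type
    "num_workers": 2,
    "runtime_engine": "PHOTON",           # "PHOTON" enables the engine; "STANDARD" disables it
}
print(cluster_spec["runtime_engine"])  # PHOTON
```

This dict would be the JSON body of a cluster create/edit request; serverless and SQL warehouse compute manage Photon automatically, so the field only matters for classic clusters.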

learn-pr/wwl-databricks/select-and-configure-compute/includes/5-install-libraries-for-compute.md

Lines changed: 1 addition & 1 deletion
@@ -77,4 +77,4 @@ To configure the allowlist, metastore admins use Catalog Explorer, selecting the
 
 Different library installation methods suit different scenarios. The following diagram illustrates a decision flow to help you select the appropriate installation approach:
 
-![Diagram showing the different library installation methods.](../media/library-installation.svg)
+![Diagram showing the different library installation methods.](../media/library-installation.png)

learn-pr/wwl-databricks/select-and-configure-compute/includes/6-configure-compute-access-perms.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ This **permission scoping** enables secure group collaboration on workloads that
 
 The following diagram illustrates how to set up dedicated group access:
 
-![Diagram illustrating how to set up dedicated group access.](../media/group-workspace-setup.svg)
+![Diagram illustrating how to set up dedicated group access.](../media/group-workspace-setup.png)
 
 To create a compute resource dedicated to a group, you must enable Unity Catalog on your workspace and use Databricks Runtime 15.4 or above. The assigned group needs `CAN MANAGE` permission on a workspace folder where members can store notebooks, MLflow experiments, and other workspace artifacts used on the group cluster.
 

learn-pr/wwl-databricks/select-and-configure-compute/index.yml

Lines changed: 2 additions & 2 deletions
@@ -1,15 +1,15 @@
 ### YamlMime:Module
 uid: learn.wwl.select-and-configure-compute
 metadata:
-  title: Select and Configure Compute
+  title: Select and Configure Compute in Azure Databricks
   description: Learn how to select and configure appropriate compute resources in Azure Databricks including serverless, classic compute, SQL warehouses, and job clusters. Master performance tuning, access control, and library management.
   ms.date: 12/07/2025
   author: weslbo
   ms.author: wedebols
   ms.topic: module
   ms.service: azure-databricks
   ai-usage: ai-generated
-title: Select and Configure Compute
+title: Select and Configure Compute in Azure Databricks
 summary: Azure Databricks provides multiple compute options optimized for different workloads. This module explores how to choose the right compute type, configure performance settings, manage access permissions, and install libraries. You'll learn when to use serverless versus classic compute, how to optimize clusters for cost and performance, and best practices for securing compute resources.
 abstract: |
   By the end of this module, you'll be able to:
