Commit 82ff537

updated units
1 parent 4ef23fb commit 82ff537

5 files changed

Lines changed: 68 additions & 5 deletions
learn-pr/wwl-databricks/implement-lakeflow-jobs/2-create-lakeflow-job.yml

Lines changed: 2 additions & 2 deletions
````diff
@@ -4,11 +4,11 @@ title: Create job setup and configuration
 metadata:
   title: Create Job Setup and Configuration
   description: Learn how to create and configure a Lakeflow Job in Azure Databricks, including setting up tasks, selecting compute resources, organizing task dependencies, and configuring job access permissions.
-  ms.date: 01/14/2026
+  ms.date: 01/15/2026
   author: weslbo
   ms.author: wedebols
   ms.topic: unit
   ai-usage: ai-generated
-durationInMinutes: 9
+durationInMinutes: 11
 content: |
   [!include[](includes/2-create-lakeflow-job.md)]
````

learn-pr/wwl-databricks/implement-lakeflow-jobs/4-schedule-job.yml

Lines changed: 2 additions & 2 deletions
````diff
@@ -4,11 +4,11 @@ title: Schedule a job
 metadata:
   title: Schedule a Job
   description: Learn how to schedule Lakeflow Jobs in Azure Databricks using simple intervals or advanced cron expressions to automate your data pipelines.
-  ms.date: 12/07/2025
+  ms.date: 01/15/2026
   author: weslbo
   ms.author: wedebols
   ms.topic: unit
   ai-usage: ai-generated
-durationInMinutes: 6
+durationInMinutes: 9
 content: |
   [!include[](includes/4-schedule-job.md)]
````

learn-pr/wwl-databricks/implement-lakeflow-jobs/includes/2-create-lakeflow-job.md

Lines changed: 24 additions & 0 deletions
````diff
@@ -130,4 +130,28 @@ To configure permissions, navigate to **Jobs & Pipelines**, select your job, ope
 
 When a job runs, it executes with the job owner's permissions or the configured service principal's permissions—not the triggering user's permissions. For production jobs, grant `CAN MANAGE` to the pipeline team, `CAN RUN` to users who need manual execution, and `CAN VIEW` to stakeholders requiring visibility.
 
+## Configure run identity and Unity Catalog access
+
+When your job accesses Unity Catalog objects—such as tables, views, or volumes—the job's **run identity** must have the required Unity Catalog privileges. This is a critical prerequisite before configuring any job that reads from or writes to Unity Catalog-managed data.
+
+The run identity is the principal whose permissions Unity Catalog evaluates during job execution. By default, jobs run as the **job owner** (the user who created the job). For production workloads, you can configure a **service principal** as the run identity to avoid dependency on individual user accounts.
+
+Before creating your job, verify that the run identity has the necessary privileges:
+
+| Operation | Required Unity Catalog privilege |
+| --------- | -------------------------------- |
+| Read from a table | `SELECT` on the table |
+| Write to a table | `MODIFY` on the table |
+| Create tables in a schema | `CREATE TABLE` and `USE SCHEMA` on the schema |
+| Access a volume | `READ VOLUME` or `WRITE VOLUME` on the volume |
+
+To grant privileges to a service principal or user, use SQL commands like:
+
+```sql
+GRANT SELECT, MODIFY ON TABLE catalog.schema.table TO `service-principal-id`;
+GRANT USE SCHEMA ON SCHEMA catalog.schema TO `service-principal-id`;
+```
+
+If the run identity lacks the required privileges, the job fails at runtime with an authorization error—even if the job configuration itself is valid. Always verify Unity Catalog access before scheduling production jobs.
+
 With your job created, tasks configured, dependencies set, and permissions assigned, you're ready to run your workflow. The next step is understanding how to monitor job execution and handle run outcomes.
````
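The grants added in this file can be double-checked before the job's first scheduled run. A minimal verification sketch, assuming standard Unity Catalog `SHOW GRANTS` syntax and reusing the same placeholder names (`catalog.schema.table`, `service-principal-id`) from the unit — substitute your own securables and principal:

```sql
-- Placeholder names; replace with your catalog, schema, table, and principal.
-- List all privileges currently held on the target table and schema, so you
-- can confirm SELECT/MODIFY and USE SCHEMA actually reached the run identity.
SHOW GRANTS ON TABLE catalog.schema.table;
SHOW GRANTS ON SCHEMA catalog.schema;

-- Or narrow the check to a single principal:
SHOW GRANTS `service-principal-id` ON TABLE catalog.schema.table;
```

Running these as part of a pre-deployment checklist catches the "valid job, missing privilege" failure mode the unit warns about before it surfaces as a runtime authorization error.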

learn-pr/wwl-databricks/implement-lakeflow-jobs/includes/4-schedule-job.md

Lines changed: 39 additions & 0 deletions
````diff
@@ -85,6 +85,45 @@ Consider these factors when choosing a time zone:
 > [!TIP]
 > For jobs that must run at exact intervals regardless of local time changes, always use UTC.
 
+## Control concurrent job runs
+
+When scheduled jobs take longer than expected, a new run might start before the previous one finishes. This overlap can cause data corruption, duplicate processing, or resource contention. Azure Databricks provides concurrency settings to control this behavior.
+
+### Configure maximum concurrent runs
+
+The **Maximum concurrent runs** setting limits how many instances of the same job can execute simultaneously. By default, jobs allow multiple concurrent runs. For jobs that must not overlap—such as those writing to the same tables—set this value to **1**.
+
+To configure maximum concurrent runs:
+
+1. Open your job in the **Jobs & Pipelines** workspace UI.
+2. In the **Job details** panel, locate the **Maximum concurrent runs** setting.
+3. Set the value to control how many runs can execute at once.
+
+When a new run is triggered but the maximum concurrent runs limit is reached, Azure Databricks must decide what to do with the incoming run.
+
+### Configure queue behavior for overlapping runs
+
+When concurrent runs exceed your configured limit, you choose how the scheduler handles the new run:
+
+| Behavior | Description | Use case |
+|----------|-------------|----------|
+| **Queue the run** | The new run waits until a slot becomes available, then executes | Jobs that must eventually run—no triggers should be missed |
+| **Cancel the run** | The new run is immediately canceled | Jobs where stale triggers are not valuable |
+| **Skip the run** | Similar to cancel—the run doesn't execute | Jobs where missing occasional runs is acceptable |
+
+For most data pipelines, **queue the run** ensures that all scheduled executions eventually complete. This approach prevents data gaps when a job occasionally runs longer than its schedule interval.
+
+Consider a job scheduled to run every hour. If a run takes 75 minutes to complete, the next scheduled trigger arrives while the job is still running. With concurrency set to 1 and queue enabled:
+
+1. The first run continues processing.
+2. The second run enters the queue.
+3. When the first run completes, the queued run starts immediately.
+
+This pattern ensures sequential, non-overlapping execution while preserving all scheduled runs.
+
+> [!NOTE]
+> Queued runs consume no compute resources while waiting. They only start when a concurrent slot becomes available.
+
 ## Scheduling considerations for production workloads
 
 The Azure Databricks job scheduler handles most scenarios reliably, but it's not designed for low-latency requirements. Network conditions or cloud service issues can occasionally delay job starts by several minutes. When service recovers, scheduled jobs run immediately.
````
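The concurrency and queue behavior added in this file can also be set outside the UI. A hedged sketch of the relevant fields in a Databricks Jobs API 2.1 job-settings payload (`max_concurrent_runs`, `queue.enabled`, and a Quartz cron schedule are documented API fields; the job name and cron expression are illustrative):

```json
{
  "name": "hourly-ingest",
  "max_concurrent_runs": 1,
  "queue": { "enabled": true },
  "schedule": {
    "quartz_cron_expression": "0 0 * * * ?",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED"
  }
}
```

With these settings, a trigger that arrives while a run is still executing waits in the queue rather than being skipped — the sequential, no-gaps pattern described in the hourly example above.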

learn-pr/wwl-databricks/implement-lakeflow-jobs/index.yml

Lines changed: 1 addition & 1 deletion
````diff
@@ -3,7 +3,7 @@ uid: learn.wwl.implement-lakeflow-jobs
 metadata:
   title: Implement Lakeflow Jobs with Azure Databricks
   description: Learn how to create, configure, schedule, and monitor Lakeflow Jobs in Azure Databricks to automate your data pipelines.
-  ms.date: 01/14/2026
+  ms.date: 01/15/2026
   author: weslbo
   ms.author: wedebols
   ms.topic: module
````
