**`learn-pr/wwl-databricks/implement-lakeflow-jobs/2-create-lakeflow-job.yml`** (2 additions & 2 deletions)
@@ -4,11 +4,11 @@ title: Create job setup and configuration

```yaml
metadata:
  title: Create Job Setup and Configuration
  description: Learn how to create and configure a Lakeflow Job in Azure Databricks, including setting up tasks, selecting compute resources, organizing task dependencies, and configuring job access permissions.
```
**`learn-pr/wwl-databricks/implement-lakeflow-jobs/4-schedule-job.yml`** (2 additions & 2 deletions)
@@ -4,11 +4,11 @@ title: Schedule a job

```yaml
metadata:
  title: Schedule a Job
  description: Learn how to schedule Lakeflow Jobs in Azure Databricks using simple intervals or advanced cron expressions to automate your data pipelines.
```
**`learn-pr/wwl-databricks/implement-lakeflow-jobs/includes/2-create-lakeflow-job.md`** (24 additions & 0 deletions)
@@ -130,4 +130,28 @@ To configure permissions, navigate to **Jobs & Pipelines**, select your job, ope
When a job runs, it executes with the job owner's permissions or the configured service principal's permissions—not the triggering user's permissions. For production jobs, grant `CAN MANAGE` to the pipeline team, `CAN RUN` to users who need manual execution, and `CAN VIEW` to stakeholders requiring visibility.
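
If you manage job permissions as code rather than through the UI, the workspace permissions API offers one route. The following is a minimal sketch using the Databricks SDK for Python (`databricks-sdk`); the job ID and group names are hypothetical placeholders, and note that the UI's "CAN RUN" corresponds to `CAN_MANAGE_RUN` in the API:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import iam

w = WorkspaceClient()  # reads credentials from the environment or ~/.databrickscfg

# Placeholder job ID and group names -- substitute your own.
w.permissions.set(
    request_object_type="jobs",
    request_object_id="1234",
    access_control_list=[
        # Pipeline team owns day-to-day management of the job.
        iam.AccessControlRequest(
            group_name="pipeline-team",
            permission_level=iam.PermissionLevel.CAN_MANAGE,
        ),
        # Users who need manual execution ("CAN RUN" in the UI).
        iam.AccessControlRequest(
            group_name="analysts",
            permission_level=iam.PermissionLevel.CAN_MANAGE_RUN,
        ),
        # Stakeholders get read-only visibility.
        iam.AccessControlRequest(
            group_name="stakeholders",
            permission_level=iam.PermissionLevel.CAN_VIEW,
        ),
    ],
)
```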
## Configure run identity and Unity Catalog access
When your job accesses Unity Catalog objects—such as tables, views, or volumes—the job's **run identity** must have the required Unity Catalog privileges. This is a critical prerequisite before configuring any job that reads from or writes to Unity Catalog-managed data.
The run identity is the principal whose permissions Unity Catalog evaluates during job execution. By default, jobs run as the **job owner** (the user who created the job). For production workloads, you can configure a **service principal** as the run identity to avoid dependency on individual user accounts.
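
For illustration, the run identity can also be changed outside the UI. This is a rough sketch with the Databricks SDK for Python, assuming an existing job; the job ID and the service principal's application ID are placeholders:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Fetch the job's current settings, then switch its run identity to a
# service principal, identified by its application ID (placeholder below).
job_id = 1234  # placeholder job ID
settings = w.jobs.get(job_id=job_id).settings
settings.run_as = jobs.JobRunAs(
    service_principal_name="00000000-0000-0000-0000-000000000000"
)
w.jobs.reset(job_id=job_id, new_settings=settings)
```

Because `reset` overwrites the full job configuration, the sketch fetches the current settings first and changes only the `run_as` field.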
Before creating your job, verify that the run identity has the necessary privileges:

| Operation | Required Unity Catalog privilege |
| --------- | -------------------------------- |
| Read from a table | `SELECT` on the table |
| Write to a table | `MODIFY` on the table |
| Create tables in a schema | `CREATE TABLE` and `USE SCHEMA` on the schema |
| Access a volume | `READ VOLUME` or `WRITE VOLUME` on the volume |
To grant privileges to a service principal or user, use SQL commands like:

```sql
GRANT SELECT, MODIFY ON TABLE catalog.schema.table TO `service-principal-id`;
GRANT USE SCHEMA ON SCHEMA catalog.schema TO `service-principal-id`;
```
If the run identity lacks the required privileges, the job fails at runtime with an authorization error—even if the job configuration itself is valid. Always verify Unity Catalog access before scheduling production jobs.
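
Before relying on a schedule, it can help to confirm the grants are actually in place. One hedged sketch, again with the Python SDK (the table name and principal are placeholders, and response field names can vary slightly across SDK versions):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog

w = WorkspaceClient()

# List the privileges the run identity effectively holds on the target table.
effective = w.grants.get_effective(
    securable_type=catalog.SecurableType.TABLE,
    full_name="catalog.schema.table",   # placeholder table
    principal="service-principal-id",   # placeholder principal
)
for assignment in effective.privilege_assignments or []:
    held = [p.privilege for p in (assignment.privileges or [])]
    print(assignment.principal, held)
```

If `SELECT` or `MODIFY` is missing from the output, grant it before the first scheduled run.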
With your job created, tasks configured, dependencies set, and permissions assigned, you're ready to run your workflow. The next step is understanding how to monitor job execution and handle run outcomes.
**`learn-pr/wwl-databricks/implement-lakeflow-jobs/includes/4-schedule-job.md`** (39 additions & 0 deletions)
@@ -85,6 +85,45 @@ Consider these factors when choosing a time zone:
> [!TIP]
> For jobs that must run at exact intervals regardless of local time changes, always use UTC.
## Control concurrent job runs
When scheduled jobs take longer than expected, a new run might start before the previous one finishes. This overlap can cause data corruption, duplicate processing, or resource contention. Azure Databricks provides concurrency settings to control this behavior.
### Configure maximum concurrent runs
The **Maximum concurrent runs** setting limits how many instances of the same job can execute simultaneously. By default, a job allows only one concurrent run; raise this limit for jobs that can safely run in parallel. For jobs that must not overlap—such as those writing to the same tables—keep this value at **1**.
To configure maximum concurrent runs:
1. Open your job in the **Jobs & Pipelines** workspace UI.
2. In the **Job details** panel, locate the **Maximum concurrent runs** setting.
3. Set the value to control how many runs can execute at once.
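
The same setting is also exposed through the Jobs API, so it can be pinned in code. A minimal sketch with the Databricks SDK for Python (the job ID is a placeholder); `update` changes only the fields you pass:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Cap the job at one run at a time so scheduled runs never overlap.
w.jobs.update(
    job_id=1234,  # placeholder job ID
    new_settings=jobs.JobSettings(max_concurrent_runs=1),
)
```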
When a new run is triggered but the maximum concurrent runs limit is reached, Azure Databricks must decide what to do with the incoming run.
### Configure queue behavior for overlapping runs
When concurrent runs exceed your configured limit, you choose how the scheduler handles the new run:

| Behavior | Description | Use case |
|----------|-------------|----------|
| **Queue the run** | The new run waits until a slot becomes available, then executes | Jobs that must eventually run—no triggers should be missed |
| **Cancel the run** | The new run is immediately canceled | Jobs where stale triggers are not valuable |
| **Skip the run** | Similar to cancel—the run doesn't execute | Jobs where missing occasional runs is acceptable |
For most data pipelines, **queue the run** ensures that all scheduled executions eventually complete. This approach prevents data gaps when a job occasionally runs longer than its schedule interval.
Consider a job scheduled to run every hour. If a run takes 75 minutes to complete, the next scheduled trigger arrives while the job is still running. With concurrency set to 1 and queue enabled:
1. The first run continues processing.
2. The second run enters the queue.
3. When the first run completes, the queued run starts immediately.
This pattern ensures sequential, non-overlapping execution while preserving all scheduled runs.
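
Putting the pieces together, here is a hedged sketch of that hourly scenario via the Databricks SDK for Python (the schedule uses Quartz cron syntax; the job ID is a placeholder):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Hourly schedule in UTC, one run at a time, with overflow runs queued
# instead of dropped -- matching the 75-minute-run example above.
w.jobs.update(
    job_id=1234,  # placeholder job ID
    new_settings=jobs.JobSettings(
        schedule=jobs.CronSchedule(
            quartz_cron_expression="0 0 * * * ?",  # top of every hour
            timezone_id="UTC",
        ),
        max_concurrent_runs=1,
        queue=jobs.QueueSettings(enabled=True),
    ),
)
```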
> [!NOTE]
> Queued runs consume no compute resources while waiting. They only start when a concurrent slot becomes available.
## Scheduling considerations for production workloads
The Azure Databricks job scheduler handles most scenarios reliably, but it isn't designed for low-latency requirements. Network conditions or cloud service issues can occasionally delay job starts by several minutes. When the service recovers, delayed jobs run immediately.