Commit 43e4190

Merge pull request #52958 from weslbo/images-implement-lakeflow-jobs
Added images for implement-lakeflow-jobs
2 parents 454ccbc + d500cdf commit 43e4190

22 files changed

Lines changed: 34 additions & 0 deletions

learn-pr/wwl-databricks/implement-lakeflow-jobs/includes/2-create-lakeflow-job.md

Lines changed: 6 additions & 0 deletions
@@ -23,6 +23,8 @@ To create a new job in Azure Databricks:
 3. Enter a descriptive name for your job.
 4. Configure your first task by specifying the **Task name** and selecting the **Type** (such as Notebook, Python script, or SQL).
+
+:::image type="content" source="../media/2-create-lakeflow-job.png" alt-text="Screenshot of the lakeflow editor." lightbox="../media/2-create-lakeflow-job.png":::
+
 The task type determines which configuration options appear. For a notebook task, you specify the notebook path and any parameters. For a SQL task, you select a query and SQL warehouse. The following table summarizes common task types and their configuration requirements:

| Task type | Key configuration | Compute options |
@@ -44,6 +46,8 @@ Tasks that run code (notebooks, Python scripts, SQL files) need a source locatio
 **DBFS/ADLS** (for Python scripts) allows you to reference files stored in volumes or cloud storage. Provide the full URI, such as `abfss://<container>@<storage-account>.dfs.core.windows.net/path/script.py`.
+
+:::image type="content" source="../media/2-configure-lakeflow-task.png" alt-text="Screenshot of the lakeflow editor task configuration." lightbox="../media/2-configure-lakeflow-task.png":::
+
## Configure compute resources
Each task needs compute resources to execute. Azure Databricks offers several compute options optimized for different workloads.
@@ -94,6 +98,8 @@ To add job parameters:
 1. In the **Job details** panel, locate the **Parameters** section.
 2. Select **Add** and enter a key-value pair.
+
+:::image type="content" source="../media/2-add-parameters.png" alt-text="Screenshot of the lakeflow editor add parameter section." lightbox="../media/2-add-parameters.png":::
+
 Tasks access parameters differently based on their type. In notebooks, use `dbutils.widgets.get("parameter_name")` to retrieve parameter values. Python scripts receive parameters as command-line arguments.

You can also reference dynamic values in parameters. For example, `{{job.trigger.time.iso_date}}` inserts the trigger date, useful for processing data based on when the job runs.
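
Editor's note: the parameter-passing behavior described in this hunk can be sketched in Python. The parameter names and values below are illustrative only (a real run would receive them from the job configuration, and a dynamic value such as `{{job.trigger.time.iso_date}}` would already be substituted by the service); a notebook task would call `dbutils.widgets.get` instead.

```python
import argparse

# Minimal sketch of a Python-script task reading job parameters,
# which arrive as command-line arguments. Parameter names are
# hypothetical, not taken from the committed docs.
parser = argparse.ArgumentParser()
parser.add_argument("--run_date")     # could carry {{job.trigger.time.iso_date}}
parser.add_argument("--environment")
args = parser.parse_args(["--run_date", "2025-01-15", "--environment", "dev"])
print(args.run_date, args.environment)
```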

learn-pr/wwl-databricks/implement-lakeflow-jobs/includes/3-configure-job-triggers.md

Lines changed: 6 additions & 0 deletions
@@ -6,6 +6,8 @@ In this unit, you learn how to configure event-based triggers that automate your
 Lakeflow Jobs supports several trigger types that determine when your jobs run. While scheduled triggers run at fixed times, event-based triggers respond to data changes in your environment.
+
+:::image type="content" source="../media/3-configure-trigger.png" alt-text="Screenshot of the lakeflow job trigger dialog." lightbox="../media/3-configure-trigger.png":::
+
 | Trigger type | Behavior |
 |--------------|----------|
 | **Table update** | Runs when monitored tables receive new data or modifications |
@@ -64,6 +66,8 @@ To configure a file arrival trigger, specify the storage location using the full
 /Volumes/mycatalog/myschema/myvolume/incoming/
 ```
+
+:::image type="content" source="../media/3-configure-file-arrival-trigger.png" alt-text="Screenshot of the lakeflow file arrival trigger dialog." lightbox="../media/3-configure-file-arrival-trigger.png":::
+
 Before creating a file arrival trigger, verify you have:

 - A workspace with Unity Catalog enabled
@@ -83,6 +87,8 @@ Continuous triggers keep your job running by starting a new run immediately afte
 When you configure a continuous trigger, your job enters a perpetual cycle: complete a run, start the next run. If a run fails, the trigger still starts a new run, making continuous jobs resilient to transient errors.
+
+:::image type="content" source="../media/3-configure-continuous-trigger.png" alt-text="Screenshot of the lakeflow continuous trigger dialog." lightbox="../media/3-configure-continuous-trigger.png":::
+
 Continuous triggers work well for:

 - Processing streaming data where low latency matters
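
Editor's note: the perpetual cycle this hunk describes (a failed run does not stop the trigger) can be sketched as a loop. Everything here is illustrative; a real continuous job is driven by the service, not by user code.

```python
def run_continuously(run_job, max_runs):
    # Each run starts as soon as the previous one ends, and a failure
    # still starts the next run (resilience to transient errors).
    outcomes = []
    for _ in range(max_runs):  # bounded here only so the sketch terminates
        try:
            run_job()
            outcomes.append("success")
        except Exception:
            outcomes.append("failure")  # recorded, but the cycle continues
    return outcomes

# A hypothetical job that fails transiently on its second run:
calls = iter([None, RuntimeError("transient"), None])
def flaky_job():
    err = next(calls)
    if err is not None:
        raise err

print(run_continuously(flaky_job, 3))  # ['success', 'failure', 'success']
```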

learn-pr/wwl-databricks/implement-lakeflow-jobs/includes/5-configure-job-alerts.md

Lines changed: 10 additions & 0 deletions
@@ -31,6 +31,8 @@ Supported system destinations include:
 Before you can use system destinations, a workspace administrator must configure them. Navigate to **Admin Settings** and select **Notifications** to create destinations. Each destination requires appropriate credentials—webhook URLs for Slack and Teams, integration keys for PagerDuty.
+
+:::image type="content" source="../media/5-create-new-destination.png" alt-text="Screenshot of the admin notification settings." lightbox="../media/5-create-new-destination.png":::
+
 > [!TIP]
 > Use different credentials for each configured destination. If one third-party endpoint is compromised, you can revoke its access without affecting other notification destinations.
@@ -48,6 +50,8 @@ To configure job-level notifications, follow these steps:
 6. Select the event types you want to trigger notifications: **Start**, **Success**, **Failure**, **Duration warning**, or **Streaming backlog**.
 7. Select **Save** when you finish adding notifications.
+
+:::image type="content" source="../media/5-create-job-notification.png" alt-text="Screenshot showing job notifications dialog." lightbox="../media/5-create-job-notification.png":::
+
 Job-level notifications apply to the overall job run. However, job-level notifications aren't sent when individual tasks fail and retry. Consider this behavior when designing your notification strategy.

## Configure task-level notifications
@@ -61,6 +65,8 @@ To add task notifications:
 3. Configure the destination and event types just as you would for job notifications.
 4. Save the task configuration.
+
+:::image type="content" source="../media/5-create-task-notification.png" alt-text="Screenshot showing task-level notifications." lightbox="../media/5-create-task-notification.png":::
+
 Task notifications are particularly valuable when your job contains multiple independent tasks. If Task A fails, you receive an immediate notification rather than waiting for the entire job to complete or fail.

By default, Azure Databricks retries failed tasks three times. If you don't want notifications for every retry attempt, select **Mute notifications until the last retry**. This reduces noise while still alerting you when a task ultimately fails.
@@ -69,6 +75,8 @@ By default, Azure Databricks retries failed tasks three times. If you don't want
 Alert fatigue undermines the value of notifications. When teams receive too many alerts, they start ignoring them—including the critical ones. Apply these strategies to keep your alerts meaningful.
+
+:::image type="content" source="../media/5-alert-fatigue.png" alt-text="Diagram explaining alert fatigue." border="false" lightbox="../media/5-alert-fatigue.png":::
+
 **Filter out skipped and canceled runs**: When you cancel a job or a run gets skipped due to concurrent run limits, you might not need a notification. Select **Mute notifications for skipped runs** or **Mute notifications for canceled runs** to suppress these.

**Use duration warnings strategically**: Rather than alerting on every long-running job, set duration thresholds based on historical performance. A job that usually takes 30 minutes might warrant a warning at 45 minutes—not at 35.
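
Editor's note: one way to derive such a threshold is from the job's run history. This is a minimal sketch; the 1.5x headroom factor and the use of the median are assumptions of this example, not product defaults.

```python
import statistics

def warning_threshold_minutes(historical_durations_min, headroom=1.5):
    # Base the warning on typical behavior rather than the slowest run:
    # a job that usually takes ~30 minutes then warns at ~45, not at 35.
    return statistics.median(historical_durations_min) * headroom

print(warning_threshold_minutes([28, 30, 31, 29, 32]))  # 45.0
```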
@@ -103,6 +111,8 @@ If you require specific formatting for notifications, webhooks let you control t
 When your Databricks jobs are orchestrated by Azure Data Factory (ADF), you have additional monitoring options. ADF provides visual monitoring in the Azure portal where you can track pipeline runs, activity status, and execution duration.
+
+:::image type="content" source="../media/5-azure-data-factory-pipeline-runs.png" alt-text="Screenshot of the Azure Data Factory list view for monitoring pipeline runs." lightbox="../media/5-azure-data-factory-pipeline-runs.png":::
+
 ADF also supports creating alerts on metrics such as:

 - Pipeline run status (failed, succeeded, canceled)

learn-pr/wwl-databricks/implement-lakeflow-jobs/includes/6-configure-job-automatic-restarts.md

Lines changed: 12 additions & 0 deletions
@@ -24,8 +24,12 @@ To set a retry policy for a task:
 3. In the task configuration panel, select **+ Add** next to **Retries**
 4. Specify the number of retry attempts and the interval between retries
+
+:::image type="content" source="../media/6-configure-task-level-retry-policy.png" alt-text="Screenshot of the retry policy dialog." lightbox="../media/6-configure-task-level-retry-policy.png":::
+
 The **retry interval** is measured in milliseconds from the start of the failed run to the start of the subsequent retry run. For example, if you set an interval of 60,000 milliseconds (1 minute) and your task took 30 seconds before failing, the next retry starts 30 seconds after the failure.
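
Editor's note: the interval arithmetic above can be sketched as follows. The function name is illustrative, and clamping at zero (retry immediately when the run outlasts the interval) is an assumption of this sketch.

```python
def delay_after_failure_ms(retry_interval_ms, failed_run_duration_ms):
    # The interval is measured from the *start* of the failed run, so the
    # wait after the failure is only the remainder (assumed never negative).
    return max(0, retry_interval_ms - failed_run_duration_ms)

# 60,000 ms interval, task failed after 30 s -> retry 30 s after the failure:
print(delay_after_failure_ms(60_000, 30_000))  # 30000
```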
+
+:::image type="content" source="../media/6-task-execution-flow.png" alt-text="Diagram explaining task execution flow." border="false" lightbox="../media/6-task-execution-flow.png":::
+
 > [!TIP]
 > Set reasonable retry limits—typically 1 to 3 retries for most workloads. Avoid configuring unlimited retries, as a persistent failure will waste resources without resolving the underlying issue.
3135
@@ -43,12 +47,16 @@ To configure a job to run in continuous mode:
 4. Optionally, select a **Task retry mode**—choose **On failure** to retry failed tasks within a job, or **Never** to only retry at the job level
 5. Select **Save**
+
+:::image type="content" source="../media/6-configure-continuous-job-retry-mode.png" alt-text="Screenshot of the continuous job task retry mode dialog." lightbox="../media/6-configure-continuous-job-retry-mode.png":::
+
The **exponential backoff algorithm** works as follows:
 1. When consecutive failures exceed a threshold, the job waits before the next retry
 2. Each subsequent failure increases the wait period up to a maximum set by the system
3. If a run completes successfully or runs without failure for a threshold period, the **backoff sequence resets**
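
Editor's note: the three steps above can be sketched numerically. The failure threshold, base delay, and cap are hypothetical values chosen for illustration; the real ones are set by the system, and a successful run resets the sequence.

```python
def backoff_delays(failure_threshold, base_delay_s, max_delay_s, n_failures):
    # Wait (in seconds) before each retry across consecutive failures.
    delays = []
    for failure in range(1, n_failures + 1):
        if failure <= failure_threshold:
            delays.append(0)  # below the threshold: retry immediately
        else:
            # each further failure doubles the wait, capped at the maximum
            delays.append(min(max_delay_s,
                              base_delay_s * 2 ** (failure - failure_threshold - 1)))
    return delays

# 6 consecutive failures with threshold 2, base 60 s, cap 600 s:
print(backoff_delays(2, 60, 600, 6))  # [0, 0, 60, 120, 240, 480]
```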
+
+:::image type="content" source="../media/6-exponential-backoff-algorithm.png" alt-text="Diagram explaining the exponential backoff pattern." border="false" lightbox="../media/6-exponential-backoff-algorithm.png":::
+
 > [!NOTE]
 > Continuous jobs don't support task dependencies or task-level retry policies in the traditional sense. Instead, set the **Task retry mode** to control whether failed tasks retry before triggering a new job run.
5462
@@ -58,13 +66,17 @@ You can monitor continuous jobs in the exponential backoff state through the **J
 A task that hangs indefinitely blocks downstream processing and wastes compute resources. **Timeout thresholds** terminate unresponsive tasks so your job can either retry or fail cleanly.
+
+:::image type="content" source="../media/6-timeout-threshold.png" alt-text="Diagram explaining timeout behavior." border="false" lightbox="../media/6-timeout-threshold.png":::
+
To configure timeout thresholds:
 1. In the task configuration panel, select **Metric thresholds**
 2. Select **Run duration** in the **Metric** dropdown
 3. Enter a duration in the **Warning** field to trigger a notification when the task exceeds expected completion time
 4. Enter a duration in the **Timeout** field to terminate the task if it exceeds maximum completion time
+
+:::image type="content" source="../media/6-configure-metric-thresholds.png" alt-text="Screenshot of the metric thresholds dialog." lightbox="../media/6-configure-metric-thresholds.png":::
+
When a task times out, Azure Databricks sets its status to "Timed Out" and handles it according to your retry policy. If retries remain, the task restarts. If all retries are exhausted, the task fails and any dependent tasks are affected based on your job's dependency configuration.
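
Editor's note: the warning/timeout/retry decision described above, sketched with illustrative names. The returned labels are this example's own, not product states, apart from the "Timed Out" status the text mentions.

```python
def handle_run_duration(elapsed_min, warning_min, timeout_min, retries_left):
    # Timeout exceeded -> the task is terminated ("Timed Out"), then it
    # restarts if retries remain, otherwise it fails. Warning exceeded
    # -> a notification only; the task keeps running.
    if elapsed_min >= timeout_min:
        return "retry" if retries_left > 0 else "fail"
    if elapsed_min >= warning_min:
        return "warn"
    return "ok"

print(handle_run_duration(50, 45, 60, retries_left=2))  # warn
print(handle_run_duration(65, 45, 60, retries_left=0))  # fail
```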
## Combine automatic restarts with notifications
