learn-pr/wwl-databricks/select-and-configure-compute/2-choose-appropriate-compute-type.yml (1 addition & 1 deletion)
@@ -4,7 +4,7 @@ title: Choose an appropriate compute type
metadata:
  title: Choose an Appropriate Compute Type
  description: Learn how to choose the right compute type for your Azure Databricks workloads, comparing serverless, classic, SQL warehouses, instance pools, and job compute options.
description: Learn how to configure Azure Databricks compute performance settings including node types, autoscaling, termination, and instance pools to balance cost and performance.
learn-pr/wwl-databricks/select-and-configure-compute/4-configure-compute-feature-settings.yml (1 addition & 1 deletion)
@@ -4,7 +4,7 @@ title: Configure compute features
metadata:
  title: Configure Compute Features
  description: Learn how to configure Azure Databricks compute features including Photon acceleration, Databricks runtime versions, and machine learning environments for optimal workload performance.
learn-pr/wwl-databricks/select-and-configure-compute/5-install-libraries-for-compute.yml (1 addition & 1 deletion)
@@ -4,7 +4,7 @@ title: Install libraries for compute
metadata:
  title: Install Libraries for Compute
  description: Learn how to install libraries on Azure Databricks compute resources using package repositories, workspace files, Unity Catalog volumes, and init scripts.
learn-pr/wwl-databricks/select-and-configure-compute/6-configure-compute-access-perms.yml (1 addition & 1 deletion)
@@ -4,7 +4,7 @@ title: Configure compute access
metadata:
  title: Configure Compute Access
  description: Learn how to configure access permissions for Azure Databricks compute resources, including permission levels, access modes, dedicated group access, and workspace-level entitlements.
Every Azure Databricks workload runs on compute resources, but choosing the wrong compute type or configuration leads to unnecessary costs, poor performance, or blocked functionality. **Serverless compute** starts in seconds but doesn't support RDD APIs. **Classic compute** offers complete flexibility but requires more management overhead. **SQL warehouses** excel at analytical queries while **job clusters** optimize for automated workflows. Understanding these differences helps you match compute to workload requirements.
Beyond selecting a compute type, configuration decisions shape how your workload performs. **Node types** determine processing capacity and memory availability. **Autoscaling** balances cost and responsiveness. **Access permissions** control who can use compute resources while **library installations** provide the dependencies your code needs. Each configuration choice affects multiple dimensions—performance, cost, security, and operational complexity.
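Several of these settings come together in a single cluster definition. The following is a minimal sketch of a create-cluster payload for the classic Clusters API; the cluster name, runtime version, node type, and sizes are placeholder values, not recommendations:

```python
import json

# Sketch of a classic cluster definition (Clusters API create request).
# All names and sizes below are hypothetical placeholders.
cluster_spec = {
    "cluster_name": "etl-dev",                  # placeholder name
    "spark_version": "15.4.x-scala2.12",        # a Databricks Runtime version string
    "node_type_id": "Standard_DS3_v2",          # Azure VM size: sets CPU and memory per node
    "autoscale": {"min_workers": 2, "max_workers": 8},  # balances cost and responsiveness
    "autotermination_minutes": 30,              # shut down after 30 idle minutes
}

print(json.dumps(cluster_spec, indent=2))
```

Autoscaling keeps the cluster between two and eight workers as load changes, while auto-termination stops billing for a forgotten idle cluster.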
**Serverless compute** is managed entirely by Azure Databricks. You don't provision or configure infrastructure—Azure Databricks automatically allocates and scales resources based on your workload demands. These resources run in **Databricks' Azure subscription, not yours**, which means no virtual machines or networking components appear in your subscription.
With serverless compute, startup typically takes 2-6 seconds. The platform scales up rapidly when query volume increases and scales down during idle periods to minimize costs. This eliminates the need to estimate capacity or manage cluster configurations.
@@ -21,6 +23,8 @@ However, serverless has limitations. You can't use **RDD APIs** (Resilient Distr
**Classic compute** gives you full control over cluster configuration. You create, size, and manage compute resources that run directly in **your Azure subscription**, giving you visibility and control over the underlying infrastructure.
Classic compute supports two access modes that determine how users interact with the cluster:
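In the Clusters API, the access mode is expressed through the `data_security_mode` field. The snippets below are sketches assuming that API field; the cluster names and user identity are placeholders:

```python
import json

# Sketch of the access-mode portion of two cluster specs.
# "SINGLE_USER" corresponds to dedicated (single-user) access;
# "USER_ISOLATION" corresponds to standard (shared) access.
dedicated_cluster = {
    "cluster_name": "ml-single-user",          # placeholder name
    "data_security_mode": "SINGLE_USER",       # dedicated: one assigned user
    "single_user_name": "user@example.com",    # placeholder identity
}

shared_cluster = {
    "cluster_name": "team-shared",             # placeholder name
    "data_security_mode": "USER_ISOLATION",    # standard: multiple users, isolated from each other
}

print(json.dumps([dedicated_cluster, shared_cluster], indent=2))
```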
@@ -43,6 +47,8 @@ This compute type fits workloads that need features unavailable in serverless, r
**SQL warehouses** are compute resources optimized specifically for SQL queries, analytics, and business intelligence. They come in three types, each with different performance characteristics.
**Serverless SQL warehouses** offer optimal performance and cost efficiency. They start in 2-6 seconds, use Intelligent Workload Management to predict query resource needs, and scale clusters dynamically based on demand. Photon and Predictive IO accelerate query execution. Choose serverless SQL warehouses for most SQL workloads—BI dashboards, ETL jobs, and ad hoc analysis.
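As a rough illustration, a serverless SQL warehouse definition for the SQL Warehouses API might look like the sketch below; the name and sizing are placeholders, and `enable_serverless_compute` assumes serverless is available in the workspace:

```python
import json

# Sketch of a serverless SQL warehouse definition (SQL Warehouses API).
# Name and sizing values are hypothetical placeholders.
warehouse_spec = {
    "name": "bi-dashboards",            # placeholder name
    "warehouse_type": "PRO",            # serverless warehouses use the PRO type
    "enable_serverless_compute": True,
    "cluster_size": "Small",            # T-shirt size of each cluster
    "min_num_clusters": 1,
    "max_num_clusters": 4,              # scale out under concurrent query load
    "auto_stop_mins": 10,               # stop when idle to minimize cost
}

print(json.dumps(warehouse_spec, indent=2))
```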
@@ -55,6 +61,8 @@ All SQL warehouse types optimize for SQL execution patterns, but serverless offe
**Instance pools** maintain a set of idle virtual machine instances ready for immediate use. When you create a cluster from a pool, startup time decreases because Databricks allocates instances from the pool instead of requesting new ones from Azure.
Pools reduce startup time from minutes to under a minute in many cases. You configure the minimum number of idle instances to keep warm and the maximum pool capacity. When clusters release instances, those instances return to the pool for reuse.
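The warm-instance settings described above map to fields on the Instance Pools API. The following sketch uses placeholder names and sizes:

```python
import json

# Sketch of an instance pool definition (Instance Pools API).
# Pool name, node type, and capacity values are hypothetical placeholders.
pool_spec = {
    "instance_pool_name": "warm-ds3-pool",       # placeholder name
    "node_type_id": "Standard_DS3_v2",           # clusters from this pool use this VM size
    "min_idle_instances": 2,                     # instances kept warm for fast cluster startup
    "max_capacity": 20,                          # upper bound on total instances in the pool
    "idle_instance_autotermination_minutes": 15, # release idle instances above the minimum
}

print(json.dumps(pool_spec, indent=2))
```

Keeping `min_idle_instances` small limits the standing cost of the warm pool, while `max_capacity` caps how far concurrent clusters can drain it.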
@@ -67,6 +75,8 @@ Configure pools with spot instances for worker nodes to reduce costs, but use on
**Job compute** refers to clusters optimized for automated workflows rather than interactive development. You configure job compute through cluster policies that enforce best practices for production workloads.
Job clusters terminate automatically after completing their tasks, preventing unnecessary costs from idle resources. When you configure a job, you choose between serverless and classic job compute.
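For the classic option, a job cluster is declared inline in the job definition via `new_cluster`, so it exists only for the duration of the run. The following Jobs API sketch uses placeholder paths and sizes:

```python
import json

# Sketch of a job definition (Jobs API) running on a classic job cluster.
# The cluster in "new_cluster" is created when the run starts and terminates
# when the task finishes. Notebook path and cluster settings are placeholders.
job_spec = {
    "name": "nightly-etl",                      # placeholder job name
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},  # placeholder path
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 4,
            },
        }
    ],
}

print(json.dumps(job_spec, indent=2))
```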
@@ -80,6 +90,8 @@ The Job Compute policy in Azure Databricks offers a template for creating produc
Different compute types suit different scenarios. The following table compares key characteristics to help you make informed decisions:
| Compute type | Recommended for | Startup time | Management overhead | Cost efficiency | Key limitation |
@@ -98,7 +110,7 @@ Different compute types suit different scenarios. The following table compares k
Start your decision-making process by identifying your workload characteristics. The following diagram illustrates a decision flow to help you select the appropriate compute type:
- [decision flow diagram image]
+ [decision flow diagram image]