`learn-pr/wwl-databricks/select-and-configure-compute/5-install-libraries-for-compute.yml` (1 addition, 1 deletion)

```diff
@@ -4,7 +4,7 @@ title: Install libraries for compute
 metadata:
   title: Install Libraries for Compute
   description: Learn how to install libraries on Azure Databricks compute resources using package repositories, workspace files, Unity Catalog volumes, and init scripts.
```
`learn-pr/wwl-databricks/select-and-configure-compute/includes/2-choose-appropriate-compute-type.md` (1 addition, 1 deletion)

```diff
@@ -98,7 +98,7 @@ Different compute types suit different scenarios. The following table compares k
 Start your decision-making process by identifying your workload characteristics. The following diagram illustrates a decision flow to help you select the appropriate compute type:

-
+
```
`learn-pr/wwl-databricks/select-and-configure-compute/includes/5-install-libraries-for-compute.md` (8 additions, 8 deletions)

```diff
@@ -29,10 +29,10 @@ Maven libraries require **coordinates** in the format `groupId:artifactId:version`
 For R packages from CRAN, provide the package name. Unlike Python and Java libraries, CRAN installations always pull the latest version from the configured mirror. To pin specific R package versions, you need to store the package files in workspace files or volumes instead of installing from CRAN.

-With clusters configured in **standard access mode**, Maven coordinates and JAR file paths require **allow list approval** before installation. This security measure ensures admins review and approve libraries that run on shared compute resources.
+With clusters configured in **standard access mode**, Maven coordinates and JAR file paths require **`allowlist` approval** before installation. This security measure ensures admins review and approve libraries that run on shared compute resources.

 > [!NOTE]
-> To learn more about configuring and managing allow lists for libraries, see the [documentation](/azure/databricks/data-governance/unity-catalog/manage-privileges/allowlist).
+> To learn more about configuring and managing `allowlists` for libraries, see the [documentation](/azure/databricks/data-governance/unity-catalog/manage-privileges/allowlist).

 ## Install libraries from files
```
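The Maven and CRAN options in this hunk can also be driven programmatically through the Databricks Libraries API (`POST /api/2.0/libraries/install`). A minimal sketch of the request body follows; the cluster ID and the specific packages are hypothetical examples, not values from the change above:

```python
import json

# Request body for the Databricks Libraries API (POST /api/2.0/libraries/install).
# The cluster ID and the packages below are hypothetical examples.
payload = {
    "cluster_id": "0123-456789-abcde000",
    "libraries": [
        # Maven coordinates follow the groupId:artifactId:version format.
        {"maven": {"coordinates": "com.databricks:spark-xml_2.12:0.16.0"}},
        # CRAN entries take only a package name; the latest version is pulled.
        {"cran": {"package": "forecast"}},
    ],
}

print(json.dumps(payload, indent=2))
```

On a standard access mode cluster, the Maven coordinate in this payload would first need allowlist approval, as the hunk above notes.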
```diff
@@ -50,15 +50,15 @@ Unity Catalog volumes offer enhanced security and governance for library storage
 Python **requirements.txt files** work with both workspace files and volumes in Databricks Runtime 15.0 and above. These files let you define multiple package dependencies in a single file, making it easier to maintain consistent environments across clusters. Upload the requirements.txt file and install it just like any other library—Azure Databricks automatically installs all listed packages.

-For clusters with standard access mode, you must add library file paths to the allow list before installation. This applies to both workspace files and volumes, ensuring admins approve the libraries used on shared compute.
+For clusters with standard access mode, you must add library file paths to the `allowlist` before installation. This applies to both workspace files and volumes, ensuring admins approve the libraries used on shared compute.

 ## Use init scripts for advanced configuration

 **Init scripts** run shell commands during **cluster startup**, before the Spark driver and executors start. While Databricks **doesn't recommend** using init scripts for library installation—cluster-scoped libraries provide a better approach—init scripts prove useful for system-level **configuration** that libraries can't handle.

 You might use init scripts to install system packages with `apt-get`, configure environment variables, or set up monitoring agents. For example, an init script could install a specialized database driver that requires system libraries, then configure connection parameters through environment variables. The script runs every time the cluster starts, ensuring your configuration persists across restarts.

-Store init scripts in Unity Catalog volumes for clusters running Databricks Runtime 13.3 LTS and above. Create a shell script file, upload it to a volume, then configure the cluster to run the script by specifying its path like `/Volumes/main/engineering/scripts/setup.sh`. For standard access mode, add the init script path to the allow list before configuring the cluster.
+Store init scripts in Unity Catalog volumes for clusters running Databricks Runtime 13.3 LTS and above. Create a shell script file, upload it to a volume, then configure the cluster to run the script by specifying its path like `/Volumes/main/engineering/scripts/setup.sh`. For standard access mode, add the init script path to the `allowlist` before configuring the cluster.

 Init scripts execute sequentially in the order you specify. If any script returns a non-zero exit code, the cluster fails to start. This failure protection prevents clusters from running with incomplete or incorrect configuration. You can troubleshoot failed init scripts by configuring cluster log delivery and examining the init script logs.
```
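As a concrete illustration of the init-script workflow this hunk describes, here is a sketch of a script that installs a system package and sets an environment variable, staged at the volume path from the text above. The package and variable names are hypothetical; in a Databricks notebook you would upload the script with the workspace UI or `dbutils.fs.put` rather than writing it locally:

```python
# Sketch of a cluster init script for system-level setup that cluster-scoped
# libraries can't handle. The apt package and env variable are hypothetical.
init_script = """#!/bin/bash
set -e
# Install a system library that a specialized database driver depends on.
apt-get update -y && apt-get install -y unixodbc-dev
# Persist a connection parameter as an environment variable.
echo "DB_DRIVER_HOST=db.internal.example.com" >> /etc/environment
"""

# Volume path the cluster would be configured to run the script from.
script_path = "/Volumes/main/engineering/scripts/setup.sh"

print(script_path)
```

Because the script uses `set -e`, a failed `apt-get` returns a non-zero exit code and the cluster fails to start, which matches the failure-protection behavior described above.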
```diff
@@ -68,13 +68,13 @@ Consider init scripts as a last resort for configuration needs that cluster-scoped
 Clusters configured with **standard access mode** provide the strongest security and isolation in Azure Databricks. This mode requires explicit approval for libraries and init scripts to prevent unauthorized code execution on shared compute resources.

-Before installing Maven libraries or JAR files on standard access mode clusters, a **metastore admin** must add them to the allowlist. Maven coordinates go on the allowlist using the format `groupId:artifactId:version`. You can allowlist all versions of a library with `groupId:artifactId`, or all artifacts in a group with just `groupId`. For JAR files stored in volumes or object storage, allowlist the file path or directory path.
+Before installing Maven libraries or JAR files on standard access mode clusters, a **metastore admin** must add them to the `allowlist`. Maven coordinates go on the `allowlist` using the format `groupId:artifactId:version`. You can `allowlist` all versions of a library with `groupId:artifactId`, or all artifacts in a group with just `groupId`. For JAR files stored in volumes or object storage, `allowlist` the file path or directory path.

-Init scripts require separate allowlist entries even if stored in the same location as JAR files. When allowlisting a path, Azure Databricks uses prefix matching—adding `/Volumes/prod-libraries/` to the allowlist permits all files and subdirectories within that location. Include a trailing slash to prevent unintended prefix matches at the directory level.
+Init scripts require separate `allowlist` entries even if stored in the same location as JAR files. When allowlisting a path, Azure Databricks uses prefix matching—adding `/Volumes/prod-libraries/` to the `allowlist` permits all files and subdirectories within that location. Include a trailing slash to prevent unintended prefix matches at the directory level.

-The allowlist only grants permission to use a path for library or init script installation. You still need appropriate data access permissions. For volumes, the installer identity must have `READ VOLUME` permission. For standard access mode, the cluster owner's identity validates these permissions during library installation.
+The `allowlist` only grants permission to use a path for library or init script installation. You still need appropriate data access permissions. For volumes, the installer identity must have `READ VOLUME` permission. For standard access mode, the cluster owner's identity validates these permissions during library installation.

-To configure the allowlist, metastore admins use Catalog Explorer, selecting the metastore settings and navigating to the **Allowed JARs/Init Scripts** section. This centralized control ensures that security teams can review and approve all libraries used across the organization's compute resources, maintaining governance without blocking productivity.
+To configure the `allowlist`, metastore admins use Catalog Explorer, selecting the metastore settings and navigating to the **Allowed JARs/Init Scripts** section. This centralized control ensures that security teams can review and approve all libraries used across the organization's compute resources, maintaining governance without blocking productivity.
```
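The prefix-matching behavior this hunk describes can be modeled in a few lines. This is a simplified illustration of the matching rule, not Azure Databricks' actual implementation:

```python
def is_allowlisted(path: str, allowlist: list[str]) -> bool:
    """Simplified model of allowlist prefix matching."""
    return any(path.startswith(entry) for entry in allowlist)

# The trailing slash matters: without it, the entry would also match
# sibling volumes such as /Volumes/prod-libraries-old.
allowlist = ["/Volumes/prod-libraries/"]

print(is_allowlisted("/Volumes/prod-libraries/jars/etl.jar", allowlist))  # True
print(is_allowlisted("/Volumes/prod-libraries-old/etl.jar", allowlist))   # False
```

All files and subdirectories under the allowlisted prefix pass, while a lookalike directory name without the trailing slash boundary does not, which is exactly the unintended match the text warns about.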
