learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/1-introduction.md (1 addition, 1 deletion)
@@ -1,6 +1,6 @@
Effective data governance requires more than just controlling access—it demands visibility into your data assets, accountability for how they're used, and enforcement mechanisms that scale with your organization. As data platforms grow in complexity, managing table definitions, tracking data lineage, enforcing retention policies, and sharing data securely become increasingly challenging. **Unity Catalog** in Azure Databricks provides the foundation for comprehensive governance that addresses these challenges through centralized metadata management and policy enforcement.
-When you govern Unity Catalog objects, you work across several interconnected capabilities. You document tables and columns with **comments and tags** that help data consumers discover and understand your assets. You implement **attribute-based access control (ABAC)** using governed tags and policies that automatically enforce fine-grained permissions. You apply **row filtering and column masking** to protect sensitive data while maintaining query functionality. You configure **data retention policies** using Delta Lake's VACUUM and predictive optimization to manage storage costs and meet compliance requirements.
+When you govern Unity Catalog objects, you work across several interconnected capabilities. You document tables and columns with **comments and tags** that help data consumers discover and understand your assets. You implement **attribute-based access control (ABAC)** using governed tags and policies that automatically enforce fine-grained permissions. You apply **row filtering and column masking** to protect sensitive data while maintaining query functionality. You configure **data retention policies** using Delta Lake's `VACUUM` and predictive optimization to manage storage costs and meet compliance requirements.
Beyond access control, governance extends to **data lineage tracking** that shows how data flows through your pipelines, enabling impact analysis and troubleshooting. **Audit logging** captures who did what and when, supporting security investigations and regulatory compliance. For external collaboration, **Delta Sharing** lets you share data with partners and customers while maintaining governance controls over what they can access.
@@ -53,6 +53,8 @@ Unity Catalog can automatically generate comments using AI-powered suggestions.
For column comments, select **AI generate** above the column list to generate suggestions for all columns.
+:::image type="content" source="../media/2-use-ai-generated-comments.png" alt-text="Screenshot of AI-generated comments." lightbox="../media/2-use-ai-generated-comments.png":::
> [!IMPORTANT]
> **AI-generated comments** are suggestions based on schema analysis. Always review these comments before saving, as AI models might generate inaccurate descriptions. Don't rely on AI comments for data classification tasks like detecting personally identifiable information (PII).
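When you write or correct comments yourself, you can do it in SQL as well as in Catalog Explorer. A minimal sketch, using a placeholder table name:

```sql
-- Document the table and one of its columns without relying on AI suggestions.
COMMENT ON TABLE sales.customers.orders IS 'One row per customer order, loaded daily from the ERP system.';

ALTER TABLE sales.customers.orders
  ALTER COLUMN order_total COMMENT 'Order amount in USD, including tax.';
```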
learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/3-configure-attribute-based-access-control-tags-policies.md (14 additions, 6 deletions)
@@ -9,14 +9,16 @@ In this unit, you learn how to use governed tags and ABAC policies to implement
Governed tags differ from the standard tags you've already encountered. While standard tags help with organization and discovery, governed tags add enforcement and consistency at the account level. Administrators define governed tags with specific allowed values, and only authorized users can assign them.
+:::image type="content" source="../media/3-understand-governed-tags.png" alt-text="Diagram explaining standard and governed tags." border="false" lightbox="../media/3-understand-governed-tags.png":::
Consider the following differences between tag types:
-| Aspect | Standard tags | Governed tags |
-|--------|---------------|---------------|
-| Scope | Any key-value pair | Predefined keys and allowed values |
-| Control | Any user with APPLY TAG privilege | Only users with ASSIGN permission |
-| Purpose | Organization and discovery | Policy enforcement and compliance |
-| Visual indicator | None | Lock icon in Catalog Explorer |
Governed tags serve as the foundation for ABAC policies. When you tag a table with `sensitivity=high`, you can then create a policy that masks certain columns for all tables with that tag. This approach scales efficiently because adding the tag to new tables automatically applies the policy.
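To make the scaling behavior concrete, here's a sketch of assigning tags in SQL, assuming governed tags named `sensitivity` and `pii` already exist (table and column names are placeholders):

```sql
-- Table-level tag: any ABAC policy scoped to sensitivity=high now covers this table.
ALTER TABLE sales.customers.orders SET TAGS ('sensitivity' = 'high');

-- Column-level tag: classify a single column with the pii governed tag.
ALTER TABLE sales.customers.orders
  ALTER COLUMN email SET TAGS ('pii' = 'email');
```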
@@ -26,6 +28,8 @@ To create a governed tag, select **Catalog** > **Governance** > **Governed Tags*
For example, you might create a governed tag with key `pii` and allowed values `ssn`, `email`, and `address`. This ensures consistent classification of personally identifiable information across your organization.
+:::image type="content" source="../media/3-create-governed-tag.png" alt-text="Screenshot showing the create governed tag dialog box." lightbox="../media/3-create-governed-tag.png":::
> [!NOTE]
> Tag data is stored as plain text. Don't use tag names or values that contain sensitive information.
@@ -78,6 +82,8 @@ This policy applies the `filter_non_eu` function to any table in the `sales` cat
You can also create policies using Catalog Explorer. Select the catalog or schema, choose the **Policies** tab, and select **New policy**. The visual interface guides you through selecting principals, scope, and conditions.
+:::image type="content" source="../media/3-create-policy.png" alt-text="Screenshot for creating policies using Catalog Explorer." lightbox="../media/3-create-policy.png":::
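The `filter_non_eu` function referenced in the hunk above is an ordinary SQL UDF that returns a Boolean. A hedged sketch of what such a function might look like (the schema, group, and column names are assumptions, not part of this change):

```sql
-- Row filter UDF: returns true for rows the querying user is allowed to see.
-- Members of eu_analysts see all rows; everyone else sees only non-EU rows.
CREATE OR REPLACE FUNCTION sales.governance.filter_non_eu(region STRING)
RETURNS BOOLEAN
RETURN is_account_group_member('eu_analysts') OR region <> 'EU';
```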
## Create column mask policies
Column mask policies control what values users see in specific columns. Like row filters, these policies rely on UDFs to implement the masking logic.
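A hedged sketch of a masking UDF in the same style (the group name and redaction rule are assumptions):

```sql
-- Column mask UDF: returns the original value for privileged users,
-- and a redacted form for everyone else.
CREATE OR REPLACE FUNCTION sales.governance.mask_email(email STRING)
RETURNS STRING
RETURN CASE
  WHEN is_account_group_member('support_admins') THEN email
  ELSE concat('***@', split_part(email, '@', 2))
END;
```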
@@ -140,6 +146,8 @@ Policy quotas limit how many policies you can create:
Only one row filter can apply to any given table, and only one column mask can apply to any given column. If multiple policies would result in multiple filters or masks, Azure Databricks blocks access and throws an error.
> You must use Databricks Runtime 16.4 or above, or serverless compute, to access tables secured by ABAC policies. Users not subject to the policy can use any runtime.
learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/4-implement-row-filtering-column-masking.md (2 additions, 0 deletions)
@@ -99,4 +99,6 @@ Both approaches achieve fine-grained security, but they differ in scope:
- **Row and Column Security** is embedded at the table level and guarantees that restrictions apply everywhere the table is used.
- **Dynamic Views** provide flexibility: you can create multiple views with different rules for different audiences, while keeping the base table unrestricted.
+:::image type="content" source="../media/4-choose-between-approaches.png" alt-text="Diagram helping you choose between the two approaches." border="false" lightbox="../media/4-choose-between-approaches.png":::
In practice, you use table-level controls when you need strict enforcement, and dynamic views when you want adaptable, shareable abstractions.
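To make the contrast concrete, here's a sketch of both approaches under assumed table, function, and group names:

```sql
-- Table-level enforcement: the filter travels with the table everywhere it's queried.
ALTER TABLE sales.customers.orders
  SET ROW FILTER sales.governance.filter_non_eu ON (region);

-- Dynamic view: the base table stays unrestricted; the view redacts for one audience.
CREATE OR REPLACE VIEW sales.reporting.orders_for_analysts AS
SELECT
  order_id,
  region,
  CASE WHEN is_account_group_member('finance') THEN order_total ELSE NULL END AS order_total
FROM sales.customers.orders;
```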
learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/5-apply-data-retention-policies.md (10 additions, 8 deletions)
@@ -1,6 +1,6 @@
Every organization faces the challenge of balancing data availability with compliance requirements and storage costs. When you implement **data retention policies** in Azure Databricks, you ensure that data is kept only as long as necessary while remaining accessible for legitimate business needs. This becomes especially critical when regulations require you to delete personal data upon request.
-In this unit, you learn how to configure retention settings, use **VACUUM** to remove obsolete data, handle deletion requests for compliance, and automate maintenance with **predictive optimization**.
+In this unit, you learn how to configure retention settings, use **`VACUUM`** to remove obsolete data, handle deletion requests for compliance, and automate maintenance with **predictive optimization**.
## Configure Delta Lake retention settings
@@ -9,7 +9,9 @@ In this unit, you learn how to configure retention settings, use **VACUUM** to r
| Property | Default | Purpose |
|----------|---------|---------|
|`delta.logRetentionDuration`| 30 days | Controls how long transaction log history is kept |
-|`delta.deletedFileRetentionDuration`| 7 days | Determines when VACUUM can remove unreferenced data files |
+|`delta.deletedFileRetentionDuration`| 7 days | Determines when `VACUUM` can remove unreferenced data files |
These settings work together to define your **time travel** capabilities. For example, if you need 30 days of historical data access, you must configure both properties accordingly:
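A hedged sketch of that configuration, with a placeholder table name:

```sql
-- Keep both transaction log history and deleted data files for 30 days
-- so time travel works across the full window.
ALTER TABLE sales.customers.orders SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 30 days',
  'delta.deletedFileRetentionDuration' = 'interval 30 days'
);

-- Confirm the settings took effect.
SHOW TBLPROPERTIES sales.customers.orders;
```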
@@ -24,7 +26,7 @@ SET TBLPROPERTIES (
> [!IMPORTANT]
> Increasing retention duration increases storage costs because more data files are preserved. Before adjusting these settings, evaluate your compliance requirements against storage budget constraints.
>
-> You must set both of these properties to ensure table history is retained for longer duration for tables with frequent VACUUM operations
+> You must set both of these properties to ensure table history is retained for a longer duration on tables with frequent `VACUUM` operations.
To view current retention settings for a table, use the following command:
-When you run VACUUM, Delta Lake identifies files associated with versions older than the retention threshold and removes them from storage. After this operation, time travel queries to those older versions fail because the underlying data no longer exists.
+When you run `VACUUM`, Delta Lake identifies files associated with versions older than the retention threshold and removes them from storage. After this operation, time travel queries to those older versions fail because the underlying data no longer exists.
-For tables with **deletion vectors** enabled, you must also run `REORG TABLE ... APPLY (PURGE)` after deleting records to permanently remove the underlying data. VACUUM alone doesn't remove data marked for deletion by deletion vectors.
+For tables with **deletion vectors** enabled, you must also run `REORG TABLE ... APPLY (PURGE)` after deleting records to permanently remove the underlying data. `VACUUM` by itself does not remove data that deletion vectors have marked for deletion.
> [!NOTE]
> **Deletion vectors** are a storage optimization feature you can enable on Delta Lake tables. By default, when a single row in a data file is deleted, the entire Parquet file containing the record must be rewritten. With deletion vectors enabled, Delta Lake marks rows as deleted without rewriting the entire file.
@@ -92,11 +94,11 @@ When you delete data in your bronze layer, you must also remove it from silver a
-After running deletion operations, execute VACUUM to permanently remove the data files from storage.
+After running deletion operations, execute `VACUUM` to permanently remove the data files from storage.
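A sketch of a complete compliance-deletion pass on one table (the names and predicate are illustrative):

```sql
-- 1. Remove the data subject's records.
DELETE FROM sales.customers.orders WHERE customer_id = 'C-1042';

-- 2. With deletion vectors enabled, rewrite affected files so the deleted
--    rows no longer exist in any data file.
REORG TABLE sales.customers.orders APPLY (PURGE);

-- 3. Remove unreferenced data files once they age past the retention threshold.
VACUUM sales.customers.orders;
```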
## Automate retention with predictive optimization
-Manually scheduling VACUUM and OPTIMIZE operations across many tables is time-consuming and error-prone. **Predictive optimization** in Unity Catalog automates these maintenance tasks for managed tables.
+Manually scheduling `VACUUM` and `OPTIMIZE` operations across many tables is time-consuming and error-prone. **Predictive optimization** in Unity Catalog automates these maintenance tasks for managed tables.
When enabled, predictive optimization automatically:
@@ -111,7 +113,7 @@ ALTER SCHEMA sales.customers ENABLE PREDICTIVE OPTIMIZATION;
> [!TIP]
-> Before enabling predictive optimization, set your desired `delta.deletedFileRetentionDuration` on tables that require longer retention periods. The default VACUUM retention is 7 days, which might be shorter than your compliance requirements.
+> Before enabling predictive optimization, set your desired `delta.deletedFileRetentionDuration` on tables that require longer retention periods. The default `VACUUM` retention is 7 days, which might be shorter than your compliance requirements.
To check whether predictive optimization is enabled for a table:
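One way to check (a sketch; verify the exact property name against your runtime) is to read the table's extended metadata, which includes a predictive optimization entry on recent runtimes:

```sql
-- Look for the predictive optimization row in the detailed output.
DESCRIBE EXTENDED sales.customers.orders;
```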
learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/6-manage-data-lineage-tracking.md (12 additions, 8 deletions)
@@ -6,6 +6,8 @@ In this unit, you learn how to use Catalog Explorer to view and manage data line
Unity Catalog automatically captures **runtime data lineage** across queries run on Azure Databricks. This lineage tracking works across all languages and captures relationships down to the **column level**. Lineage data includes the notebooks, jobs, and dashboards that interact with your tables.
+:::image type="content" source="../media/6-understand-lineage.png" alt-text="Screenshot of Unity Catalog lineage." lightbox="../media/6-understand-lineage.png":::
Consider a scenario where a sales analytics dashboard suddenly shows incorrect revenue figures. With lineage tracking, you can trace back through the data flow to identify which upstream table or transformation caused the issue. This capability transforms troubleshooting from guesswork into a systematic investigation.
Lineage is aggregated across all workspaces attached to a Unity Catalog metastore. When you capture lineage in one workspace, users in other workspaces that share the same metastore can view that lineage information. This **cross-workspace visibility** is particularly valuable for organizations with distributed data teams.
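Beyond the Catalog Explorer graph, lineage is also queryable through the lineage system tables. A sketch, with a placeholder target table name:

```sql
-- Upstream sources that wrote into the orders table over the last seven days.
SELECT DISTINCT source_table_full_name, entity_type, event_time
FROM system.access.table_lineage
WHERE target_table_full_name = 'sales.customers.orders'
  AND event_date >= date_sub(current_date(), 7);
```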
@@ -42,7 +44,7 @@ Beyond table relationships, Catalog Explorer shows which jobs and dashboards con
This information supports impact analysis. Before modifying a table schema, you can identify which downstream jobs and dashboards might be affected.
-## Manage table ownership and permissions
+## Manage table ownership
Every securable object in Unity Catalog has an **owner**. The owner has full privileges on the object and can grant permissions to other users. Understanding and managing ownership is essential for governance accountability.
@@ -54,6 +56,8 @@ To view or change ownership in Catalog Explorer:
4. Search for and select a user, group, or service principal.
5. Select **Save**.
+:::image type="content" source="../media/6-manage-table-ownership.png" alt-text="Screenshot of the set owner dialog." lightbox="../media/6-manage-table-ownership.png":::
Only the current owner or a **metastore admin** can transfer ownership. After transfer, the previous owner loses ownership privileges unless explicitly granted.
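Ownership can also be transferred with SQL; a minimal sketch using placeholder names:

```sql
-- Transfer ownership to a group so governance doesn't depend on a single person.
ALTER TABLE sales.customers.orders SET OWNER TO `data-governance-team`;
```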
To view lineage, users need at least the `BROWSE` privilege on the parent catalog. Objects that a user doesn't have permission to access appear as **masked nodes** in the lineage graph. This security model ensures that lineage visibility respects your access control policies.
@@ -74,13 +78,13 @@ DESCRIBE HISTORY sales.customers.orders LIMIT 1;
The history includes detailed information about each operation:
-| Column | Description |
-|--------|-------------|
-|`version`| Table version number |
-|`timestamp`| When the operation occurred |
-|`operation`| Type of operation (WRITE, UPDATE, DELETE, MERGE) |
-|`userName`| User who performed the operation |
-|`operationMetrics`| Metrics like rows affected and files modified |
learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/7-configure-audit-logging.md (4 additions, 0 deletions)
@@ -130,6 +130,8 @@ To enable verbose audit logs:
3. Select the **Advanced** tab.
4. Locate **Verbose Audit Logs** and enable the feature.
+:::image type="content" source="../media/7-verbose-audit-logs.png" alt-text="Screenshot of the Workspace admin advanced settings." lightbox="../media/7-verbose-audit-logs.png":::
When you enable or disable verbose logging, Azure Databricks logs this configuration change as an auditable event. This creates accountability for who modified the logging configuration and when.
> [!IMPORTANT]
@@ -145,6 +147,8 @@ While the audit log system table provides direct access within Databricks, many
Platform administrators configure log delivery through the Azure portal. As a data engineer, you should know that logs typically arrive within **15 minutes** of the event occurring. When building monitoring solutions, account for this latency in your alerting thresholds.
+:::image type="content" source="../media/7-diagnostic-setting.png" alt-text="Screenshot of Azure portal, showing diagnostic setting for Azure Databricks." lightbox="../media/7-diagnostic-setting.png":::
The combination of the system table for interactive queries and external delivery for operational monitoring provides comprehensive coverage. You can use the system table for ad-hoc investigations while relying on your SIEM platform for continuous security monitoring.
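For the ad-hoc investigation side, here's a sketch of a query against the audit log system table (the action name and time window are illustrative):

```sql
-- Recent Unity Catalog permission changes, newest first.
SELECT event_time, user_identity.email, action_name, request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name = 'updatePermissions'
  AND event_date >= date_sub(current_date(), 1)
ORDER BY event_time DESC;
```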
Understanding audit logging capabilities positions you to support compliance requirements and respond quickly to security incidents. With the audit log system table, you have a powerful tool for tracking activity, investigating issues, and demonstrating governance controls to stakeholders.