
Commit b4b011d

Merge pull request #52945 from weslbo/images-govern-unity-catalog-objects

Images govern unity catalog objects

2 parents e69cdcb + d13824b commit b4b011d

20 files changed

Lines changed: 52 additions & 28 deletions

learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/1-introduction.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
Effective data governance requires more than just controlling access—it demands visibility into your data assets, accountability for how they're used, and enforcement mechanisms that scale with your organization. As data platforms grow in complexity, managing table definitions, tracking data lineage, enforcing retention policies, and sharing data securely become increasingly challenging. **Unity Catalog** in Azure Databricks provides the foundation for comprehensive governance that addresses these challenges through centralized metadata management and policy enforcement.

- When you govern Unity Catalog objects, you work across several interconnected capabilities. You document tables and columns with **comments and tags** that help data consumers discover and understand your assets. You implement **attribute-based access control (ABAC)** using governed tags and policies that automatically enforce fine-grained permissions. You apply **row filtering and column masking** to protect sensitive data while maintaining query functionality. You configure **data retention policies** using Delta Lake's VACUUM and predictive optimization to manage storage costs and meet compliance requirements.
+ When you govern Unity Catalog objects, you work across several interconnected capabilities. You document tables and columns with **comments and tags** that help data consumers discover and understand your assets. You implement **attribute-based access control (ABAC)** using governed tags and policies that automatically enforce fine-grained permissions. You apply **row filtering and column masking** to protect sensitive data while maintaining query functionality. You configure **data retention policies** using Delta Lake's `VACUUM` and predictive optimization to manage storage costs and meet compliance requirements.

Beyond access control, governance extends to **data lineage tracking** that shows how data flows through your pipelines, enabling impact analysis and troubleshooting. **Audit logging** captures who did what and when, supporting security investigations and regulatory compliance. For external collaboration, **Delta Sharing** lets you share data with partners and customers while maintaining governance controls over what they can access.

learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/2-preserve-table-column-definitions.md

Lines changed: 7 additions & 5 deletions
@@ -10,11 +10,11 @@ Consider the following example, where we add comments directly to columns as par

```sql
CREATE TABLE sales.customers.profiles (
- customer_id BIGINT COMMENT 'Unique identifier for each customer',
- email STRING COMMENT 'Customer primary email address',
- created_date DATE COMMENT 'Date when customer account was created',
- preferences STRUCT<notifications: BOOLEAN, language: STRING>
-     COMMENT 'Customer preference settings'
+   customer_id BIGINT COMMENT 'Unique identifier for each customer',
+   email STRING COMMENT 'Customer primary email address',
+   created_date DATE COMMENT 'Date when customer account was created',
+   preferences STRUCT<notifications: BOOLEAN, language: STRING>
+       COMMENT 'Customer preference settings'
);
```
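Beyond `CREATE TABLE`, comments can also be added to existing objects after creation; a brief sketch using the same example table (the comment text is illustrative):

```sql
-- Add or update a comment on an existing column
COMMENT ON COLUMN sales.customers.profiles.email IS
  'Customer primary email address';

-- Equivalent form using ALTER TABLE
ALTER TABLE sales.customers.profiles
  ALTER COLUMN created_date COMMENT 'Date when customer account was created';

-- Comment on the table itself
COMMENT ON TABLE sales.customers.profiles IS
  'Customer profiles with contact details and preferences';
```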

@@ -53,6 +53,8 @@ Unity Catalog can automatically generate comments using AI-powered suggestions.

For column comments, select **AI generate** above the column list to generate suggestions for all columns.

+ :::image type="content" source="../media/2-use-ai-generated-comments.png" alt-text="Screenshot of AI-generated comments." lightbox="../media/2-use-ai-generated-comments.png":::
+
> [!IMPORTANT]
> **AI-generated comments** are suggestions based on schema analysis. Always review these comments before saving, as AI models might generate inaccurate descriptions. Don't rely on AI comments for data classification tasks like detecting personally identifiable information (PII).

learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/3-configure-attribute-based-access-control-tags-policies.md

Lines changed: 14 additions & 6 deletions
@@ -9,14 +9,16 @@ In this unit, you learn how to use governed tags and ABAC policies to implement

Governed tags differ from the standard tags you've already encountered. While standard tags help with organization and discovery, governed tags add enforcement and consistency at the account level. Administrators define governed tags with specific allowed values, and only authorized users can assign them.

+ :::image type="content" source="../media/3-understand-governed-tags.png" alt-text="Diagram explaining standard and governed tags." border="false" lightbox="../media/3-understand-governed-tags.png":::
+
Consider the following differences between tag types:

- | Aspect | Standard tags | Governed tags |
- |--------|---------------|---------------|
- | Scope | Any key-value pair | Predefined keys and allowed values |
- | Control | Any user with APPLY TAG privilege | Only users with ASSIGN permission |
- | Purpose | Organization and discovery | Policy enforcement and compliance |
- | Visual indicator | None | Lock icon in Catalog Explorer |
+ | Aspect | Standard tags | Governed tags |
+ | ---------------- | --------------------------------- | ---------------------------------- |
+ | Scope | Any key-value pair | Predefined keys and allowed values |
+ | Control | Any user with `APPLY TAG` privilege | Only users with `ASSIGN` permission |
+ | Purpose | Organization and discovery | Policy enforcement and compliance |
+ | Visual indicator | None | Lock icon in Catalog Explorer |

Governed tags serve as the foundation for ABAC policies. When you tag a table with `sensitivity=high`, you can then create a policy that masks certain columns for all tables with that tag. This approach scales efficiently because adding the tag to new tables automatically applies the policy.
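The `sensitivity=high` example can be expressed in SQL; a minimal sketch (table names and tag values are illustrative):

```sql
-- Assign a governed tag to a table; any policy bound to the tag
-- then applies automatically
ALTER TABLE sales.customers.profiles SET TAGS ('sensitivity' = 'high');

-- Tag an individual column with the pii governed tag
ALTER TABLE sales.customers.profiles
  ALTER COLUMN email SET TAGS ('pii' = 'email');

-- Remove a tag when it no longer applies
ALTER TABLE sales.customers.profiles UNSET TAGS ('sensitivity');
```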

@@ -26,6 +28,8 @@ To create a governed tag, select **Catalog** > **Governance** > **Governed Tags*

For example, you might create a governed tag with key `pii` and allowed values `ssn`, `email`, and `address`. This ensures consistent classification of personally identifiable information across your organization.

+ :::image type="content" source="../media/3-create-governed-tag.png" alt-text="Screenshot showing the create governed tag dialog box." lightbox="../media/3-create-governed-tag.png":::
+
> [!NOTE]
> Tag data is stored as plain text. Don't use tag names or values that contain sensitive information.
@@ -78,6 +82,8 @@ This policy applies the `filter_non_eu` function to any table in the `sales` cat

You can also create policies using Catalog Explorer. Select the catalog or schema, choose the **Policies** tab, and select **New policy**. The visual interface guides you through selecting principals, scope, and conditions.

+ :::image type="content" source="../media/3-create-policy.png" alt-text="Screenshot for creating policies using Catalog Explorer." lightbox="../media/3-create-policy.png":::
+
## Create column mask policies

Column mask policies control what values users see in specific columns. Like row filters, these policies rely on UDFs to implement the masking logic.
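As a sketch of how such a masking UDF might look (function, group, and column names are illustrative, not taken from the changed files):

```sql
-- UDF that reveals the raw value only to members of a privileged group
CREATE OR REPLACE FUNCTION sales.customers.mask_email(email STRING)
RETURN CASE
  WHEN is_account_group_member('pii_readers') THEN email
  ELSE '***REDACTED***'
END;

-- Attach the mask directly to a column (the table-level, non-ABAC form)
ALTER TABLE sales.customers.profiles
  ALTER COLUMN email SET MASK sales.customers.mask_email;
```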
@@ -140,6 +146,8 @@ Policy quotas limit how many policies you can create:

Only one row filter can apply to any given table, and only one column mask can apply to any given column. If multiple policies would result in multiple filters or masks, Azure Databricks blocks access and throws an error.

+ :::image type="content" source="../media/3-understand-policy-inheritance-scope.png" alt-text="Diagram explaining policy inheritance and scope." border="false" lightbox="../media/3-understand-policy-inheritance-scope.png":::
+
> [!IMPORTANT]
> You must use Databricks Runtime 16.4 or above, or serverless compute, to access tables secured by ABAC policies. Users not subject to the policy can use any runtime.

learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/4-implement-row-filtering-column-masking.md

Lines changed: 2 additions & 0 deletions
@@ -99,4 +99,6 @@ Both approaches achieve fine-grained security, but they differ in scope:

- **Row and Column Security** is embedded at the table level and guarantees that restrictions apply everywhere the table is used.
- **Dynamic Views** provide flexibility: you can create multiple views with different rules for different audiences, while keeping the base table unrestricted.

+ :::image type="content" source="../media/4-choose-between-approaches.png" alt-text="Diagram helping you choose between the two approaches." border="false" lightbox="../media/4-choose-between-approaches.png":::
+
In practice, you use table-level controls when you need strict enforcement, and dynamic views when you want adaptable, shareable abstractions.
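A dynamic view of the kind described might look like this (the view and group names are illustrative; the base columns match the earlier `profiles` example):

```sql
-- Dynamic view: masks email per audience while the base table
-- itself stays unrestricted
CREATE OR REPLACE VIEW sales.customers.profiles_restricted AS
SELECT
  customer_id,
  CASE
    WHEN is_account_group_member('pii_readers') THEN email
    ELSE 'REDACTED'
  END AS email,
  created_date
FROM sales.customers.profiles;
```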

learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/5-apply-data-retention-policies.md

Lines changed: 10 additions & 8 deletions
@@ -1,6 +1,6 @@
Every organization faces the challenge of balancing data availability with compliance requirements and storage costs. When you implement **data retention policies** in Azure Databricks, you ensure that data is kept only as long as necessary while remaining accessible for legitimate business needs. This becomes especially critical when regulations require you to delete personal data upon request.

- In this unit, you learn how to configure retention settings, use **VACUUM** to remove obsolete data, handle deletion requests for compliance, and automate maintenance with **predictive optimization**.
+ In this unit, you learn how to configure retention settings, use **`VACUUM`** to remove obsolete data, handle deletion requests for compliance, and automate maintenance with **predictive optimization**.

## Configure Delta Lake retention settings

@@ -9,7 +9,9 @@ In this unit, you learn how to configure retention settings, use **VACUUM** to r
| Property | Default | Purpose |
|----------|---------|---------|
| `delta.logRetentionDuration` | 30 days | Controls how long transaction log history is kept |
- | `delta.deletedFileRetentionDuration` | 7 days | Determines when VACUUM can remove unreferenced data files |
+ | `delta.deletedFileRetentionDuration` | 7 days | Determines when `VACUUM` can remove unreferenced data files |
+
+ :::image type="content" source="../media/5-understand-data-retention-settings.png" alt-text="Diagram explaining Delta Lake retention settings." border="false" lightbox="../media/5-understand-data-retention-settings.png":::

These settings work together to define your **time travel** capabilities. For example, if you need 30 days of historical data access, you must configure both properties accordingly:
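The configuration block itself is elided from the diff (only the `SET TBLPROPERTIES (` context survives in the next hunk header); setting both properties for 30 days of time travel might look like this (the table name is illustrative):

```sql
ALTER TABLE sales.customers.transactions
SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 30 days',
  'delta.deletedFileRetentionDuration' = 'interval 30 days'
);
```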

@@ -24,7 +26,7 @@ SET TBLPROPERTIES (
> [!IMPORTANT]
> Increasing retention duration increases storage costs because more data files are preserved. Before adjusting these settings, evaluate your compliance requirements against storage budget constraints.
>
- > You must set both of these properties to ensure table history is retained for longer duration for tables with frequent VACUUM operations
+ > You must set both of these properties to ensure table history is retained for a longer duration on tables with frequent `VACUUM` operations.

To view current retention settings for a table, use the following command:

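The command referenced above is elided from the diff; viewing a table's retention settings can be done with something like (table name illustrative):

```sql
-- List all table properties, including retention settings
SHOW TBLPROPERTIES sales.customers.transactions;

-- Or fetch a single property
SHOW TBLPROPERTIES sales.customers.transactions ('delta.deletedFileRetentionDuration');
```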
@@ -44,9 +46,9 @@ VACUUM sales.customers.transactions;
VACUUM sales.customers.transactions RETAIN 168 HOURS;
```

- When you run VACUUM, Delta Lake identifies files associated with versions older than the retention threshold and removes them from storage. After this operation, time travel queries to those older versions fail because the underlying data no longer exists.
+ When you run `VACUUM`, Delta Lake identifies files associated with versions older than the retention threshold and removes them from storage. After this operation, time travel queries to those older versions fail because the underlying data no longer exists.

- For tables with **deletion vectors** enabled, you must also run `REORG TABLE ... APPLY (PURGE)` after deleting records to permanently remove the underlying data. VACUUM alone doesn't remove data marked for deletion by deletion vectors.
+ For tables with **deletion vectors** enabled, you must also run `REORG TABLE ... APPLY (PURGE)` after deleting records to permanently remove the underlying data. `VACUUM` by itself does not remove data that deletion vectors have marked for deletion.

> [!NOTE]
> **Deletion vectors** are a storage optimization feature you can enable on Delta Lake tables. By default, when a single row in a data file is deleted, the entire Parquet file containing the record must be rewritten. With deletion vectors enabled, deleted rows are instead marked as deleted without rewriting the entire file.
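The purge sequence for a deletion-vector table can be sketched as (table name and predicate are illustrative):

```sql
-- Step 1: soft-delete the records (deletion vectors mark the rows)
DELETE FROM sales.customers.transactions WHERE customer_id = 12345;

-- Step 2: rewrite files so soft-deleted rows are physically removed
REORG TABLE sales.customers.transactions APPLY (PURGE);

-- Step 3: remove the now-unreferenced files once past the retention window
VACUUM sales.customers.transactions;
```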
@@ -92,11 +94,11 @@ When you delete data in your bronze layer, you must also remove it from silver a
spark.readStream.option('skipChangeCommits', 'true').table("bronze.users")
```

- After running deletion operations, execute VACUUM to permanently remove the data files from storage.
+ After running deletion operations, execute `VACUUM` to permanently remove the data files from storage.

## Automate retention with predictive optimization

- Manually scheduling VACUUM and OPTIMIZE operations across many tables is time-consuming and error-prone. **Predictive optimization** in Unity Catalog automates these maintenance tasks for managed tables.
+ Manually scheduling `VACUUM` and `OPTIMIZE` operations across many tables is time-consuming and error-prone. **Predictive optimization** in Unity Catalog automates these maintenance tasks for managed tables.

When enabled, predictive optimization automatically:

@@ -111,7 +113,7 @@ ALTER SCHEMA sales.customers ENABLE PREDICTIVE OPTIMIZATION;
```

> [!TIP]
- > Before enabling predictive optimization, set your desired `delta.deletedFileRetentionDuration` on tables that require longer retention periods. The default VACUUM retention is 7 days, which might be shorter than your compliance requirements.
+ > Before enabling predictive optimization, set your desired `delta.deletedFileRetentionDuration` on tables that require longer retention periods. The default `VACUUM` retention is 7 days, which might be shorter than your compliance requirements.

To check whether predictive optimization is enabled for a table:

learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/6-manage-data-lineage-tracking.md

Lines changed: 12 additions & 8 deletions
@@ -6,6 +6,8 @@ In this unit, you learn how to use Catalog Explorer to view and manage data line

Unity Catalog automatically captures **runtime data lineage** across queries run on Azure Databricks. This lineage tracking works across all languages and captures relationships down to the **column level**. Lineage data includes the notebooks, jobs, and dashboards that interact with your tables.

+ :::image type="content" source="../media/6-understand-lineage.png" alt-text="Screenshot of Unity Catalog lineage." lightbox="../media/6-understand-lineage.png":::
+
Consider a scenario where a sales analytics dashboard suddenly shows incorrect revenue figures. With lineage tracking, you can trace back through the data flow to identify which upstream table or transformation caused the issue. This capability transforms troubleshooting from guesswork into a systematic investigation.

Lineage is aggregated across all workspaces attached to a Unity Catalog metastore. When you capture lineage in one workspace, users in other workspaces that share the same metastore can view that lineage information. This **cross-workspace visibility** is particularly valuable for organizations with distributed data teams.
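Lineage captured this way can also be queried programmatically via system tables; a sketch assuming access to the `system` catalog (the table name in the filter is illustrative):

```sql
-- Find recent upstream writes into a given table
SELECT source_table_full_name, entity_type, event_time
FROM system.access.table_lineage
WHERE target_table_full_name = 'sales.customers.orders'
ORDER BY event_time DESC
LIMIT 10;
```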
@@ -42,7 +44,7 @@ Beyond table relationships, Catalog Explorer shows which jobs and dashboards con

This information supports impact analysis. Before modifying a table schema, you can identify which downstream jobs and dashboards might be affected.

- ## Manage table ownership and permissions
+ ## Manage table ownership

Every securable object in Unity Catalog has an **owner**. The owner has full privileges on the object and can grant permissions to other users. Understanding and managing ownership is essential for governance accountability.

@@ -54,6 +56,8 @@ To view or change ownership in Catalog Explorer:
4. Search for and select a user, group, or service principal.
5. Select **Save**.

+ :::image type="content" source="../media/6-manage-table-ownership.png" alt-text="Screenshot of the set owner dialog." lightbox="../media/6-manage-table-ownership.png":::
+
Only the current owner or a **metastore admin** can transfer ownership. After transfer, the previous owner loses ownership privileges unless explicitly granted.

To view lineage, users need at least the `BROWSE` privilege on the parent catalog. Objects that a user doesn't have permission to access appear as **masked nodes** in the lineage graph. This security model ensures that lineage visibility respects your access control policies.
@@ -74,13 +78,13 @@ DESCRIBE HISTORY sales.customers.orders LIMIT 1;

The history includes detailed information about each operation:

- | Column | Description |
- |--------|-------------|
- | `version` | Table version number |
- | `timestamp` | When the operation occurred |
- | `operation` | Type of operation (WRITE, UPDATE, DELETE, MERGE) |
- | `userName` | User who performed the operation |
- | `operationMetrics` | Metrics like rows affected and files modified |
+ | Column | Description |
+ | ------------------ | ------------------------------------------------ |
+ | `version` | Table version number |
+ | `timestamp` | When the operation occurred |
+ | `operation` | Type of operation (WRITE, UPDATE, DELETE, MERGE) |
+ | `userName` | User who performed the operation |
+ | `operationMetrics` | Metrics like rows affected and files modified |

This history supports auditing and compliance by providing a complete record of who changed what and when.

learn-pr/wwl-databricks/govern-unity-catalog-objects/includes/7-configure-audit-logging.md

Lines changed: 4 additions & 0 deletions
@@ -130,6 +130,8 @@ To enable verbose audit logs:
3. Select the **Advanced** tab.
4. Locate **Verbose Audit Logs** and enable the feature.

+ :::image type="content" source="../media/7-verbose-audit-logs.png" alt-text="Screenshot of the Workspace admin advanced settings." lightbox="../media/7-verbose-audit-logs.png":::
+
When you enable or disable verbose logging, Azure Databricks logs this configuration change as an auditable event. This creates accountability for who modified the logging configuration and when.

> [!IMPORTANT]
@@ -145,6 +147,8 @@ While the audit log system table provides direct access within Databricks, many

Platform administrators configure log delivery through the Azure portal. As a data engineer, you should know that logs typically arrive within **15 minutes** of the event occurring. When building monitoring solutions, account for this latency in your alerting thresholds.

+ :::image type="content" source="../media/7-diagnostic-setting.png" alt-text="Screenshot of Azure portal, showing diagnostic setting for Azure Databricks." lightbox="../media/7-diagnostic-setting.png":::
+
The combination of the system table for interactive queries and external delivery for operational monitoring provides comprehensive coverage. You can use the system table for ad-hoc investigations while relying on your SIEM platform for continuous security monitoring.

Understanding audit logging capabilities positions you to support compliance requirements and respond quickly to security incidents. With the audit log system table, you have a powerful tool for tracking activity, investigating issues, and demonstrating governance controls to stakeholders.
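An ad-hoc investigation against the audit log system table might look like this (the action name and time window are illustrative filter values):

```sql
-- Who deleted tables in the last 7 days?
SELECT event_time, user_identity.email, service_name, action_name
FROM system.access.audit
WHERE action_name = 'deleteTable'
  AND event_date >= date_sub(current_date(), 7)
ORDER BY event_time DESC;
```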
