Commit f7d59f1

Merge pull request #2961 from MicrosoftDocs/main639130576328361253sync_temp

For protected branch, push strategy should use PR and merge to target branch method to work around git push error

2 parents 687dc6c + 129401a commit f7d59f1

17 files changed

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
---
title: Spark support for OneLake security row-level and column-level security
description: Learn how Fabric Spark enforces OneLake security row-level (RLS) and column-level (CLS) policies and prepares filtered data for users in notebooks and Spark jobs.
ms.reviewer: tvilutis
ms.topic: concept-article
ms.custom:
  - best-spark-on-azure
ms.date: 04/28/2026
ms.search.form: Spark OneLake Security RLS CLS
ai-usage: ai-assisted
---

# Spark support for OneLake security (RLS and CLS)

Fabric Spark integrates with [OneLake security](../onelake/security/get-started-onelake-security.md) so that row-level security (RLS) and column-level security (CLS) policies defined once in OneLake are consistently enforced when users read lakehouse Delta tables from Spark notebooks and Spark job definitions. Users continue to write standard Spark SQL or DataFrame queries; Spark transparently filters the result so each user sees only the rows and columns they're authorized to access.

This article explains *how* Spark works with OneLake security, including the enforcement architecture, the data preparation flow, the user experience, and the supported scenarios and limits.

> [!NOTE]
> For policy authoring and the cross-engine model, see [Row-level security in OneLake](../onelake/security/row-level-security.md) and [Column-level security in OneLake](../onelake/security/column-level-security.md).

## Concepts at a glance

* **Single source of truth.** RLS rules and CLS column lists are defined once on the lakehouse via OneLake security roles. Spark doesn't store or duplicate the policy.
* **Engine-agnostic effective access.** OneLake returns the precomputed *effective access* for the requesting user, including allowed columns and RLS row-filter metadata. Spark consumes that effective access at query time.
* **Delta-only filtering.** The OneLake and Fabric platform layer applies RLS and CLS only to Delta parquet tables. Non-Delta objects with rules applied are blocked by the platform rather than filtered by Spark.
* **Privileged roles bypass.** As OneLake and Fabric platform behavior, workspace **Admin**, **Member**, and **Contributor** roles aren't restricted by RLS or CLS. Filtering applies to **Viewer** and to users granted access through OneLake security roles.

## How Spark enforces OneLake security

When a user submits a query that touches a secured lakehouse table, Spark prepares an execution plan that combines the user's query with the OneLake security effective access for that user. Enforcement happens *during* execution, not as a post-filter step in user code, so it can't be bypassed by alternate APIs or path-based reads.

### Two-context execution model

Fabric Spark uses two execution contexts to keep policy evaluation isolated from user code:

* **User context.** Runs the user's notebook or Spark job definition with the user's identity. This context plans the query and consumes the filtered output, but it never has direct, unfiltered access to secured tables.
* **System (security) context.** A privileged, Microsoft-managed context that resolves the user's effective access against OneLake, reads the underlying Delta files, applies RLS row filtering and CLS projections, and returns only the rows and columns the user is allowed to see.

The system context appears in the **Monitoring hub** as `SparkSecurityControl` jobs that run alongside the user's notebook session. The job name and monitoring experience are Fabric platform behavior; these jobs are expected and indicate that OneLake security enforcement is active.

### Query flow for a secured table

1. The user runs a query in a Spark notebook, for example `SELECT * FROM lakehouse.sales`.
1. Spark resolves the table through the lakehouse catalog and detects that OneLake security is enabled.
1. Spark requests the **effective access** for the current user from OneLake. The response includes the allowed column list (CLS) and RLS row-filter metadata.
1. The system security context reads the Delta files, projects only the allowed columns, and applies RLS by using bitmap-style or deletion-vector-style row filtering during execution.
1. The filtered result is handed back to the user context, which completes the rest of the user's query (joins, aggregations, writes to non-secured targets, and so on) over the already-filtered data.

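
This flow can be sketched in plain Python. Everything here is a conceptual illustration, not a Fabric or Spark API: `EffectiveAccess`, `apply_effective_access`, and the sample rows are hypothetical names invented for the sketch.

```python
# Conceptual sketch of the enforcement flow. These types and functions are
# illustrative only; they aren't Fabric, OneLake, or Spark APIs.
from dataclasses import dataclass


@dataclass
class EffectiveAccess:
    allowed_columns: set   # CLS: columns the user may read
    row_filter: callable   # RLS: predicate evaluated per row

# Step 3: OneLake returns the user's effective access for the table.
access = EffectiveAccess(
    allowed_columns={"region", "amount"},
    row_filter=lambda row: row["region"] == "EMEA",
)

# Raw Delta table content, visible only to the system security context.
raw_rows = [
    {"region": "EMEA", "amount": 100, "customer_ssn": "..."},
    {"region": "AMER", "amount": 250, "customer_ssn": "..."},
]


def apply_effective_access(rows, access):
    """Step 4: apply RLS row filtering, then project to the CLS column list."""
    filtered = [row for row in rows if access.row_filter(row)]
    return [{c: row[c] for c in row if c in access.allowed_columns}
            for row in filtered]


# Step 5: only the filtered projection reaches the user context.
user_visible = apply_effective_access(raw_rows, access)
print(user_visible)  # only the EMEA row, without customer_ssn
```

The point of the sketch is the ordering: row filtering and column projection happen before any data reaches user code, so downstream joins and aggregations can only ever operate on the filtered view.
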
### What happens for each policy type

| Policy | What Spark returns | Notes |
| --- | --- | --- |
| **RLS only** | All columns, but only the rows allowed by the RLS rule. | Row filtering is enforced in the security context by using bitmap-style or deletion-vector-style filtering; users can't observe the filter logic. |
| **CLS only** | Only the allowed columns; all rows. | `SELECT *` succeeds and returns the allowed columns when at least one column is allowed. If no columns are allowed, Spark fails the query. |
| **RLS + CLS in same role** | Allowed rows projected to allowed columns. | Supported as long as both rules belong to the *same* role. |
| **RLS in role A, CLS in role B (same user)** | Query fails. | The OneLake and Fabric platform layer doesn't support a user being a member of two roles where one defines RLS and the other defines CLS. See [Row-level security](../onelake/security/row-level-security.md) and [Column-level security](../onelake/security/column-level-security.md). |
| **Non-Delta object** | Access blocked. | The OneLake and Fabric platform layer applies RLS and CLS only to Delta parquet tables; other objects in a secured role are blocked. |

For the canonical authoring rules and RLS expression syntax, see the [row-level security](../onelake/security/row-level-security.md#define-row-level-security-rules) and [column-level security](../onelake/security/column-level-security.md#define-column-level-security-rules) articles.

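
The split-role restriction in the table can be expressed as a simple validation step. This is a sketch under assumptions: `check_role_combination` and the role dictionaries are hypothetical, not OneLake API objects.

```python
# Illustrative check for the unsupported combination: RLS defined in one role
# and CLS defined in a different role for the same user. Role shapes here are
# hypothetical, not OneLake API objects.
def check_role_combination(user_roles):
    """Raise when RLS and CLS come from disjoint roles for one user."""
    rls_roles = {r["name"] for r in user_roles if r.get("rls")}
    cls_roles = {r["name"] for r in user_roles if r.get("cls")}
    # RLS and CLS in the *same* role is supported; a split across roles isn't.
    if rls_roles and cls_roles and not (rls_roles & cls_roles):
        raise ValueError(
            "Unsupported: RLS and CLS defined in different roles for the same user"
        )
    return True
```

A query against a table covered by such a split-role assignment fails outright rather than returning partially filtered data.
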
## How Spark prepares data for users

OneLake security is designed to be transparent to the data consumer. Users continue to use the APIs they already know, and Spark handles policy resolution and filtering on their behalf.

### Spark SQL

```sql
-- Returns only rows and columns the current user is authorized to see.
SELECT product_category, SUM(amount) AS total
FROM sales.transactions
GROUP BY product_category;
```

### PySpark DataFrame

```python
df = spark.read.table("sales.transactions")
df.filter("region = 'EMEA'").groupBy("product_category").sum("amount").show()
```

In both examples, the `transactions` table data that's loaded into the DataFrame is already filtered by OneLake security. Subsequent transformations operate over the filtered data only.

### Lakehouse explorer table preview

The lakehouse explorer preview also honors OneLake security and shows the filtered view of secured tables when previewing data through Spark. Users see only the rows and columns granted to them by their OneLake security role.

### Direct file access is blocked

Direct path access bypasses lakehouse catalog policy resolution. When OneLake security is enabled on a table, the OneLake and Fabric platform layer blocks the following patterns for non-privileged users:

* `spark.read.format("delta").load("abfss://...")`
* `DeltaTable.forPath(spark, "abfss://...")`
* OneLake REST/SDK reads against the `Tables/<table>` folder of a secured table.

Users must access secured tables through the lakehouse table name (for example `spark.read.table("lakehouse.table")` or Spark SQL) so that Spark can resolve and apply the effective access.

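
The blocked and allowed patterns can be modeled as a sketch in plain Python. The function names, role set, and return strings are hypothetical; the real enforcement happens inside the Fabric platform layer, not in user code.

```python
# Illustrative model of path-read blocking vs. catalog-mediated reads.
# Names and return values are hypothetical, not Fabric APIs.
PRIVILEGED_ROLES = {"Admin", "Member", "Contributor"}  # workspace roles that bypass RLS/CLS


def read_path(user_role, abfss_path, table_is_secured):
    """Direct path reads bypass catalog policy resolution, so they're blocked."""
    if table_is_secured and user_role not in PRIVILEGED_ROLES:
        raise PermissionError(f"Direct path access blocked: {abfss_path}")
    return f"raw bytes of {abfss_path}"


def read_table(user_role, table_name):
    """Catalog reads resolve effective access, so filtering can be applied."""
    if user_role in PRIVILEGED_ROLES:
        return f"unfiltered rows of {table_name}"
    return f"RLS/CLS-filtered rows of {table_name}"
```

Because the check happens at policy-resolution time rather than in user code, switching APIs (path loads, `DeltaTable.forPath`, REST reads) doesn't change the outcome for a non-privileged user.
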

## User experience

* **Transparent filtering.** No query rewriting or special syntax is required. The same notebook works for users with different roles and returns role-specific data.
* **Consistent results across engines.** The same RLS rule and CLS projection that's applied in Spark is also applied in the SQL analytics endpoint, semantic models built on Direct Lake, and authorized third-party engines. See [OneLake security integrations overview](../onelake/security/onelake-security-integrations-overview.md).
* **Privileged roles see everything.** As OneLake and Fabric platform behavior, workspace **Admin**, **Member**, and **Contributor** users continue to see unfiltered data, which is useful for pipeline development, table maintenance (`OPTIMIZE`, `VACUUM`), and troubleshooting.
* **Monitoring.** The `SparkSecurityControl` jobs that appear in the Monitoring hub correspond to the system context that performs policy enforcement. The job name and Monitoring hub entry are part of Fabric platform operation.

:::image type="content" source="./media/spark-onelake-security-rls-cls/monitoring-hub-security-control.png" alt-text="Screenshot placeholder: Monitoring hub showing a SparkSecurityControl job alongside the user's notebook session.":::

## Performance considerations

* **RLS row filtering.** RLS is applied close to the Delta scan by using bitmap-style or deletion-vector-style filtering and, where supported, the Native Execution Engine. This design minimizes the rows that materialize in the user context.
* **Column pruning.** CLS column lists are combined with the user's projection. Only the intersection is read from Delta storage.
* **Effective access caching.** Spark caches policy and effective-access metadata per query and cleans it up when query execution stops.
* **Partition and statistics use.** Standard Delta partition pruning and data skipping continue to apply with RLS row filtering, so queries against partitioned tables remain efficient.

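
The column-pruning behavior reduces to a set intersection. A minimal sketch (the helper and column names are hypothetical illustrations, not engine internals):

```python
# Illustrative only: the scan reads just the columns that are both requested
# by the query and allowed by the CLS policy.
def columns_to_scan(user_projection, cls_allowed):
    """Intersect the query's projection with the CLS allow-list."""
    return sorted(set(user_projection) & set(cls_allowed))


# A query selecting three columns against a CLS policy that allows two of them:
scan = columns_to_scan(
    user_projection=["product_category", "amount", "customer_ssn"],
    cls_allowed=["product_category", "amount", "region"],
)
print(scan)  # ['amount', 'product_category']
```

Pruning at the scan means a disallowed column such as `customer_ssn` is never read from storage, rather than read and then dropped.
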
## Supported scenarios

* Reading lakehouse Delta tables in Spark notebooks and Spark job definitions through the lakehouse catalog (`<lakehouse>.<table>`).
* Spark SQL and PySpark/Scala DataFrame APIs against secured tables.
* Joins, aggregations, and downstream transformations on secured tables.
* Writes from secured sources to non-secured outputs. Output tables that are written outside the secured lakehouse contain only the already-filtered data the writing user was allowed to read.
* Cross-workspace lakehouse access through shortcuts, where the source lakehouse has OneLake security enabled.

## Limitations

OneLake security RLS and CLS in Spark inherit the [overall OneLake security limitations](../onelake/security/get-started-onelake-security.md). Notable behaviors and limits include:

* The OneLake and Fabric platform layer applies RLS and CLS only to **Delta parquet** tables. Non-Delta objects in a secured role are blocked.
* The OneLake and Fabric platform layer blocks direct path reads (`abfss://`, `DeltaTable.forPath`) against secured tables for non-privileged users.
* The OneLake and Fabric platform layer doesn't support a user being a member of two roles where one defines RLS and the other defines CLS for the affected tables.
* As OneLake and Fabric platform behavior, workspace **Admin**, **Member**, and **Contributor** roles bypass RLS and CLS.
* Writes to non-secured outputs from secured sources are supported and operate on already-filtered data. Writes (INSERT/UPDATE/DELETE/MERGE) to a secured target might be unsupported for users subject to RLS or CLS; use a privileged identity for ETL writes into secured tables.

## Related content

* [Get started with OneLake security](../onelake/security/get-started-onelake-security.md)
* [Row-level security in OneLake](../onelake/security/row-level-security.md)
* [Column-level security in OneLake](../onelake/security/column-level-security.md)
* [OneLake security integrations overview](../onelake/security/onelake-security-integrations-overview.md)
* [Workspace roles for lakehouse](workspace-roles-lakehouse.md)
* [Lakehouse sharing and permission management](lakehouse-sharing.md)
* [Fabric Spark security](spark-best-practices-security.md)

docs/data-engineering/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -220,6 +220,8 @@ items:
 href: workspace-roles-lakehouse.md
 - name: Sharing and permissions management
 href: lakehouse-sharing.md
+- name: Spark support for OneLake security
+  href: spark-onelake-security.md
 - name: Spark compute
 items:
 - name: Overview and planning

docs/fundamentals/workspace-monitoring-overview.md

Lines changed: 3 additions & 3 deletions
@@ -1,10 +1,10 @@
 ---
-title: Workspace monitoring overview
+title: Workspace Monitoring Overview
 description: Understand what is workspace monitoring in Microsoft Fabric and how it can help you to gain insights into the usage and performance of your workspace.
 author: SnehaGunda
 ms.author: sngun
 ms.topic: overview
-ms.date: 12/09/2025
+ms.date: 04/27/2026
 #customer intent: As a workspace admin I want to monitor my workspace to gain insights into the usage and performance of my workspace so that I can optimize my workspace and improve the user experience.
 ---

@@ -14,7 +14,7 @@ Workspace monitoring is a Microsoft Fabric database that collects and organizes

 ## Monitoring

-Workspace monitoring creates an [Eventhouse](../real-time-intelligence/eventhouse.md) database in your workspace that collects and organizes logs and metrics from the Fabric items in the workspace. Workspace contributors can query the database to learn more about the performance of their Fabric items.
+Workspace monitoring creates an [Eventhouse](../real-time-intelligence/eventhouse.md) database in your workspace that collects and organizes logs and metrics from the Fabric items in the workspace. To learn how to manage and monitor the Eventhouse created for workspace monitoring, see [Manage and Monitor an Eventhouse](../real-time-intelligence/manage-monitor-eventhouse.md). Workspace contributors can query the database to learn more about the performance of their Fabric items.

 * **Security** - Workspace monitoring is a secure read-only database that is accessible only to workspace users with at least a contributor role.

docs/real-time-hub/explore-all-data-streams.md

Lines changed: 2 additions & 1 deletion
@@ -3,7 +3,7 @@ title: Explore All data streams in Fabric Real-Time hub
 description: This article shows how to explore All data streams in Fabric Real-Time hub. It provides details on the All data streams page in the Real-Time hub user interface.
 ms.reviewer: majia
 ms.topic: how-to
-ms.date: 12/11/2025
+ms.date: 04/27/2026
 ---

 # Explore All data streams in Fabric Real-Time hub
@@ -67,6 +67,7 @@ Here are the actions available on a KQL table from the **All data streams** page
 | Endorse | Endorse parent KQL Database of the KQL table. For more information, see [Endorse data streams](endorse-data-streams.md). |
 | Detect anomalies (preview) | Detect anomalies in data stored in the KQL table. Follow steps from [How to set up anomaly detection](../real-time-intelligence/anomaly-detection.md#how-to-set-up-anomaly-detection). |
 | Create real-time dashboard (preview) | [Create a Real-Time Dashboard with Copilot](/fabric/fundamentals/copilot-generate-dashboard) based on data in the KQL table. |
+| Add to data agent | Add the KQL table as a data source to a [data agent](../data-science/concept-data-agent.md) so that it can be used in downstream workflows and automations. |

 :::image type="content" source="./media/get-started-real-time-hub/kql-table-actions.png" alt-text="Screenshot that shows the actions available on a KQL table stream." lightbox="./media/get-started-real-time-hub/kql-table-actions.png":::

docs/real-time-hub/explore-data-tables-copilot.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: Interactively Explore KQL Tables with Copilot in the Real-Time Hub
 description: Explore KQL tables in Real-time Hub with Copilot. This guide shows you how to filter, preview, and visualize streaming data interactively.
 ms.reviewer: mibar
 ms.topic: how-to
-ms.date: 11/23/2025
+ms.date: 04/27/2026
 ---

 # Explore KQL table data with Copilot

docs/real-time-hub/get-started-real-time-hub.md

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,7 @@ description: This article shows how to get started with Fabric Real-Time hub and
 ms.reviewer: majia
 ms.topic: quickstart
 ms.custom: null
-ms.date: 02/03/2026
+ms.date: 04/27/2026
 ---

 # Get started with Fabric Real-Time hub
@@ -116,6 +116,7 @@ Here are the actions available on a KQL table from the **All data streams** page
 | Endorse | Endorse parent KQL Database of the KQL table. For more information, see [Endorse data streams](endorse-data-streams.md). |
 | Detect anomalies (Preview) | Detect anomalies in data stored in the KQL table. Follow steps from [How to set up anomaly detection](../real-time-intelligence/anomaly-detection.md#how-to-set-up-anomaly-detection). |
 | Create real-time dashboard (Preview) | [Create a Real-Time Dashboard with Copilot](/fabric/fundamentals/copilot-generate-dashboard) based on data in the KQL table. |
+| Add to data agent | Add the KQL table as a data source to a [data agent](../data-science/concept-data-agent.md) so that it can be used in downstream workflows and automations. |

 :::image type="content" source="./media/get-started-real-time-hub/kql-table-actions.png" alt-text="Screenshot that shows the actions available on a KQL table stream." lightbox="./media/get-started-real-time-hub/kql-table-actions.png":::
