> This feature is currently not available in the South Central US region.
## Get started with materialized lake views
To create your first materialized lake view in Microsoft Fabric, see [Get started with materialized lake views](get-started-with-materialized-lake-views.md). For a complete walkthrough that builds a medallion architecture, see [Tutorial: Build a medallion architecture with materialized lake views](tutorial.md).
docs/data-factory/dataflow-gen2-partitioned-compute.md
---
title: Use partitioned compute in Dataflow Gen2 (Preview)
description: Overview on how to use partitioned compute for parallel processing in Dataflow Gen2 with CI/CD.
ms.reviewer: miescobar
ms.topic: how-to
ms.date: 04/13/2026
ms.custom: dataflows
---
> [!NOTE]
> Partitioned compute is currently in preview and only available in Dataflow Gen2 with CI/CD.
Partitioned compute is a capability of the Dataflow Gen2 engine that lets parts of your dataflow logic run in parallel, reducing the time it takes to complete its evaluations.
Partitioned compute targets scenarios where the Dataflow engine can efficiently fold operations that partition the data source and process each partition in parallel. For example, when you're connecting to multiple files stored in an Azure Data Lake Storage Gen2 account, you can partition the list of files from your source, retrieve that partitioned list efficiently using [query folding](/power-query/query-folding-basics), use the [combine files experience](/power-query/combine-files-overview), and process all files in parallel.
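For illustration, here's a minimal sketch of the kind of source query this targets, assuming a hypothetical storage account URL and a `.csv` filter; `AzureStorage.DataLake` is the Power Query function behind the Azure Data Lake Storage Gen2 connector:

```powerquery-m
// Sketch only: the account URL and the .csv filter are assumptions.
// Listing files this way folds to the source, so the file list can be
// partitioned and each file processed in parallel.
let
    Source = AzureStorage.DataLake("https://contoso.dfs.core.windows.net/landing"),
    CsvFiles = Table.SelectRows(Source, each [Extension] = ".csv")
in
    CsvFiles
```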
> [!NOTE]
> Only connectors for Azure Data Lake Storage Gen2, Folder, and Azure Blob Storage emit the correct script to use partitioned compute. The connectors for SharePoint and Fabric Lakehouse do not support it today.
- [Query with partition key](#query-with-partition-key)
### Enable Dataflow settings
On the **Home** tab of the ribbon, select **Options** to open the Options dialog. Go to the **Scale** section and turn on **Allow use of partitioned compute**.
:::image type="content" source="media/dataflow-gen2-partitioned-compute/partitioned-compute-setting.png" alt-text="Screenshot of the partitioned compute setting inside the Scale section of the Options dialog.":::
Enabling this option has two purposes:
- Lets your Dataflow use partitioned compute if it's discovered through your query scripts
- Makes experiences like combine files automatically create partition keys that can be used for partitioned compute
You also need to turn on the setting in the **Privacy** section to **Allow combining data from multiple sources**.
### Query with partition key
> [!NOTE]
> To use partitioned compute, make sure that your query is set to be staged.
After turning on the setting, you can use the combine files experience for a data source that uses the file system view, such as Azure Data Lake Storage Gen2. When the combine files experience completes, your query has an **Added custom** step with a script similar to this:
```powerquery-m
let
    ...
in
    ...
```
This script, and specifically the `withPartitionKey` component, drives the logic for how your Dataflow tries to partition your data and evaluate it in parallel.
You can use the [Table.PartitionKey](/powerquery-m/table-partitionkey) function against the **Added custom** step. This function returns the partition key of the specified table; for the case above, it's the column *RelativePath*. You can get a distinct list of the values in that column to see all the partitions that are used during the dataflow run.
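As a hedged sketch, assuming the combine files output step is named `#"Added custom"` and that the value `Table.PartitionKey` returns can be passed to `Table.SelectColumns`, a query like this lists the partitions:

```powerquery-m
// Sketch only: the step name and the shape of Table.PartitionKey's
// return value are assumptions based on the example above.
let
    Key = Table.PartitionKey(#"Added custom"),
    // Distinct values of the key column (RelativePath in this case) show
    // every partition the dataflow run processes.
    Partitions = Table.Distinct(Table.SelectColumns(#"Added custom", Key))
in
    Partitions
```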
> [!IMPORTANT]
> The partition key column must remain in the query for partitioned compute to be applied.
## Considerations and recommendations
- **Partitioned compute vs. fast copy**: If your data source doesn't support folding the transformations for your files, we recommend that you choose partitioned compute over fast copy.
- **Lakehouse file access**: To connect to files in the Lakehouse, we recommend using the Azure Data Lake Storage Gen2 connector by passing the URL of the `Files` node (see the sketch after this list).
- **Best performance**: Use this method to load data directly to staging as your destination or to a Fabric Warehouse.
- **Data retention**: Only the latest partition run is stored in the Dataflow Staging Lakehouse and returned by the Dataflow Connector. Consider using a data destination to retain data for each separate partition.
- **File transformations**: Use the *Sample transform file* from the **Combine files** experience to introduce transformations that should happen in every file.
- **Supported transformations**: Partitioned compute only supports a subset of transformations. Performance might vary depending on your source and the set of transformations used.
- **Billing**: Billing for the dataflow run is based on capacity unit (CU) consumption.
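To illustrate the Lakehouse file access recommendation, here's a hedged sketch, assuming the OneLake ADLS-compatible endpoint format and hypothetical workspace and lakehouse names:

```powerquery-m
// Sketch only: the OneLake URL format and the workspace and lakehouse
// names are assumptions. This connects to Lakehouse files through the
// Azure Data Lake Storage Gen2 connector by passing the Files node URL.
let
    FilesNode = AzureStorage.DataLake(
        "https://onelake.dfs.fabric.microsoft.com/MyWorkspace/MyLakehouse.Lakehouse/Files")
in
    FilesNode
```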