Commit 70ae9b1

Learn Build Service GitHub App authored and committed
Merging changes synced from https://github.com/MicrosoftDocs/fabric-docs-pr (branch live)
2 parents 7d57e0c + 5240f0b commit 70ae9b1

21 files changed

Lines changed: 236 additions & 87 deletions

docs/data-engineering/spark-jdbc-driver.md

Lines changed: 10 additions & 10 deletions
```diff
@@ -4,26 +4,26 @@ description: Learn how to connect, query, and manage Spark workloads in Microsof
 ms.reviewer: arali
 ms.topic: how-to
 ms.date: 12/05/2025
+ai-usage: ai-assisted
 ---
 
-# Microsoft JDBC driver for Microsoft Fabric Data Engineering (preview)
-
-[!INCLUDE [feature-preview](../includes/feature-preview-note.md)]
+# Microsoft JDBC driver for Microsoft Fabric Data Engineering
 
 JDBC (Java Database Connectivity) is a widely adopted standard that enables client applications to connect to and work with data from databases and big data platforms.
 
-The Microsoft JDBC Driver for Fabric Data Engineering lets you connect, query, and manage Spark workloads in Microsoft Fabric with the reliability and simplicity of the JDBC standard. Built on Microsoft Fabric's Livy APIs, the driver provides secure and flexible Spark SQL connectivity to your Java applications and BI tools. This integration allows you to submit and execute Spark code directly without needing to create separate Notebook or Spark Job Definition artifacts.
+The Microsoft JDBC Driver for Fabric Data Engineering lets you connect, query, and manage Spark workloads in Microsoft Fabric with the reliability and simplicity of the JDBC standard. Built on Microsoft Fabric's Livy APIs, the driver provides secure and flexible Spark SQL connectivity to your Java applications and BI tools. This integration allows you to submit and execute Spark code directly without needing to create separate Notebook or Spark Job Definition artifacts. The driver is compatible with popular JDBC clients such as DbVisualizer and DBeaver, as well as BI tools that support JDBC connectivity, including Tableau.
 
 ## Key Features
 
 - **JDBC 4.2 Compliant**: Full implementation of JDBC 4.2 specification
 - **Microsoft Entra ID Authentication**: Multiple authentication flows including interactive, client credentials, and certificate-based authentication
-- **Enterprise Connection Pooling**: Built-in connection pooling with health monitoring and automatic recovery
+- **Enterprise Connection Pooling**: Built-in connection pooling with health monitoring, automatic recovery, and HikariCP integration
 - **Spark SQL Native Query Support**: Direct execution of Spark SQL statements without translation
 - **Comprehensive Data Type Support**: Support for all Spark SQL data types including complex types (ARRAY, MAP, STRUCT)
 - **Asynchronous Result Set Prefetching**: Background data loading for improved performance
 - **Circuit Breaker Pattern**: Protection against cascading failures with automatic retry
 - **Auto-Reconnection**: Transparent session recovery on connection failures
+- **Advanced Retry Logic**: Retry with exponential backoff and session recovery for improved resilience
 - **Proxy Support**: HTTP and SOCKS proxy configuration for enterprise environments
 
 ## Prerequisites
```
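
The pooling bullet in the hunk above now names HikariCP explicitly. Below is a rough, minimal sketch of what that integration can look like from application code. It assumes a hypothetical `jdbc:fabricsparksql://<endpoint>` URL shape (take the real JDBC URL scheme from the driver documentation); the `WorkspaceId`, `LakehouseId`, and `AuthFlow` parameter names are the ones this doc uses:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.SQLException;

public class FabricPoolSketch {
    public static void main(String[] args) throws SQLException {
        HikariConfig config = new HikariConfig();
        // Hypothetical URL shape for illustration; substitute the documented
        // scheme plus your workspace and lakehouse GUIDs.
        config.setJdbcUrl("jdbc:fabricsparksql://<endpoint>;"
                + "WorkspaceId=<workspace-id>;LakehouseId=<lakehouse-id>;AuthFlow=1");
        config.setMaximumPoolSize(5);        // bound concurrent connections
        config.setConnectionTimeout(30_000); // ms to wait for a free connection

        try (HikariDataSource ds = new HikariDataSource(config);
             Connection conn = ds.getConnection()) {
            System.out.println("Pooled connection open: " + !conn.isClosed());
        }
    }
}
```

Since the driver rides on Livy sessions, reusing connections through a pool should avoid repeated session startup cost, which is presumably why the HikariCP integration is worth calling out.
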
```diff
@@ -32,12 +32,12 @@ Before using the Microsoft JDBC Driver for Microsoft Fabric Data Engineering, en
 
 - **Java Development Kit (JDK)**: Version 11 or higher (Java 21 recommended)
 - **Microsoft Fabric Access**: Access to a Microsoft Fabric workspace
-- **Azure Entra ID Credentials**: Appropriate credentials for authentication
+- **Microsoft Entra ID credentials**: Appropriate credentials for authentication
 - **Workspace and Lakehouse IDs**: GUID identifiers for your Fabric workspace and lakehouse
 
 ## Download and Installation
 
-Microsoft JDBC Driver for Microsoft Fabric Data Engineering version 1.0.0 is the public preview version and supports Java 11, 17 and 21. We're continually improving Java connectivity support and recommend that you work with the latest version of the Microsoft JDBC driver.
+Microsoft JDBC Driver for Microsoft Fabric Data Engineering version 1.0.0 supports Java 11, 17, and 21. We're continually improving Java connectivity support and recommend that you work with the latest version of the Microsoft JDBC driver.
 
 * [Download Microsoft JDBC Driver for Microsoft Fabric Data Engineering (zip)](https://download.microsoft.com/download/5e763393-274e-48c5-a55a-0375340bc520/ms-sparksql-jdbc-1.0.0.zip)
 * [Download Microsoft JDBC Driver for Microsoft Fabric Data Engineering (tar)](https://download.microsoft.com/download/5e763393-274e-48c5-a55a-0375340bc520/ms-sparksql-jdbc-1.0.0.tar)
@@ -152,7 +152,7 @@ Connection conn = DriverManager.getConnection(url);
 
 **Parameters:**
 - `AuthFlow=1`: Specifies interactive browser authentication
-- `AuthTenantID` (optional): Azure tenant ID
+- `AuthTenantID` (optional): Microsoft Entra tenant ID
 - `AuthClientID` (optional): Application (client) ID
 
 **Behavior:**
```
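
To make the interactive flow in the hunk above concrete, here's a minimal sketch. `AuthFlow=1` and the optional `AuthTenantID` and `AuthClientID` parameters come from the parameter list; the URL scheme and `<endpoint>` placeholder are assumptions for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FabricInteractiveSketch {
    public static void main(String[] args) throws Exception {
        // AuthFlow=1 triggers interactive browser sign-in with Microsoft Entra ID.
        // Hypothetical URL shape; see the driver documentation for the real scheme.
        String url = "jdbc:fabricsparksql://<endpoint>;"
                + "WorkspaceId=<workspace-id>;LakehouseId=<lakehouse-id>;"
                + "AuthFlow=1";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // Spark SQL runs natively; SHOW SCHEMAS lists the lakehouse schemas.
             ResultSet rs = stmt.executeQuery("SHOW SCHEMAS")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```
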
```diff
@@ -181,7 +181,7 @@ Connection conn = DriverManager.getConnection(url);
 - `AuthFlow=3`: Specifies client credentials authentication
 - `AuthClientID`: Application (client) ID from Microsoft Entra ID
 - `AuthClientSecret`: Client secret from Microsoft Entra ID
-- `AuthTenantID`: Azure tenant ID
+- `AuthTenantID`: Microsoft Entra tenant ID
 
 **Best Practices:**
 - Store secrets securely (Azure Key Vault, environment variables)
```
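
For the client-credentials flow, a similar sketch that runs without user interaction. Parameter names are from the hunk above, the URL shape is again hypothetical, and the secret is pulled from an environment variable per the best-practices bullet:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class FabricServicePrincipalSketch {
    public static void main(String[] args) throws Exception {
        // Never hard-code the secret; read it from the environment or a vault.
        String secret = System.getenv("FABRIC_CLIENT_SECRET");

        // Hypothetical URL shape; AuthFlow=3 plus the Auth* parameters below
        // are the names documented in the hunk above.
        String url = "jdbc:fabricsparksql://<endpoint>;"
                + "WorkspaceId=<workspace-id>;LakehouseId=<lakehouse-id>;"
                + "AuthFlow=3;"
                + "AuthClientID=<app-client-id>;"
                + "AuthClientSecret=" + secret + ";"
                + "AuthTenantID=<tenant-id>";

        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Service principal connected: " + !conn.isClosed());
        }
    }
}
```

In production the secret would more likely come from Azure Key Vault, as the best-practices line suggests.
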
```diff
@@ -211,7 +211,7 @@ Connection conn = DriverManager.getConnection(url);
 - `AuthClientID`: Application (client) ID
 - `AuthCertificatePath`: Path to PFX/PKCS12 certificate file
 - `AuthCertificatePassword`: Certificate password
-- `AuthTenantID`: Azure tenant ID
+- `AuthTenantID`: Microsoft Entra tenant ID
 
 ### Access Token Authentication
 
```
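
And for certificate-based authentication, one more hedged sketch. This hunk shows only the certificate parameters, not the `AuthFlow` value for the flow, so `<certificate-auth-flow>` below is an explicit placeholder to be replaced from the driver documentation:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class FabricCertificateSketch {
    public static void main(String[] args) throws Exception {
        // <certificate-auth-flow> is a placeholder: the AuthFlow value isn't
        // visible in this hunk. The Auth* parameter names are from the hunk.
        String url = "jdbc:fabricsparksql://<endpoint>;"
                + "WorkspaceId=<workspace-id>;LakehouseId=<lakehouse-id>;"
                + "AuthFlow=<certificate-auth-flow>;"
                + "AuthClientID=<app-client-id>;"
                + "AuthCertificatePath=C:\\certs\\app-auth.pfx;"  // PFX/PKCS12 file
                + "AuthCertificatePassword=" + System.getenv("CERT_PASSWORD") + ";"
                + "AuthTenantID=<tenant-id>";

        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Certificate-based connection open.");
        }
    }
}
```
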

docs/data-engineering/spark-odbc-driver.md

Lines changed: 56 additions & 13 deletions
```diff
@@ -5,6 +5,7 @@ author: ms-arali
 ms.reviewer: arali
 ms.topic: how-to
 ms.date: 03/18/2026
+ai-usage: ai-assisted
 ---
 
 # Microsoft ODBC driver for Microsoft Fabric Data Engineering (Preview)
@@ -17,15 +18,18 @@ The Microsoft ODBC Driver for Fabric Data Engineering lets you connect, query, a
 
 ## Key features
 
-- **ODBC 3.x Compliant**: Full implementation of ODBC 3.x specification
-- **Microsoft Entra ID Authentication**: Multiple authentication flows including Azure CLI, interactive, client credentials, certificate-based, and access token authentication
-- **Spark SQL Query Support**: Direct execution of Spark SQL statements
-- **Comprehensive Data Type Support**: Support for all Spark SQL data types including complex types (ARRAY, MAP, STRUCT)
-- **Session Reuse**: Built-in session management for improved performance
-- **Large Table Support**: Optimized handling for large result sets with configurable page sizes
-- **Async Prefetch**: Background data loading for improved performance
-- **Proxy Support**: HTTP proxy configuration for enterprise environments
-- **Multi-Schema Lakehouse Support**: Connect to specific schema within a Lakehouse
+- **ODBC 3.x compliant**: Full implementation of ODBC 3.x specification
+- **Microsoft Entra ID authentication**: Multiple authentication flows including Azure CLI, interactive, client credentials, certificate-based, and access token authentication
+- **Spark SQL query support**: Direct execution of Spark SQL statements
+- **Comprehensive data type support**: Support for all Spark SQL data types including complex types (ARRAY, MAP, STRUCT)
+- **Session reuse**: Built-in session management for improved performance
+- **Large table support**: Optimized handling for large result sets with configurable page sizes
+- **Async prefetch**: Background data loading for improved performance
+- **Proxy support**: HTTP proxy configuration for enterprise environments
+- **Multi-schema Lakehouse support**: Connect to specific schema within a Lakehouse
+- **OneLake integration**: Access Lakehouse data stored in Microsoft OneLake, including tables across multiple schemas, through a unified ODBC interface without separate storage configuration
+- **Environment items support**: Attach Fabric environment items during job execution to apply workspace libraries, Spark properties, and variables to each session
+- **Custom Spark configuration**: Pass Spark configuration properties directly through the connection string to tune session behavior
 
 > [!NOTE]
 > In open-source Apache Spark, database and schema are used synonymously. For example, running `SHOW SCHEMAS` or `SHOW DATABASES` in a Fabric Notebook returns the same result — a list of all schemas in the Lakehouse.
```
```diff
@@ -36,7 +40,7 @@ Before using the Microsoft ODBC Driver for Microsoft Fabric Data Engineering, en
 
 - **Operating System**: Windows 10/11 or Windows Server 2016+
 - **Microsoft Fabric Access**: Access to a Microsoft Fabric workspace
-- **Azure Entra ID Credentials**: Appropriate credentials for authentication
+- **Microsoft Entra ID credentials**: Appropriate credentials for authentication
 - **Workspace and Lakehouse IDs**: GUID identifiers for your Fabric workspace and lakehouse
 - **Azure CLI** (optional): Required for Azure CLI authentication method
 
```
````diff
@@ -332,6 +336,44 @@ These parameters must be present in every connection string:
 | ProxyUsername | String | None | Proxy authentication username |
 | ProxyPassword | String | None | Proxy authentication password |
 
+#### Environment settings
+
+You can attach a Fabric environment item to the Spark session started by the driver. The selected environment's libraries, Spark properties, and variables are automatically applied when the session is created.
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| EnvironmentId | UUID | None | Fabric environment item identifier (GUID) to apply during Spark session creation |
+
+**Example connection string with an environment item:**
+
+```
+DRIVER={Microsoft ODBC Driver for Microsoft Fabric Data Engineering};WorkspaceId=<workspace-id>;LakehouseId=<lakehouse-id>;AuthFlow=AZURE_CLI;EnvironmentId=<environment-id>
+```
+
+> [!NOTE]
+> The environment is applied when the Spark session starts. If you also specify custom Spark configuration properties, session-level properties take precedence over the environment defaults.
+
+#### Custom Spark configuration
+
+You can pass Spark configuration properties directly in the connection string. Any parameter prefixed with `spark.` is automatically applied to the Spark session at creation time, allowing you to override workspace or runtime defaults.
+
+**Example Spark configurations:**
+
+```
+spark.sql.shuffle.partitions=200
+spark.sql.adaptive.enabled=true
+spark.sql.autoBroadcastJoinThreshold=10485760
+```
+
+**Example connection string with custom Spark properties:**
+
+```
+DRIVER={Microsoft ODBC Driver for Microsoft Fabric Data Engineering};WorkspaceId=<workspace-id>;LakehouseId=<lakehouse-id>;AuthFlow=AZURE_CLI;spark.sql.shuffle.partitions=200;spark.sql.adaptive.enabled=true
+```
+
+> [!NOTE]
+> Spark configuration properties are applied when the session is created. They apply to all queries run within that session and override environment or runtime defaults for the same properties.
+
 ## DSN configuration
 
 ### Create a system DSN
````
````diff
@@ -341,21 +383,22 @@ These parameters must be present in every connection string:
    %SystemRoot%\System32\odbcad32.exe
    ```
 
-2. **Create New System DSN**
+1. **Create New System DSN**
    - Go to "System DSN" tab
    - Select "Add"
    - Select "Microsoft ODBC Driver for Microsoft Fabric Data Engineering"
    - Select "Finish"
 
-3. **Configure DSN Settings**
+1. **Configure DSN Settings**
    - **Data Source Name**: Enter a unique name (e.g., `FabricODBC`)
    - **Description**: Optional description
    - **Workspace ID**: Your Fabric workspace GUID
    - **Lakehouse ID**: Your Fabric lakehouse GUID
    - **Authentication**: Select authentication method
+   - **Environment ID** (optional): Enter the GUID of the Fabric environment item to attach during session creation
    - Configure additional settings as needed
 
-4. **Test Connection**
+1. **Test Connection**
    - Select "Test Connection" to verify settings
    - Select "OK" to save
 
````
docs/data-engineering/tutorial-lakehouse-introduction.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -85,6 +85,8 @@ The following image shows the source, destination, and data transformation:
 
 * **Consume**: Power BI can consume data from the lakehouse for reporting and visualization. Each lakehouse has a built-in TDS endpoint called the *SQL analytics endpoint* for easy connectivity and querying of data in the lakehouse tables from other reporting tools. You can also use Direct Lake over OneLake to let Power BI query lakehouse tables directly without import or a dedicated semantic model refresh cycle. Additionally, you can make your data available to non-Microsoft reporting tools by using the TDS/SQL analytics endpoint to connect and run SQL queries for analytics.
 
+For Spark SQL workloads specifically, ODBC-compatible clients can connect using the [Microsoft ODBC Driver for Microsoft Fabric Data Engineering (Preview)](./spark-odbc-driver.md) with Microsoft Entra ID authentication (interactive, Azure CLI, service principal, certificate, or access token).
+
 ## Next step
 
 > [!div class="nextstepaction"]
```
docs/data-factory/apache-airflow-jobs-power-bi-semantic-model-refresh.md

Lines changed: 9 additions & 11 deletions
```diff
@@ -4,16 +4,14 @@ description: Learn to refresh Power BI semantic model with Apache Airflow Job.
 ms.reviewer: abnarain
 ms.topic: tutorial
 ms.custom: airflows, sfi-image-nochange
-ms.date: 12/18/2024
+ms.date: 04/24/2026
 ---
 
 # Tutorial: Refresh Power BI Semantic Model with Apache Airflow Job
 
 [!INCLUDE[apache-airflow-note](includes/apache-airflow-note.md)]
 
-In today's data-driven world, maintaining up-to-date and accurate data models is crucial for informed business decisions. As data evolves, it's essential to refresh these models regularly to ensure that reports and dashboards reflect the most current information. Manual refreshes can be time-consuming and prone to errors, which is where Apache Airflow's orchestration, scheduling, and monitoring capabilities come into play. By leveraging Airflow, organizations can automate the refresh process of Power BI semantic models, ensuring timely and accurate data updates with minimal manual intervention.
-
-This article talks about the integration of Apache Airflow with Power BI to automate semantic model refreshes using Data Workflows. It provides a step-by-step guide to setting up the environment, configuring connections, and creating workflows to seamlessly update Power BI semantic models.
+This tutorial shows how to automate Power BI semantic model refreshes using Apache Airflow in Data Factory in Microsoft Fabric. You configure a connection, create a DAG (Directed Acyclic Graph), and schedule automatic refreshes so your reports and dashboards always reflect current data.
 
 ## Prerequisites
 
@@ -42,19 +40,19 @@ To get started, you must complete the following prerequisites:
 
 :::image type="content" source="media/apache-airflow-jobs/configure-airflow-environment.png" lightbox="media/apache-airflow-jobs/configure-airflow-environment.png" alt-text="Screenshot to Add Airflow requirement.":::
 
-## Create an Apache Airflow connection to connect with Power BI workspace
+## Create an Apache Airflow connection to Power BI
 
-1. Select on the "View Airflow connections" to see a list of all the connections are configured.
+1. Select **View Airflow connections** to see all configured connections.
 
    :::image type="content" source="media/apache-airflow-jobs/view-apache-airflow-connection.png" lightbox="media/apache-airflow-jobs/view-apache-airflow-connection.png" alt-text="Screenshot to view Apache Airflow connection.":::
 
 2. Add the new connection. You may use `Generic` connection type. Store the following fields:
 
-   - <strong>Connection ID:</strong> The Connection ID.
-   - <strong>Connection Type:</strong>Generic
-   - <strong>Login:</strong>The Client ID of your service principal.
-   - <strong>Password:</strong>The Client secret of your service principal.
-   - <strong>Extra:</strong>{"tenantId": The Tenant ID of your service principal.}
+   - **Connection ID**: The Connection ID.
+   - **Connection Type**: Generic
+   - **Login**: The Client ID of your service principal.
+   - **Password**: The Client secret of your service principal.
+   - **Extra**: `{"tenantId": "<your-tenant-id>"}`
 
 3. Select Save.
 
```
docs/data-factory/format-avro.md

Lines changed: 5 additions & 5 deletions
```diff
@@ -3,14 +3,14 @@ title: How to configure Avro format in the pipeline of Data Factory in Microsoft
 description: This article explains how to configure Avro format in the pipeline of Data Factory in Microsoft Fabric.
 ms.reviewer: jianleishen
 ms.topic: how-to
-ms.date: 06/25/2024
+ms.date: 04/24/2026
 ms.custom:
 - template-how-to
 ---
 
 # Avro format in Data Factory in [!INCLUDE [product-name](../includes/product-name.md)]
 
-This article outlines how to configure Avro format in the pipeline of Data Factory in [!INCLUDE [product-name](../includes/product-name.md)].
+Avro is a row-based data serialization format commonly used in Apache Hadoop workloads. This article outlines how to configure Avro format in a copy activity pipeline in Data Factory in [!INCLUDE [product-name](../includes/product-name.md)].
 
 ## Supported capabilities
 
@@ -67,13 +67,13 @@ Under **Advanced** settings in the **Destination** tab, the following Avro forma
 - **Max rows per file**: When writing data into a folder, you can choose to write to multiple files and specify the maximum rows per file.
 - **File name prefix**: Applicable when **Max rows per file** is configured. Specify the file name prefix when writing data to multiple files, resulted in this pattern: `<fileNamePrefix>_00000.<fileExtension>`. If not specified, the file name prefix is auto generated. This property doesn't apply when the source is a file based store or a partition option enabled data store.
 
-## Table summary
+## Avro copy activity properties
 
 ### Avro as source
 
 The following properties are supported in the copy activity **Source** section when using the Avro format.
 
-|Name |Description |Value|Required |Avro script property |
+|Name |Description |Value|Required |JSON script property |
 |:---|:---|:---|:---|:---|
 | **File format**|The file format that you want to use.| **Avro**|Yes|type (*under `datasetSettings`*):<br>Avro|
 |**Compression type**|The compression codec used to read Avro files.|**None**<br>**deflate**|No|avroCompressionCodec: <br><br>deflate|
@@ -83,7 +83,7 @@ The following properties are supported in the copy activity **Source** section w
 
 The following properties are supported in the copy activity **Destination** section when using the Avro format.
 
-|Name |Description |Value|Required |Avro script property |
+|Name |Description |Value|Required |JSON script property |
 |:---|:---|:---|:---|:---|
 | **File format**|The file format that you want to use.| **Avro**|Yes|type (*under `datasetSettings`*):<br>Avro|
 |**Compression type**|The compression codec used to write Avro files.|**None**<br>**deflate**|No|avroCompressionCodec: <br><br>deflate|
```

docs/data-factory/format-binary.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -3,14 +3,14 @@ title: How to configure Binary format in the pipeline of Data Factory in Microso
 description: This article explains how to configure Binary format in the pipeline of Data Factory in Microsoft Fabric.
 ms.reviewer: jianleishen
 ms.topic: how-to
-ms.date: 06/25/2024
+ms.date: 04/24/2026
 ms.custom:
 - template-how-to
 ---
 
-# Binary format for Data Factory in Microsoft Fabric
+# Binary format in Data Factory in [!INCLUDE [product-name](../includes/product-name.md)]
 
-This article outlines how to configure Binary format in Data Factory.
+Binary format copies files as-is without parsing, which is useful for moving files between storage locations without transformation. This article outlines how to configure Binary format in a copy activity pipeline in Data Factory in [!INCLUDE [product-name](../includes/product-name.md)].
 
 ## Supported capabilities
 
@@ -87,7 +87,7 @@ You can choose from the **None**, **bzip2**, **gzip**, **deflate**, **ZipDeflate
 - **Fastest**: The compression operation should complete as quickly as possible, even if the resulting file isn't optimally compressed.
 - **Optimal**: The compression operation should be optimally compressed, even if the operation takes a longer time to complete. For more information, go to the [Compression Level](/dotnet/api/system.io.compression.compressionlevel) article.
 
-## Table summary
+## Binary copy activity properties
 
 ### Binary as source
 
```