# Microsoft JDBC driver for Microsoft Fabric Data Engineering
JDBC (Java Database Connectivity) is a widely adopted standard that enables client applications to connect to and work with data from databases and big data platforms.
The Microsoft JDBC Driver for Fabric Data Engineering lets you connect, query, and manage Spark workloads in Microsoft Fabric with the reliability and simplicity of the JDBC standard. Built on Microsoft Fabric's Livy APIs, the driver provides secure and flexible Spark SQL connectivity to your Java applications and BI tools. This integration allows you to submit and execute Spark code directly without needing to create separate Notebook or Spark Job Definition artifacts. The driver is compatible with popular JDBC clients such as DbVisualizer and DBeaver, as well as BI tools that support JDBC connectivity, including Tableau.
## Key Features
- **JDBC 4.2 Compliant**: Full implementation of the JDBC 4.2 specification
- **Microsoft Entra ID Authentication**: Multiple authentication flows including interactive, client credentials, and certificate-based authentication
- **Enterprise Connection Pooling**: Built-in connection pooling with health monitoring, automatic recovery, and HikariCP integration (see the pooling sketch after this list)
- **Spark SQL Native Query Support**: Direct execution of Spark SQL statements without translation
- **Comprehensive Data Type Support**: Support for all Spark SQL data types including complex types (ARRAY, MAP, STRUCT)
- **Asynchronous Result Set Prefetching**: Background data loading for improved performance
- **Circuit Breaker Pattern**: Protection against cascading failures with automatic retry
- **Auto-Reconnection**: Transparent session recovery on connection failures
- **Advanced Retry Logic**: Retry with exponential backoff and session recovery for improved resilience
- **Proxy Support**: HTTP and SOCKS proxy configuration for enterprise environments
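
Because the driver is JDBC 4.2 compliant, it can sit behind any standard `DataSource` wrapper. The following minimal sketch shows one way to front the driver with HikariCP; the JDBC URL scheme shown here is an illustrative assumption, so check the documentation bundled with the driver download for the exact URL syntax and connection properties.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class FabricPoolingSketch {
    public static void main(String[] args) throws Exception {
        HikariConfig config = new HikariConfig();
        // Hypothetical URL: the real scheme and parameters are defined
        // in the driver's own documentation.
        config.setJdbcUrl("jdbc:fabricspark://<workspace-id>/<lakehouse-id>");
        config.setMaximumPoolSize(5);

        // The pool hands out connections backed by the Fabric driver.
        try (HikariDataSource dataSource = new HikariDataSource(config);
             Connection connection = dataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet results = statement.executeQuery("SHOW SCHEMAS")) {
            while (results.next()) {
                System.out.println(results.getString(1));
            }
        }
    }
}
```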
## Prerequisites
Before using the Microsoft JDBC Driver for Microsoft Fabric Data Engineering, ensure the following prerequisites are in place:
- **Java Development Kit (JDK)**: Version 11 or higher (Java 21 recommended)
- **Microsoft Fabric Access**: Access to a Microsoft Fabric workspace
- **Microsoft Entra ID credentials**: Appropriate credentials for authentication
- **Workspace and Lakehouse IDs**: GUID identifiers for your Fabric workspace and lakehouse
## Download and Installation
Microsoft JDBC Driver for Microsoft Fabric Data Engineering version 1.0.0 supports Java 11, 17, and 21. We're continually improving Java connectivity support and recommend that you work with the latest version of the Microsoft JDBC driver.
* [Download Microsoft JDBC Driver for Microsoft Fabric Data Engineering (zip)](https://download.microsoft.com/download/5e763393-274e-48c5-a55a-0375340bc520/ms-sparksql-jdbc-1.0.0.zip)
* [Download Microsoft JDBC Driver for Microsoft Fabric Data Engineering (tar)](https://download.microsoft.com/download/5e763393-274e-48c5-a55a-0375340bc520/ms-sparksql-jdbc-1.0.0.tar)
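
After you extract the downloaded package and add the driver JAR to your application's classpath, usage follows the standard JDBC pattern. The sketch below is an illustration only: the JDBC URL scheme and the property keys for the workspace and lakehouse IDs are placeholders, so use the exact names documented in the driver package.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class FabricJdbcSketch {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        // Placeholder property keys: consult the driver documentation for
        // the actual keys used for workspace, lakehouse, and authentication.
        properties.setProperty("workspaceId", "<workspace-id>");
        properties.setProperty("lakehouseId", "<lakehouse-id>");

        // Hypothetical URL scheme for illustration only.
        String url = "jdbc:fabricspark://onelake";

        try (Connection connection = DriverManager.getConnection(url, properties);
             Statement statement = connection.createStatement();
             ResultSet results = statement.executeQuery(
                     "SELECT * FROM my_table LIMIT 10")) {
            while (results.next()) {
                System.out.println(results.getString(1));
            }
        }
    }
}
```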
**File:** `docs/data-engineering/spark-odbc-driver.md`

author: ms-arali
ms.reviewer: arali
ms.topic: how-to
ms.date: 03/18/2026
ai-usage: ai-assisted
---
# Microsoft ODBC driver for Microsoft Fabric Data Engineering (Preview)
The Microsoft ODBC Driver for Fabric Data Engineering lets you connect, query, and manage Spark workloads in Microsoft Fabric.
## Key features
- **ODBC 3.x compliant**: Full implementation of ODBC 3.x specification
- **Microsoft Entra ID authentication**: Multiple authentication flows including Azure CLI, interactive, client credentials, certificate-based, and access token authentication
- **Spark SQL query support**: Direct execution of Spark SQL statements
- **Comprehensive data type support**: Support for all Spark SQL data types including complex types (ARRAY, MAP, STRUCT)
- **Session reuse**: Built-in session management for improved performance
- **Large table support**: Optimized handling for large result sets with configurable page sizes
- **Async prefetch**: Background data loading for improved performance
- **Proxy support**: HTTP proxy configuration for enterprise environments
- **Multi-schema Lakehouse support**: Connect to a specific schema within a Lakehouse
- **OneLake integration**: Access Lakehouse data stored in Microsoft OneLake, including tables across multiple schemas, through a unified ODBC interface without separate storage configuration
- **Environment items support**: Attach Fabric environment items during job execution to apply workspace libraries, Spark properties, and variables to each session
- **Custom Spark configuration**: Pass Spark configuration properties directly through the connection string to tune session behavior
> [!NOTE]
> In open-source Apache Spark, database and schema are used synonymously. For example, running `SHOW SCHEMAS` or `SHOW DATABASES` in a Fabric Notebook returns the same result — a list of all schemas in the Lakehouse.
Before using the Microsoft ODBC Driver for Microsoft Fabric Data Engineering, ensure the following prerequisites are in place:
- **Operating System**: Windows 10/11 or Windows Server 2016+
- **Microsoft Fabric Access**: Access to a Microsoft Fabric workspace
- **Microsoft Entra ID credentials**: Appropriate credentials for authentication
- **Workspace and Lakehouse IDs**: GUID identifiers for your Fabric workspace and lakehouse
- **Azure CLI** (optional): Required for Azure CLI authentication method
You can attach a Fabric environment item to the Spark session started by the driver. The selected environment's libraries, Spark properties, and variables are automatically applied when the session is created.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| EnvironmentId | UUID | None | Fabric environment item identifier (GUID) to apply during Spark session creation |
**Example connection string with an environment item:**

```
DRIVER={Microsoft ODBC Driver for Microsoft Fabric Data Engineering};WorkspaceId=<workspace-id>;LakehouseId=<lakehouse-id>;AuthFlow=AZURE_CLI;EnvironmentId=<environment-id>
```

> [!NOTE]
> The environment is applied when the Spark session starts. If you also specify custom Spark configuration properties, session-level properties take precedence over the environment defaults.

#### Custom Spark configuration

You can pass Spark configuration properties directly in the connection string. Any parameter prefixed with `spark.` is automatically applied to the Spark session at creation time, allowing you to override workspace or runtime defaults.

**Example Spark configurations:**

```
spark.sql.shuffle.partitions=200
spark.sql.adaptive.enabled=true
spark.sql.autoBroadcastJoinThreshold=10485760
```

**Example connection string with custom Spark properties:**

```
DRIVER={Microsoft ODBC Driver for Microsoft Fabric Data Engineering};WorkspaceId=<workspace-id>;LakehouseId=<lakehouse-id>;AuthFlow=AZURE_CLI;spark.sql.shuffle.partitions=200;spark.sql.adaptive.enabled=true
```

> [!NOTE]
> Spark configuration properties are applied when the session is created. They apply to all queries run within that session and override environment or runtime defaults for the same properties.
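
The two mechanisms can be combined in one connection string. The example below reuses only parameters shown earlier on this page: the attached environment supplies its libraries and defaults, and the `spark.`-prefixed property overrides the matching setting for the session.

```
DRIVER={Microsoft ODBC Driver for Microsoft Fabric Data Engineering};WorkspaceId=<workspace-id>;LakehouseId=<lakehouse-id>;AuthFlow=AZURE_CLI;EnvironmentId=<environment-id>;spark.sql.shuffle.partitions=200
```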
## DSN configuration
### Create a system DSN
1. **Open the ODBC Data Source Administrator**

   ```
   %SystemRoot%\System32\odbcad32.exe
   ```

1. **Create New System DSN**
   - Go to "System DSN" tab
   - Select "Add"
   - Select "Microsoft ODBC Driver for Microsoft Fabric Data Engineering"
   - Select "Finish"

1. **Configure DSN Settings**
   - **Data Source Name**: Enter a unique name (e.g., `FabricODBC`)
   - **Description**: Optional description
   - **Workspace ID**: Your Fabric workspace GUID
   - **Lakehouse ID**: Your Fabric lakehouse GUID
   - **Authentication**: Select authentication method
   - **Environment ID** (optional): Enter the GUID of the Fabric environment item to attach during session creation
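
After the DSN is saved, applications can refer to it by name instead of repeating the full connection string. Assuming the DSN was named `FabricODBC` as in the example above, a minimal connection string looks like the following; in standard ODBC fashion, keywords appended after the DSN reference typically override the values stored in the DSN.

```
DSN=FabricODBC
```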
**File:** `docs/data-engineering/tutorial-lakehouse-introduction.md`

The following image shows the source, destination, and data transformation:
* **Consume**: Power BI can consume data from the lakehouse for reporting and visualization. Each lakehouse has a built-in TDS endpoint called the *SQL analytics endpoint* for easy connectivity and querying of data in the lakehouse tables from other reporting tools. You can also use Direct Lake over OneLake to let Power BI query lakehouse tables directly without import or a dedicated semantic model refresh cycle. Additionally, you can make your data available to non-Microsoft reporting tools by using the TDS/SQL analytics endpoint to connect and run SQL queries for analytics.
For Spark SQL workloads specifically, ODBC-compatible clients can connect using the [Microsoft ODBC Driver for Microsoft Fabric Data Engineering (Preview)](./spark-odbc-driver.md) with Microsoft Entra ID authentication (interactive, Azure CLI, service principal, certificate, or access token).
In today's data-driven world, maintaining up-to-date and accurate data models is crucial for informed business decisions. As data evolves, it's essential to refresh these models regularly to ensure that reports and dashboards reflect the most current information. Manual refreshes can be time-consuming and prone to errors, which is where Apache Airflow's orchestration, scheduling, and monitoring capabilities come into play. By leveraging Airflow, organizations can automate the refresh process of Power BI semantic models, ensuring timely and accurate data updates with minimal manual intervention.
This tutorial shows how to automate Power BI semantic model refreshes using Apache Airflow in Data Factory in Microsoft Fabric. You configure a connection, create a DAG (Directed Acyclic Graph), and schedule automatic refreshes so your reports and dashboards always reflect current data.
## Prerequisites
To get started, you must complete the following prerequisites:
:::image type="content" source="media/apache-airflow-jobs/configure-airflow-environment.png" lightbox="media/apache-airflow-jobs/configure-airflow-environment.png" alt-text="Screenshot to Add Airflow requirement.":::
## Create an Apache Airflow connection to Power BI
1. Select **View Airflow connections** to see all configured connections.
:::image type="content" source="media/apache-airflow-jobs/view-apache-airflow-connection.png" lightbox="media/apache-airflow-jobs/view-apache-airflow-connection.png" alt-text="Screenshot to view Apache Airflow connection.":::
2. Add a new connection. You can use the `Generic` connection type. Enter the following fields:
   - **Connection ID**: A unique identifier for the connection.
   - **Connection Type**: Generic
   - **Login**: The Client ID of your service principal.
   - **Password**: The Client secret of your service principal.
   - **Extra**: The Tenant ID of your service principal, in JSON form: `{"tenantId": "<tenant-id>"}`
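
As a concrete illustration, a completed connection might look like the following; the connection ID `powerbi_conn` and all bracketed values are placeholders you replace with your own service principal details.

```
Connection ID: powerbi_conn
Connection Type: Generic
Login: <client-id>
Password: <client-secret>
Extra: {"tenantId": "<tenant-id>"}
```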
**File:** `docs/data-factory/format-avro.md`

title: How to configure Avro format in the pipeline of Data Factory in Microsoft Fabric
description: This article explains how to configure Avro format in the pipeline of Data Factory in Microsoft Fabric.
ms.reviewer: jianleishen
ms.topic: how-to
ms.date: 04/24/2026
ms.custom:
- template-how-to
---
# Avro format in Data Factory in [!INCLUDE [product-name](../includes/product-name.md)]
Avro is a row-based data serialization format commonly used in Apache Hadoop workloads. This article outlines how to configure Avro format in a copy activity pipeline in Data Factory in [!INCLUDE [product-name](../includes/product-name.md)].
## Supported capabilities
Under **Advanced** settings in the **Destination** tab, the following Avro format settings are supported:
- **Max rows per file**: When writing data into a folder, you can choose to write to multiple files and specify the maximum rows per file.
- **File name prefix**: Applicable when **Max rows per file** is configured. Specify the file name prefix when writing data to multiple files, resulting in this pattern: `<fileNamePrefix>_00000.<fileExtension>`. If not specified, the file name prefix is auto-generated. This property doesn't apply when the source is a file-based store or a partition-option-enabled data store.
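
For example, with **Max rows per file** configured and a file name prefix of `data`, the generated Avro files follow the documented pattern:

```
data_00000.avro
data_00001.avro
data_00002.avro
```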
## Avro copy activity properties
### Avro as source
The following properties are supported in the copy activity **Source** section when using the Avro format.
**File:** `docs/data-factory/format-binary.md`

title: How to configure Binary format in the pipeline of Data Factory in Microsoft Fabric
description: This article explains how to configure Binary format in the pipeline of Data Factory in Microsoft Fabric.
ms.reviewer: jianleishen
ms.topic: how-to
ms.date: 04/24/2026
ms.custom:
- template-how-to
---
# Binary format in Data Factory in [!INCLUDE [product-name](../includes/product-name.md)]

Binary format copies files as-is without parsing, which is useful for moving files between storage locations without transformation. This article outlines how to configure Binary format in a copy activity pipeline in Data Factory in [!INCLUDE [product-name](../includes/product-name.md)].
## Supported capabilities
You can choose from the **None**, **bzip2**, **gzip**, **deflate**, **ZipDeflate**, and other supported compression types. The available **Compression level** options are:
- **Fastest**: The compression operation should complete as quickly as possible, even if the resulting file isn't optimally compressed.
- **Optimal**: The compression operation should be optimally compressed, even if the operation takes a longer time to complete. For more information, go to the [Compression Level](/dotnet/api/system.io.compression.compressionlevel) article.