
Commit 2e97d55

author
v-mangrandhi
committed
Updated UID
1 parent 932e965 commit 2e97d55

33 files changed

Lines changed: 1222 additions & 0 deletions
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.sql-server-data-virtualization.introduction
3+
title: Introduction
4+
metadata:
5+
title: Introduction
6+
description: "Learn about data virtualization in SQL Server 2025."
7+
ms.date: 02/26/2026
8+
author: HugoMSFT
9+
ms.author: hudequei
10+
ms.topic: unit
11+
ms.custom:
12+
- build-2023
13+
durationInMinutes: 5
14+
content: |
15+
[!include[](includes/1-introduction.md)]
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.sql-server-data-virtualization.polybase
3+
title: Introduction to PolyBase
4+
metadata:
5+
title: Introduction to PolyBase
6+
description: "Learn about PolyBase and the evolution of PolyBase from SQL Server 2016 to SQL Server 2025."
7+
ms.date: 02/26/2026
8+
author: HugoMSFT
9+
ms.author: hudequei
10+
ms.topic: unit
11+
ms.custom:
12+
- build-2023
13+
durationInMinutes: 8
14+
content: |
15+
[!include[](includes/2-polybase.md)]
16+
quiz:
17+
title: Knowledge check
18+
questions:
19+
- content: "Object storage is the recommended solution for high-transactional workloads."
20+
choices:
21+
- content: "True"
22+
isCorrect: false
23+
explanation: "Object storage is recommended for Big Data, Internet of Things (IoT), Analytics, AI, and Machine Learning workloads, not high-transactional or online transaction processing (OLTP) workloads."
24+
- content: "False"
25+
isCorrect: true
26+
explanation: "Object storage architecture was designed for Big Data, Internet of Things (IoT), Analytics, AI, and Machine Learning workloads, not high-transactional workloads."
27+
- content: "To run the PolyBase feature, you need to install and enable the PolyBase Query Service for External Data."
28+
choices:
29+
- content: "True"
30+
isCorrect: true
31+
explanation: "All PolyBase data sources require the **PolyBase Query Service for External Data** to be installed and enabled, even those that don't use the other PolyBase services."
32+
- content: "False"
33+
isCorrect: false
34+
explanation: "Not all PolyBase data sources need to have the PolyBase services configured and running, but they all require the **PolyBase Query Service for External Data** to be installed and enabled."
35+
- content: "SQL Server 2025 supports all types of object storage."
36+
choices:
37+
- content: "True"
38+
isCorrect: false
39+
explanation: "SQL Server 2025 supports S3-compatible object storage only."
40+
- content: "False"
41+
isCorrect: true
42+
explanation: "SQL Server 2025 supports S3-compatible object storage only."
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.sql-server-data-virtualization.polybase-credentials-data-sources
3+
title: PolyBase credentials and data sources
4+
metadata:
5+
title: PolyBase credentials and data sources
6+
description: "This unit goes over PolyBase objects, supported data sources, and PolyBase operations."
7+
ms.date: 02/26/2026
8+
author: HugoMSFT
9+
ms.author: hudequei
10+
ms.topic: unit
11+
ms.custom:
12+
- build-2023
13+
durationInMinutes: 6
14+
content: |
15+
[!include[](includes/3-polybase-credentials-data-sources.md)]
16+
quiz:
17+
title: Knowledge check - Choose the best answer
18+
questions:
19+
- content: "SQL Server 2025 can access external data sources by:"
20+
choices:
21+
- content: "A. Using REST API to S3-compatible object storage providers."
22+
isCorrect: false
23+
explanation: "SQL Server 2025 can also query Delta table files on REST API data sources from a SELECT."
24+
- content: "B. Querying Delta table files."
25+
isCorrect: false
26+
explanation: "SQL Server 2025 can also access S3-compatible object storage providers through REST API."
27+
- content: "C. Accessing Parquet files on Open Database Connectivity (ODBC)-compatible data sources."
28+
isCorrect: false
29+
explanation: "SQL Server 2025 can access Parquet files only on data sources that use the REST API connectors, not generic ODBC."
30+
- content: "D. A and B."
31+
isCorrect: true
32+
explanation: "SQL Server 2025 can access S3-compatible object storage providers and query Delta table files through REST API."
33+
- content: "PolyBase allows SQL Server to connect to:"
34+
choices:
35+
- content: "A. SQL Server."
36+
isCorrect: false
37+
explanation: "PolyBase also allows SQL Server to connect to Oracle, Azure Blob Storage, Azure Data Lake Storage, and Teradata."
38+
- content: "B. Oracle."
39+
isCorrect: false
40+
explanation: "PolyBase also allows SQL Server to connect to SQL Server, Azure Blob Storage, Azure Data Lake Storage, and Teradata."
41+
- content: "C. Azure Blob Storage."
42+
isCorrect: false
43+
explanation: "PolyBase also allows SQL Server to connect to SQL Server, Oracle, Azure Data Lake Storage, and Teradata."
44+
- content: "D. Azure Data Lake Storage."
45+
isCorrect: false
46+
explanation: "PolyBase also allows SQL Server to connect to SQL Server, Oracle, Azure Blob Storage, and Teradata."
47+
- content: "E. Teradata."
48+
isCorrect: false
49+
explanation: "PolyBase also allows SQL Server to connect to SQL Server, Oracle, Azure Blob Storage, and Azure Data Lake Storage."
50+
- content: "F. A, B, C, D, and E."
51+
isCorrect: true
52+
explanation: "PolyBase allows SQL Server to connect to SQL Server, Oracle, Azure Blob Storage, Azure Data Lake Storage, Teradata, and more."
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.sql-server-data-virtualization.exercise-query-parquet-file
3+
title: Exercise - Use PolyBase to query a Parquet file
4+
metadata:
5+
title: Exercise - Use PolyBase to query a Parquet file
6+
description: "Complete this exercise to learn how to use PolyBase to query an external data source Parquet file and manipulate the data."
7+
ms.date: 02/26/2026
8+
author: HugoMSFT
9+
ms.author: hudequei
10+
ms.topic: unit
11+
ms.custom:
12+
- build-2023
13+
durationInMinutes: 10
14+
content: |
15+
[!include[](includes/4-exercise-query-parquet-file.md)]
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.sql-server-data-virtualization.exercise-connect-azure-sql-database-use-polybase
3+
title: Exercise - Create an external table from a database in Azure SQL Database
4+
metadata:
5+
title: Exercise - Create an external table from a database in Azure SQL Database
6+
description: "An exercise using data virtualization to connect to Azure SQL Database."
7+
ms.date: 02/26/2026
8+
author: HugoMSFT
9+
ms.author: hudequei
10+
ms.topic: unit
11+
ms.custom:
12+
- sfi-ropc-nochange
13+
- build-2023
14+
durationInMinutes: 10
15+
content: |
16+
[!include[](includes/5-exercise-connect-azure-sql-database-use-polybase.md)]
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.sql-server-data-virtualization.create-external-table-as-select
3+
title: CREATE EXTERNAL TABLE AS SELECT (CETAS)
4+
metadata:
5+
title: CREATE EXTERNAL TABLE AS SELECT (CETAS)
6+
description: "Learn about the use cases for CETAS, its structure, and how to enable it in SQL Server 2025."
7+
ms.date: 02/26/2026
8+
author: HugoMSFT
9+
ms.author: hudequei
10+
ms.topic: unit
11+
ms.custom:
12+
- build-2023
13+
durationInMinutes: 8
14+
content: |
15+
[!include[](includes/6-create-external-table-as-select.md)]
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.sql-server-data-virtualization.exercise-create-external-table-as-select
3+
title: Exercise - CREATE EXTERNAL TABLE AS SELECT
4+
metadata:
5+
title: Exercise - CREATE EXTERNAL TABLE AS SELECT
6+
description: "An exercise on using CREATE EXTERNAL TABLE AS SELECT (CETAS)."
7+
ms.date: 02/26/2026
8+
author: HugoMSFT
9+
ms.author: hudequei
10+
ms.topic: unit
11+
ms.custom:
12+
- build-2023
13+
durationInMinutes: 10
14+
content: |
15+
[!include[](includes/7-exercise-create-external-table-as-select.md)]
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.sql-server-data-virtualization.summary
3+
title: Summary
4+
metadata:
5+
title: Summary
6+
description: "Summary of what was learned in this data virtualization and PolyBase module."
7+
ms.date: 02/26/2026
8+
author: HugoMSFT
9+
ms.author: hudequei
10+
ms.topic: unit
11+
ms.custom:
12+
- build-2023
13+
durationInMinutes: 2
14+
content: |
15+
[!include[](includes/8-summary.md)]
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
Data virtualization in SQL Server 2025 is the ability to access data where it lives. Data virtualization integrates data at query time, without replicating or moving the original data.
2+
3+
This training module reviews the data virtualization options in SQL Server 2025, including:
4+
5+
- PolyBase services
6+
- REST API connectors for Azure Data Lake Storage, Azure Blob Storage, and Amazon S3-compatible object storage, which open up new data sources to data virtualization
7+
- Transact-SQL (T-SQL) used for data virtualization, including OPENROWSET, CREATE EXTERNAL TABLE (CET), and CREATE EXTERNAL TABLE AS SELECT (CETAS)
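As a rough illustration of the T-SQL involved, the following sketch exports query results to Parquet files with CETAS. It assumes an external data source already exists; the names `s3_ds`, `ParquetFormat`, `ext_sales_archive`, and `dbo.Sales`, along with the bucket path and column names, are hypothetical placeholders rather than objects defined in this module. It also assumes that CETAS is enabled through the `allow polybase export` configuration option, as it is in earlier SQL Server versions.

```sql
-- Assumption: CETAS must be enabled at the instance level before first use.
EXEC sp_configure @configname = 'allow polybase export', @configvalue = 1;
RECONFIGURE;

-- Hypothetical file format object that describes Parquet output.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- Write the result of a SELECT to external storage and register it
-- as an external table in one statement.
-- s3_ds is a hypothetical, previously created external data source.
CREATE EXTERNAL TABLE ext_sales_archive
WITH (
    LOCATION = '/<bucket>/sales-archive/',
    DATA_SOURCE = s3_ds,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT SalesOrderID, OrderDate, TotalDue
FROM dbo.Sales
WHERE OrderDate < '2024-01-01';
```

After the statement completes, `ext_sales_archive` can be queried like any other table, while the data itself lives in the external location.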
8+
9+
## Principles of data virtualization
10+
11+
Data virtualization relies on three principles:
12+
13+
- **Data abstraction:** Data abstraction hides the complexities of data access, such as the underlying data systems, formats, and structures.
14+
15+
- **Zero replication:** Unlike traditional extract-transform-load (ETL), data virtualization doesn't copy the data into a separate repository to transform it into the destination format. Instead, it handles transformation and aggregation on the fly.
16+
17+
- **Real-time data:** Because data virtualization connects to the data source on the fly, it always uses the latest available data.
18+
19+
## Benefits of data virtualization
20+
21+
Data virtualization has the following major benefits:
22+
23+
- **No data movement:** Accesses data in its current location.
24+
25+
- **T-SQL language:** Uses all the benefits of the T-SQL language, its commands, enhancements, and familiarity.
26+
27+
- **One source for all your data:** Uses SQL Server 2025 as a single data source and data hub for all required data, hiding data complexity from applications. Database administrators and data engineers can maintain a single environment.
28+
29+
- **Security:** Uses SQL Server security features for granular permissions, credential management, and control.
30+
31+
- **Cost flexibility:** Is available in all SQL Server 2025 editions.
32+
33+
## Data virtualization use cases
34+
35+
SQL Server 2025 offers the following major data virtualization use cases:
36+
37+
- **In-database analytics:** Apply the full set of familiar SQL Server capabilities to virtualized data.
38+
- **Offload or export data to other data sources.**
39+
- **Data hub:** Use SQL Server as a centralized hub to connect, protect, and query different data sources and files, hiding the complexity from applications. There's no need to use an ETL tool to aggregate, copy, or move the data to a staging area.
40+
41+
:::image type="content" source="../media/sql-server-data-hub.png" alt-text="Screenshot of SQL Server as a data hub for data virtualization." border="false":::
42+
43+
## Learning objectives
44+
45+
After you complete this module, you:
46+
47+
- Understand the benefits and principles of data virtualization.
48+
- Know what PolyBase is and how to use its capabilities.
49+
- Are familiar with object storage solutions and SQL Server 2025 support for S3-compatible object storage.
50+
- Know how to install and configure PolyBase on SQL Server 2025.
51+
- Know how to access and query external data by using PolyBase in SQL Server 2025.
52+
53+
## Prerequisites
54+
55+
- Basic working knowledge of SQL Server 2025
56+
- Fundamental knowledge of T-SQL and SQL query execution
57+
- SQL Server 2025 installed
58+
- SQL Server Management Studio (SSMS) installed
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
1+
PolyBase is the feature that SQL Server uses to enable the data virtualization concept. PolyBase was originally released in SQL Server 2016 and has been improved in each later version of SQL Server. However, the general concept of accessing data remotely without having to copy it dates back to SQL Server 7.0, which introduced linked servers.
2+
3+
The following table lists the first SQL Server version to support various PolyBase features.
4+
5+
|SQL Server 2016|SQL Server 2017|SQL Server 2019|SQL Server 2025|
6+
|-----|-----|-----|-----|
7+
|• Hadoop<br>• Azure Blob Storage|• OPENROWSET enhancements<br>• CSV for Azure Blob Storage<br>• Database Scoped Credential|• SQL Server<br>• Oracle<br>• Azure Cosmos DB<br>• MongoDB<br>• Teradata<br>• Linux support<br>• Generic ODBC|• New connector framework<br>• Object storage integration<br>• CSV<br>• Parquet<br>• Delta<br>• CETAS|
8+
9+
For more information about PolyBase, see [PolyBase features and limitations](/sql/relational-databases/polybase/polybase-versioned-feature-summary).
10+
11+
## PolyBase enhancements in SQL Server 2025
12+
13+
- **Native support for CSV, Parquet, and Delta**: PolyBase Query Service for External Data installation is no longer required to use OPENROWSET, CREATE EXTERNAL TABLE, or CREATE EXTERNAL TABLE AS SELECT with the following types of external data: Parquet, Delta, Azure Blob Storage (ABS), Azure Data Lake Storage (ADLS), or S3-compatible object storage.
14+
15+
- **Use generic ODBC data sources on Linux**: For more information, see [Configure PolyBase to access external data with ODBC generic types](/sql/relational-databases/polybase/polybase-configure-odbc-generic).
16+
17+
- **TDS 8.0 support**: When using Microsoft ODBC Driver 18 for SQL Server, TDS 8.0 is supported for SQL Server as an external data source.
18+
19+
## S3-compatible object storage
20+
21+
SQL Server 2025 supports S3-compatible object storage. To enable this integration, SQL Server 2025 uses a REST API connector framework architecture that follows the S3 framework. Any object storage that supports the S3 framework also works with SQL Server 2025. S3-compatible object storage solutions can run locally, in your network, in the cloud, or in a hybrid environment.
22+
23+
Object storage, also known as object-based storage, is a strategy that manages and manipulates data storage as distinct units, called objects. These objects are kept in a single repository rather than nested as files inside folders. Object storage combines the pieces of data that make up a file, adds all relevant metadata to that file, and attaches a custom identifier.
24+
25+
Some main features of object storage compared to a traditional file system are:
26+
27+
- Keeps metadata embedded in the file.
28+
- Lets files have attributes like tags.
29+
- More cost-effective to scale and easier to maintain.
30+
- Optimized for large amounts of data, such as Big Data, Internet of Things (IoT), AI, Machine Learning, and analytics.
31+
- Not recommended for high-transactional or online transaction processing (OLTP) workloads.
32+
33+
You can also use S3-compatible object storage for backup and restore scenarios by using the BACKUP TO URL command. For more information, see [SQL Server backup and restore with S3-compatible object storage](/sql/relational-databases/backup-restore/sql-server-backup-and-restore-with-s3-compatible-object-storage).
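A minimal sketch of that backup scenario follows. The database name `SalesDb` is hypothetical, and the `<endpoint>`, `<port>`, `<bucket>`, and key values are placeholders that you replace with your provider's details. The sketch assumes that the credential name matches the S3 URL prefix and that the secret combines the access key ID and secret key, separated by a colon.

```sql
-- Server-level credential for the S3 endpoint.
-- Assumption: the credential name matches the URL prefix used by BACKUP.
CREATE CREDENTIAL [s3://<endpoint>:<port>/<bucket>]
WITH IDENTITY = 'S3 Access Key',
     SECRET = '<AccessKeyID>:<SecretKeyID>';

-- Back up a hypothetical database directly to the bucket.
BACKUP DATABASE [SalesDb]
TO URL = 's3://<endpoint>:<port>/<bucket>/SalesDb.bak'
WITH FORMAT, COMPRESSION;
```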
34+
35+
The S3 standard framework is widely adopted, and many major storage providers now offer S3-compatible object storage solutions. If a solution offers compatibility with S3 REST APIs, it's compatible with SQL Server 2025. For a list of supported object storage providers, see [Providers of S3-compatible object storage](/sql/relational-databases/backup-restore/sql-server-backup-and-restore-with-s3-compatible-object-storage#providers-of-s3-compatible-object-storage).
36+
37+
Some object storage partners offer the ability to run their solution as software capable of virtualizing your current storage. You can install and try these solutions on your own machine or virtual machine (VM).
38+
39+
## PolyBase services vs. the PolyBase REST API feature
40+
41+
To use PolyBase, you must install the **PolyBase Query Service for External Data** and enable PolyBase at an instance level by using `sp_configure`. PolyBase setup installs two PolyBase services, **SQL Server PolyBase Engine** and **SQL Server PolyBase Data Movement**.
42+
43+
- **SQL Server PolyBase Engine**
44+
- Service executable: `mpdwsvc.exe -dweng`
45+
- Parses queries.
46+
- Generates query plans.
47+
- Distributes work to compute nodes (SQL Server 2019).
48+
- Processes compute node results and returns results to the client (SQL Server 2019).
49+
50+
- **SQL Server PolyBase Data Movement**
51+
- Service executable: `mpdwsvc.exe -dms`
52+
- Transfers data between external data sources and SQL Server, and between PolyBase head and compute nodes (SQL Server 2019).
53+
- Inserts data into other data sources, such as Azure Storage.
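The instance-level enablement described above can be sketched as follows. This assumes you run it as a login with permission to change server configuration. The result column alias is arbitrary.

```sql
-- Check whether the PolyBase feature is installed on this instance
-- (1 = installed, 0 = not installed).
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Enable PolyBase at the instance level.
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

-- Confirm the setting: value_in_use should be 1 after RECONFIGURE.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'polybase enabled';
```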
54+
55+
Data sources like SQL Server, Oracle, MongoDB, or ODBC-based sources use these PolyBase services. Data sources that use the SQL Server 2025 REST API-based PolyBase architecture don't require these services to be running or configured, but the **PolyBase Query Service for External Data** must still be installed and enabled.
56+
57+
You can use the PolyBase REST APIs to access Azure Data Lake Storage, Azure Blob Storage, any S3-compatible object storage, and file formats such as Parquet, Delta, and CSV files. Previously supported data sources still use the **SQL Server PolyBase Engine** and **SQL Server PolyBase Data Movement** services.
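For example, querying a Parquet file on S3-compatible object storage might look like the following sketch. The names `s3_cred` and `s3_ds`, the endpoint address, bucket, file path, and password are all hypothetical placeholders, and the exact `LOCATION` format can vary by storage provider.

```sql
-- A database master key is needed to protect the credential secret.
IF NOT EXISTS (SELECT 1 FROM sys.symmetric_keys
               WHERE name = '##MS_DatabaseMasterKey##')
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Hypothetical credential holding the S3 access key pair.
CREATE DATABASE SCOPED CREDENTIAL s3_cred
WITH IDENTITY = 'S3 Access Key',
     SECRET = '<AccessKeyID>:<SecretKeyID>';

-- Hypothetical external data source that points at the S3 endpoint.
CREATE EXTERNAL DATA SOURCE s3_ds
WITH (
    LOCATION = 's3://<ip_address>:<port>/',
    CREDENTIAL = s3_cred
);

-- Read the Parquet file in place, without loading it into a table.
SELECT TOP (10) *
FROM OPENROWSET(
    BULK '/<bucket>/<file>.parquet',
    FORMAT = 'PARQUET',
    DATA_SOURCE = 's3_ds'
) AS parquet_rows;
```

OPENROWSET suits ad hoc exploration. For repeated access, an external table over the same data source gives applications a stable object name to query.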
58+
59+
|Data source |PolyBase services |PolyBase REST API feature|
60+
|---------|---------|---------|
61+
|Azure Blob Storage |:::image type="content" source="../media/no-icon.svg" border="false" alt-text="No"::: |:::image type="content" source="../media/yes-icon.svg" border="false" alt-text="Yes"::: |
62+
|Azure Data Lake Storage |:::image type="content" source="../media/no-icon.svg" border="false" alt-text="No"::: |:::image type="content" source="../media/yes-icon.svg" border="false" alt-text="Yes"::: |
63+
|S3-compatible object storage |:::image type="content" source="../media/no-icon.svg" border="false" alt-text="No"::: |:::image type="content" source="../media/yes-icon.svg" border="false" alt-text="Yes"::: |
64+
|SQL Server |:::image type="content" source="../media/yes-icon.svg" border="false" alt-text="Yes"::: | :::image type="content" source="../media/no-icon.svg" border="false" alt-text="No"::: |
65+
|Oracle |:::image type="content" source="../media/yes-icon.svg" border="false" alt-text="Yes"::: |:::image type="content" source="../media/no-icon.svg" border="false" alt-text="No"::: |
66+
|Teradata |:::image type="content" source="../media/yes-icon.svg" border="false" alt-text="Yes"::: |:::image type="content" source="../media/no-icon.svg" border="false" alt-text="No"::: |
67+
|MongoDB or Azure Cosmos DB API for MongoDB |:::image type="content" source="../media/yes-icon.svg" border="false" alt-text="Yes"::: |:::image type="content" source="../media/no-icon.svg" border="false" alt-text="No"::: |
68+
|Generic Open Database Connectivity (ODBC) |:::image type="content" source="../media/yes-icon.svg" border="false" alt-text="Yes"::: |:::image type="content" source="../media/no-icon.svg" border="false" alt-text="No"::: |
69+
|Bulk operations |:::image type="content" source="../media/yes-icon.svg" border="false" alt-text="Yes"::: |:::image type="content" source="../media/no-icon.svg" border="false" alt-text="No"::: |
