Commit 053c614

Merge pull request #53612 from DivyaGundreddy/LP160474-M4
transform-development-workflows-sql-server-2025
2 parents b9d29a8 + e01887b commit 053c614

16 files changed

Lines changed: 834 additions & 0 deletions
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-sql-server-t-sql-enhancements.introduction
title: Introduction
metadata:
  title: Introduction
  description: "Introduction"
  ms.date: 10/14/2025
  author: MScalopez
  ms.author: calopez
  ms.topic: unit
durationInMinutes: 1
content: |
  [!include[](includes/01-introduction.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-sql-server-t-sql-enhancements.vector-ai-integration
title: AI and vector integration
metadata:
  title: AI and vector integration
  description: "Explore AI and vector features for embeddings, search, and external model integration."
  ms.date: 10/14/2025
  author: MScalopez
  ms.author: calopez
  ms.topic: unit
durationInMinutes: 5
content: |
  [!include[](includes/02-vector-ai-integration.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-sql-server-t-sql-enhancements.pattern-matching-text-extraction
title: Pattern matching and text extraction
metadata:
  title: Pattern matching and text extraction
  description: "Use REGEXP and SUBSTRING to find, extract, and manipulate text patterns in T-SQL."
  ms.date: 10/14/2025
  author: MScalopez
  ms.author: calopez
  ms.topic: unit
durationInMinutes: 7
content: |
  [!include[](includes/03-pattern-matching-text.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-sql-server-t-sql-enhancements.json-string-aggregation
title: JSON and string aggregation
metadata:
  title: JSON and string aggregation
  description: "Create JSON arrays, objects, and delimited strings with new T-SQL aggregation functions."
  ms.date: 10/14/2025
  author: MScalopez
  ms.author: calopez
  ms.topic: unit
durationInMinutes: 6
content: |
  [!include[](includes/04-json-string-aggregation.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-sql-server-t-sql-enhancements.encoding-similarity
title: Encoding and similarity functions
metadata:
  title: Encoding and similarity functions
  description: "Encode data with Base64 and compare text using new string similarity functions."
  ms.date: 10/14/2025
  author: MScalopez
  ms.author: calopez
  ms.topic: unit
durationInMinutes: 6
content: |
  [!include[](includes/05-encoding-similarity.md)]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-sql-server-t-sql-enhancements.date-numeric-enhancements
title: Date and numeric enhancements
metadata:
  title: Date and numeric enhancements
  description: "Work with CURRENT_DATE, bigint DATEADD, and PRODUCT() for precise date and math operations."
  ms.date: 10/14/2025
  author: MScalopez
  ms.author: calopez
  ms.topic: unit
durationInMinutes: 6
content: |
  [!include[](includes/06-date-numeric-enhancements.md)]
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-sql-server-t-sql-enhancements.knowledge-check
title: Module assessment
metadata:
  title: Module assessment
  description: "Knowledge check"
  ms.date: 10/14/2025
  author: MScalopez
  ms.author: calopez
  ms.topic: unit
durationInMinutes: 5
quiz:
  title: Check your knowledge
  questions:
  - content: Descriptions contain codes like AB12345 embedded in text. The result must return the matched code text per row. Which option fits?
    choices:
    - content: REGEXP_SUBSTR
      isCorrect: true
      explanation: Returns the substring that matches the pattern, producing the code text directly.
    - content: REGEXP_INSTR
      isCorrect: false
      explanation: Returns the starting position of the match, not the matched text.
    - content: REGEXP_MATCHES
      isCorrect: false
      explanation: Returns all matches as a rowset; use when multiple rows of matches are needed.
  - content: A report needs one JSON field per customer listing all product IDs in order. Which option aligns with that output?
    choices:
    - content: JSON_OBJECTAGG
      isCorrect: false
      explanation: Produces key-value objects, not arrays of values.
    - content: STRING_CONCAT_WS
      isCorrect: false
      explanation: Creates delimited text, not structured JSON.
    - content: JSON_ARRAYAGG
      isCorrect: true
      explanation: Aggregates values into a JSON array while preserving order.
  - content: Similarity scores fluctuate across rows because vector magnitudes vary. The pipeline must make scores comparable across rows. What helps?
    choices:
    - content: VECTOR_NORMALIZE
      isCorrect: true
      explanation: Normalizes each vector to unit length so similarity is magnitude-invariant.
    - content: VECTOR_DISTANCE
      isCorrect: false
      explanation: Computes distance; doesn't standardize magnitude.
    - content: CREATE VECTOR INDEX
      isCorrect: false
      explanation: Speeds up search but doesn't change score scaling.
  - content: A nightly job must store the current date as a partition key with no time component. Which choice avoids manual truncation?
    choices:
    - content: GETDATE
      isCorrect: false
      explanation: Includes time, which complicates partition matching.
    - content: CURRENT_DATE
      isCorrect: true
      explanation: Returns the date only, suitable for date-based partitions.
    - content: SYSDATETIME
      isCorrect: false
      explanation: Returns higher-precision datetime, not date-only.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.introduction-sql-server-t-sql-enhancements.summary
title: Summary
metadata:
  title: Summary
  description: "Summary"
  ms.date: 10/14/2025
  author: MScalopez
  ms.author: calopez
  ms.topic: unit
durationInMinutes: 1
content: |
  [!include[](includes/08-summary.md)]
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
SQL Server 2025 introduces a range of new T‑SQL features and enhancements that support modern workloads while keeping queries clear and maintainable. This module focuses on language additions for AI and vectors, pattern matching, JSON output, string processing, and improved date and numeric operations.

In this module, we cover the following topics:

- **Vector and AI integration**: Learn how to generate embeddings and work with vector data using functions and features such as `AI_GENERATE_EMBEDDINGS`, `AI_GENERATE_CHUNKS`, `VECTOR_DISTANCE`, `VECTOR_NORM`, `VECTOR_NORMALIZE`, `VECTORPROPERTY`, `CREATE EXTERNAL MODEL`, `CREATE VECTOR INDEX`, and `VECTOR_SEARCH`.
- **Pattern matching and text extraction**: Use `REGEXP_LIKE`, `REGEXP_SUBSTR`, `REGEXP_REPLACE`, `REGEXP_INSTR`, `REGEXP_COUNT`, `REGEXP_MATCHES`, and `REGEXP_SPLIT_TO_TABLE`, plus the enhanced `SUBSTRING` behavior.
- **JSON and string aggregation**: Build structured output with `JSON_ARRAYAGG` and `JSON_OBJECTAGG`, and create delimited text with `STRING_CONCAT_WS`. You can also use the `||` operator for string concatenation and `UNISTR` for Unicode escape sequences.
- **Encoding and similarity functions**: Encode and decode text with `BASE64_ENCODE` and `BASE64_DECODE`, and compare strings with `STRING_SIMILARITY`, `EDIT_DISTANCE`, `EDIT_DISTANCE_SIMILARITY`, `JARO_WINKLER_DISTANCE`, and `JARO_WINKLER_SIMILARITY`.
- **Date and numeric enhancements**: Work with `CURRENT_DATE`, `DATEADD` with `bigint`, and the `PRODUCT()` aggregate for multiplicative calculations.

## Learning objectives

Upon completing this module, you should be able to:

- Understand the new and enhanced T‑SQL features in SQL Server 2025.
- Apply these capabilities to integrate AI, parse and format text, build JSON output, and support analytics.
- Choose the right function or operator to keep queries readable and efficient.

## Prerequisites

- SQL Server 2025
- Basic working knowledge of SQL Server and query processing
- Fundamental knowledge of Transact‑SQL (T‑SQL)
- Familiarity with functions, operators, and JSON handling in SQL Server
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
SQL Server 2025 introduces a set of AI and vector functions that let database developers call AI capabilities directly from T-SQL. These capabilities make it possible to generate embeddings, calculate vector similarity, and search across AI-enriched data without leaving T-SQL. Embeddings are produced by a model that you register as an external model, while storage, distance calculations, indexing, and search run in the database engine. This level of integration reduces the need for separate vector stores and glue code, simplifies application architecture, and supports real-time intelligent workloads.

## AI and Vector Functions Overview

The new AI features in SQL Server 2025 fall into three main categories: AI generation, vector operations, and external models with vector indexing and search.

### AI Generation Functions

- **AI_GENERATE_CHUNKS** – A table-valued function that splits large text into fixed-size chunks, optionally overlapping, that can later be embedded or stored for retrieval-augmented generation (RAG) scenarios.
- **AI_GENERATE_EMBEDDINGS** – Generates embeddings from text input using an external model registered in SQL Server. These embeddings can be stored in tables for use in vector search, similarity analysis, or semantic ranking.
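
As a rough sketch of what chunking could look like, the query below splits each document into fixed-size chunks before any embedding step. The `Documents` table and its columns are hypothetical, the size and overlap values are arbitrary, and the parameter and output column names follow the documented fixed-size chunking mode as understood at the time of writing. The function may also require preview features to be enabled, so verify the syntax against the current documentation.

```sql
-- Hypothetical table: Documents(DocumentID INT, Body NVARCHAR(MAX)).
-- Split each document into fixed-size chunks with some overlap
-- between consecutive chunks (values here are arbitrary).
SELECT d.DocumentID,
       c.chunk_order,
       c.chunk
FROM Documents AS d
CROSS APPLY AI_GENERATE_CHUNKS(
    source     = d.Body,
    chunk_type = FIXED,
    chunk_size = 200,
    overlap    = 10
) AS c
ORDER BY d.DocumentID, c.chunk_order;
```

Each returned chunk can then be passed to `AI_GENERATE_EMBEDDINGS` and stored alongside its document key.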

### Vector Operations

- **VECTOR_DISTANCE** – Computes the distance between two vectors using a specified metric: `cosine`, `euclidean`, or `dot` (negative dot product). The metric is the first argument.
- **VECTOR_NORM** – Returns the norm (magnitude) of a vector for a given norm type, such as `norm1`, `norm2`, or `norminf`.
- **VECTOR_NORMALIZE** – Returns the vector scaled to unit length under the given norm type, typically used before comparison or similarity searches.
- **VECTORPROPERTY** – Returns metadata about a vector, such as its dimension count (`Dimensions`) or element type (`BaseType`).
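
The functions in this list can be tried on small literal vectors. The following is a minimal sketch with made-up values chosen so the math is easy to check by hand. The comments give the mathematical results, and the displayed precision may differ.

```sql
-- Literal vectors chosen so the results can be verified by hand.
DECLARE @a VECTOR(3) = '[3, 4, 0]';
DECLARE @b VECTOR(3) = '[0, 0, 2]';

SELECT
    VECTOR_DISTANCE('cosine', @a, @b)    AS CosineDistance,    -- orthogonal vectors, so 1
    VECTOR_DISTANCE('euclidean', @a, @b) AS EuclideanDistance, -- sqrt(9 + 16 + 4)
    VECTOR_NORM(@a, 'norm2')             AS Magnitude,         -- sqrt(9 + 16) = 5
    VECTOR_NORMALIZE(@a, 'norm2')        AS UnitVector,        -- [0.6, 0.8, 0]
    VECTORPROPERTY(@a, 'Dimensions')     AS Dimensions;        -- 3
```

Normalizing vectors before storing them makes distance scores comparable across rows, because differences in magnitude no longer influence the result.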

### External Models and Vector Indexes

SQL Server 2025 allows you to register and manage external AI models using T-SQL.
- **CREATE EXTERNAL MODEL / ALTER EXTERNAL MODEL / DROP EXTERNAL MODEL** – Manage AI models that are hosted locally or through supported model providers.
- **CREATE VECTOR INDEX** – Creates an approximate nearest neighbor index on a vector column to accelerate similarity searches.
- **VECTOR_SEARCH** – Performs similarity search operations on vector data using the vector index, returning the closest matches based on the selected distance metric.

These capabilities allow SQL Server to serve as a foundation for retrieval-augmented generation, recommendation engines, and semantic search applications, with the data, indexes, and search logic kept in the database engine.

### Half-precision vector storage and binary ingest

Vectors can now use **half-precision floating-point (fp16)** elements to reduce memory usage and improve scan performance in embedding-heavy workloads.
You can also **bulk-load vectors** in binary format using `BULK INSERT` or `OPENROWSET(BULK ...)`, which simplifies importing large embedding sets created outside SQL Server.
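
A half-precision column might be declared as in the sketch below. The table name is hypothetical, and the example assumes the optional base-type argument of the `VECTOR` type (`float16`) and that preview features must be enabled first. Both assumptions should be checked against the documentation for your build.

```sql
-- Assumption: float16 vectors are gated behind preview features.
ALTER DATABASE SCOPED CONFIGURATION SET PREVIEW_FEATURES = ON;

-- Hypothetical table storing 1536-dimension embeddings at half precision.
CREATE TABLE ProductEmbeddingsHalf
(
    ProductID INT PRIMARY KEY,
    Embedding VECTOR(1536, float16)
);

-- Inspect the element type of a stored vector.
SELECT TOP (1) VECTORPROPERTY(Embedding, 'BaseType') AS ElementType
FROM ProductEmbeddingsHalf;
```

Half precision trades some numeric accuracy for a smaller footprint, so it is worth comparing search quality against full-precision vectors on a sample of your data.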

## Example Scenario: Building a Product Recommendation Query

Imagine you work for a retail company that stores product descriptions in a SQL Server 2025 database. The marketing team wants to build a recommendation feature that suggests products semantically similar to a selected item. Using the new AI and vector features, you can generate embeddings for product descriptions, store them in a table, and perform similarity searches without moving the data to an external system.

### Create and Register the Model

Before generating embeddings, you must register an external model. The API key is not written into the model definition; the model references a database scoped credential that holds it.

```sql
-- Assumes a database scoped credential with this name already holds the API key.
CREATE EXTERNAL MODEL embedding_model
WITH (
    LOCATION = 'https://api.openai.com/v1/embeddings',
    API_FORMAT = 'OpenAI',
    MODEL_TYPE = EMBEDDINGS,
    MODEL = 'text-embedding-3-small',
    CREDENTIAL = [https://api.openai.com]
);
```
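
Registering a model that calls a remote endpoint usually depends on some one-time setup. The sketch below shows what that setup might look like. The credential name, the password placeholder, and the header placeholders are all hypothetical, and the header name and value format depend on the model provider.

```sql
-- Allow the instance to call external REST endpoints (server-level setting).
EXECUTE sp_configure 'external rest endpoint enabled', 1;
RECONFIGURE WITH OVERRIDE;

-- A database master key protects the credential's secret.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Hypothetical credential; replace the placeholders with the
-- header name and value that your model provider expects.
CREATE DATABASE SCOPED CREDENTIAL [https://api.openai.com]
WITH IDENTITY = 'HTTPEndpointHeaders',
     SECRET = '{"<header-name>":"<header-value>"}';
```

Keeping the key in a credential means it can be rotated or revoked without changing any query that uses the model.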

### Generate and Store Embeddings

Once the model is registered, you can generate embeddings for your product descriptions and store them in a new table.

```sql
CREATE TABLE ProductEmbeddings
(
    ProductID INT PRIMARY KEY,
    Description NVARCHAR(MAX),
    Embedding VECTOR(1536)  -- must match the model's output dimensions
);

-- The model is referenced by name with USE MODEL, not passed as a string.
INSERT INTO ProductEmbeddings (ProductID, Description, Embedding)
SELECT ProductID,
       Description,
       AI_GENERATE_EMBEDDINGS(Description USE MODEL embedding_model)
FROM Products;
```
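
Because each embedding involves a call to the external model, re-embedding every row on each run can be slow and costly. One possible approach, sketched here against the `Products` and `ProductEmbeddings` tables from this scenario and using the `USE MODEL` clause of `AI_GENERATE_EMBEDDINGS`, is to embed only the products that do not have a stored embedding yet.

```sql
-- Embed only products that are not in ProductEmbeddings yet.
INSERT INTO ProductEmbeddings (ProductID, Description, Embedding)
SELECT p.ProductID,
       p.Description,
       AI_GENERATE_EMBEDDINGS(p.Description USE MODEL embedding_model)
FROM Products AS p
WHERE p.Description IS NOT NULL
  AND NOT EXISTS (SELECT 1
                  FROM ProductEmbeddings AS e
                  WHERE e.ProductID = p.ProductID);
```

This sketch does not detect changed descriptions; handling updates would need an additional comparison or a change-tracking mechanism.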

### Create a Vector Index and Run a Search

To speed up similarity searches on large tables, create a vector index on the embedding column.

```sql
CREATE VECTOR INDEX idx_ProductEmbedding
ON ProductEmbeddings (Embedding)
WITH (METRIC = 'cosine', TYPE = 'DiskANN');
```

Now you can perform a semantic search for related products. `VECTOR_DISTANCE` takes the metric as its first argument and returns a distance, so smaller values mean closer matches:

```sql
DECLARE @query NVARCHAR(MAX) = 'waterproof hiking backpack';
DECLARE @vector VECTOR(1536) = AI_GENERATE_EMBEDDINGS(@query USE MODEL embedding_model);

SELECT TOP 5 ProductID, Description,
       VECTOR_DISTANCE('cosine', Embedding, @vector) AS SimilarityScore
FROM ProductEmbeddings
ORDER BY SimilarityScore ASC;
```
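
A query that orders by `VECTOR_DISTANCE` compares the search vector with every row, which gives an exact result. To take advantage of the vector index, the same search can be expressed with `VECTOR_SEARCH`, which performs an approximate nearest neighbor search. The following is a minimal sketch against the `ProductEmbeddings` table, assuming the named-argument form of `VECTOR_SEARCH` and its `distance` output column; the feature may require preview features to be enabled.

```sql
DECLARE @query NVARCHAR(MAX) = 'waterproof hiking backpack';
DECLARE @vector VECTOR(1536) = AI_GENERATE_EMBEDDINGS(@query USE MODEL embedding_model);

-- Approximate nearest neighbor search that can use the vector index.
SELECT p.ProductID,
       p.Description,
       s.distance
FROM VECTOR_SEARCH(
         TABLE      = ProductEmbeddings AS p,
         COLUMN     = Embedding,
         SIMILAR_TO = @vector,
         METRIC     = 'cosine',
         TOP_N      = 5
     ) AS s
ORDER BY s.distance;
```

The metric passed to `VECTOR_SEARCH` should match the metric the index was created with. Approximate search trades a small amount of recall for much faster lookups on large tables.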

### Results

`SimilarityScore` holds the cosine distance, so lower values indicate closer matches.

| ProductID | Description | SimilarityScore |
|-----------|-------------|-----------------|
| 105 | "Lightweight waterproof travel backpack" | 0.07 |
| 116 | "Hiking pack with rain cover and hydration slot" | 0.10 |
| 117 | "Compact outdoor day pack with water resistance" | 0.12 |
| 101 | "Trail-ready backpack with external straps" | 0.15 |
| 119 | "Travel and camping waterproof duffel" | 0.18 |

This example demonstrates how to integrate an external AI model, generate embeddings directly within T-SQL, and perform a similarity search using built-in vector functions. The workflow is driven entirely from T-SQL, and the product data and embeddings stay in SQL Server, which simplifies development and allows intelligent workloads to remain secure and governed under existing database policies.

## Summary

SQL Server 2025 introduces native AI capabilities that allow developers to build intelligent database applications directly in T-SQL. Functions such as `AI_GENERATE_EMBEDDINGS`, `VECTOR_DISTANCE`, and `VECTOR_SEARCH` streamline integration with AI models while maintaining performance and security. Together, these features make SQL Server 2025 a strong platform for semantic search, recommendations, and context-aware analytics without relying on external compute pipelines.
