Commit 7a79983

Merge pull request #53188 from MScalopez/postgresql-ai-final

Updated modules 1 and 2 in the AI PostgreSQL learning path

2 parents b88da2c + 47261b7

35 files changed

Lines changed: 529 additions & 399 deletions

learn-pr/wwl-data-ai/.openpublishing.redirection.wwl-data-ai.json

Lines changed: 20 additions & 0 deletions
@@ -5124,6 +5124,26 @@
     {
       "source_path_from_root": "/learn-pr/wwl-data-ai/scale-ai/3-establish-ai-related-roles-responsibilities.yml",
       "redirect_url": "/training/modules/scale-ai/3-organize-ai-success"
+    },
+    {
+      "source_path_from_root": "/learn-pr/wwl-data-ai/get-started-generative-ai-azure-database-postgresql/6-examine-azure-machine-learning-schema.yml",
+      "redirect_url": "/training/modules/get-started-generative-ai-azure-database-postgresql",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/learn-pr/wwl-data-ai/get-started-generative-ai-azure-database-postgresql/7-exercise-explore-azure-ai-extension.yml",
+      "redirect_url": "/training/modules/get-started-generative-ai-azure-database-postgresql",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/learn-pr/wwl-data-ai/get-started-generative-ai-azure-database-postgresql/8-knowledge-check.yml",
+      "redirect_url": "/training/modules/get-started-generative-ai-azure-database-postgresql",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/learn-pr/wwl-data-ai/get-started-generative-ai-azure-database-postgresql/9-summary.yml",
+      "redirect_url": "/training/modules/get-started-generative-ai-azure-database-postgresql",
+      "redirect_document_id": false
     }
   ]
 }

learn-pr/wwl-data-ai/enable-semantic-search-azure-database-postgresql/8-knowledge-check.yml

Lines changed: 8 additions & 14 deletions
@@ -2,20 +2,14 @@
 uid: learn.wwl.enable-semantic-search-azure-database-postgresql.knowledge-check
 title: Module assessment
 metadata:
-  adobe-target: true
-  prefetch-feature-rollout: true
   title: Module assessment
   description: "Knowledge check"
-  ms.date: 04/23/2025
+  ms.date: 10/24/2025
   author: wwlpublish
   ms.author: calopez
   ms.topic: unit
-  ms.custom:
-  - N/A
   module_assessment: true
 durationInMinutes: 3
-content: |
-  [!include[](includes/8-knowledge-check.md)]
 quiz:
   title: "Check your knowledge"
   questions:
@@ -26,7 +20,7 @@ quiz:
       explanation: "Correct. Semantic search uses numeric vector distance to measure semantic distance. A vector of definition or topic words is like lexical search (augmenting a query with synonyms or searching by tag or topic)."
     - content: "An array of n words that summarize the text's meaning."
       isCorrect: false
-      explanation: "Incorrect. Semantic search uses a quantitative representation of text meaning derived from a language model, not synonyms or definitions. The core of semantic search is to represent semantics quantitatively so that normal vector operations can be used to measure semantic distance."
+      explanation: "Incorrect. Semantic search uses a quantitative representation of text meaning derived from a language model, not synonyms, or definitions. The core of semantic search is to represent semantics quantitatively so that normal vector operations can be used to measure semantic distance."
     - content: "An array of n text strings embedded in the text."
       isCorrect: false
       explanation: "Incorrect. Semantic search uses a quantitative representation of text meaning derived from a language model, not a list of ideas or topics. The core of semantic search is to represent semantics quantitatively so that normal vector operations can be used to measure semantic distance."
@@ -37,18 +31,18 @@ quiz:
       explanation: "Correct. PostgreSQL is a suitable storage layer for vectors with the `vector` extension installed. It doesn't require new services or data migration."
     - content: "Use Vector Database in Azure Cosmos DB for MongoDB."
       isCorrect: false
-      explanation: "Incorrect. While the Vector Database in Azure Cosmos DB for MongoDB is a good choice for storing & querying vectors, it requires deploying & maintaining a separate service and performing ETL between the application database and Cosmos DB. The most straightforward option is to use the `vector` extension to handle vectors directly in the PostgreSQL database."
+      explanation: "Incorrect. While the Vector Database in Azure Cosmos DB for MongoDB is a good choice for storing & querying vectors, it requires deploying & maintaining a separate service and performing ETL (Extract, Transform, Load) between the application database and Cosmos DB. The most straightforward option is to use the `vector` extension to handle vectors directly in the PostgreSQL database."
     - content: "Use Azure AI Search's vector store."
       isCorrect: false
       explanation: "Incorrect. While Azure AI Search's vector store is a good choice for storing & querying vectors, it requires deploying a separate service and performing ETL between the application database and Azure AI Search. The most straightforward choice is to use the `vector` extension to store vectors directly in the PostgreSQL database."
-  - content: "An application has stored embedding vectors in a PostgreSQL flexible server database and is ready to query them. The user has supplied a query string. What is the simplest way to run a semantic search?"
+  - content: "An application stores embedding vectors in a PostgreSQL flexible server database and is ready to query them. The user supplies a query string. What is the simplest way to run a semantic search?"
     choices:
     - content: "The application calls a stored function to return ranked results."
       isCorrect: true
       explanation: "Correct. This approach requires minimal changes to the application code and encapsulates concepts like embedding vectors and cosine distance to application code."
-    - content: "Use Azure OpenAI Embeddings API in the application, and use the result as a query parameter to rank cosine distance."
+    - content: "To rank cosine distance, use Azure OpenAI Embeddings API in the application, and use the result as a query parameter."
       isCorrect: false
-      explanation: "Incorrect. While this would work, it isn't the simplest approach: it introduces new services to applications and requires application developers to understand at least the basics of vector search."
-    - content: "Use Azure AI Search's integrated vectorization to generate the query embedding and use the SQL in-line."
+      explanation: "Incorrect. While this approach would work, it isn't the simplest approach: it introduces new services to applications and requires application developers to understand at least the basics of vector search."
+    - content: "To generate the query embedding and use the SQL in-line, use Azure AI Search's integrated vectorization."
       isCorrect: false
-      explanation: "Incorrect. While this is a viable approach to running semantic search with Azure AI Search, it isn't the simplest approach for data already stored in a PostgreSQL flexible server."
+      explanation: "Incorrect. While this approach is viable to running semantic search with Azure AI Search, it isn't the simplest approach for data already stored in a PostgreSQL flexible server."
Lines changed: 8 additions & 13 deletions
@@ -1,16 +1,11 @@
-Semantic search augments standard keyword search with semantic similarity. This similarity means a query for "sunny" could match the text "bright natural light" even though there's no lexical overlap longer than one letter. Instead of character similarity, semantic search uses embedding vectors produced by artificial intelligence (AI) to measure query and document similarity, providing more relevant search results.
+Semantic search augments standard keyword search by using semantic similarity rather than exact word matches. Instead of relying on overlapping terms, it uses embedding vectors generated by AI models to measure how similar a query and a document are in meaning. In this module, you enable semantic search in Azure Database for PostgreSQL flexible server by combining the `pgvector` and `azure_ai` extensions with Azure OpenAI embeddings so that a query like "sunny" can match descriptions such as "bright natural light" even when the words don’t match exactly. You work with a vacation property listings scenario to see how semantic search can deliver more relevant results with less manual keyword tuning.
 
-This module shows how to enable semantic search in Azure Database for PostgreSQL flexible server and how to use Azure OpenAI to generate vector embeddings.
+Imagine you work on the Margie’s Travel team, building web, and mobile apps that help travelers find vacation rental properties. Today, your search relies on simple keywords like "pool" or "pet-friendly," which often miss great listings because the wording in descriptions doesn’t match the user’s query. You want to deliver more relevant results without constantly curating keyword lists. In this module, you use the Margie’s Travel property listings to see how enabling semantic search in Azure Database for PostgreSQL can surface the right properties based on meaning, not just exact words.
 
-![Diagram of an Azure Database with the vector and azure_ai extensions.](../media/overview.png)
-
-## Scenario
-
-Suppose you work at a company that manages vacation property listings. You want to let customers search and book listings online. One challenge is the many different words people use to describe the same thing. You have limited resources to develop and maintain keyword lists as descriptions change and properties come and go, and manual keyword entry is error-prone. You want to provide relevant search results without manual keyword lists.
-
-## Learning objectives
-
-You get an overview of semantic search, embeddings, and vector databases. Then, you enable the `pgvector` and `azure_ai` extensions. With these extensions, you'll execute a semantic search over vector columns generated from Azure OpenAI embeddings using the `azure_ai` extension. Lastly, you write a search function that receives a query string, generates embeddings for that query, and executes a semantic search against the database.
-
-By the end of this session, you're able to execute semantic searches using an Azure Database for PostgreSQL flexible server database against vector embeddings generated by Azure OpenAI.
+By the end of this module, you're able to:
 
+- Describe how semantic search differs from traditional keyword search and why embeddings improve result relevance.
+- Enable the `pgvector` and `azure_ai` extensions on an Azure Database for PostgreSQL flexible server instance.
+- Use Azure OpenAI, through the `azure_ai` extension, to generate and store vector embeddings for text data.
+- Execute semantic search queries over vector columns in PostgreSQL using Azure OpenAI embeddings.
+- Implement a search function that accepts a query string, generates an embedding for the query, and runs a semantic search against vacation property listings.

learn-pr/wwl-data-ai/enable-semantic-search-azure-database-postgresql/includes/2-understand-semantic-search.md

Lines changed: 10 additions & 4 deletions
@@ -21,17 +21,23 @@ Most relational database use cases don't involve storing *n*-dimensional vectors
 
 ![A diagram showing a document and a query going through the OpenAI Embeddings API to become embedding vectors. These vectors are then compared using cosine distance.](../media/cosine-distance.png)
 
+## Vectors
+
+Before we talk about embeddings, it helps to understand what a **vector** is in this context. A vector is just an ordered list of numbers, often written as an array like `[0.12, -0.8, 3.4]`. You can think of a vector as a point or an arrow in an *n*-dimensional space, where each number is a coordinate along one dimension.
+
+In two dimensions, a vector might represent a position on a flat map (x, y). In three dimensions, it could represent a point in physical space (x, y, z). For embeddings, we extend this idea to hundreds or thousands of dimensions. Each dimension encodes some learned property or feature, and together those numbers capture information about the original data. Once text, images, or other data are turned into vectors, you can compare them mathematically to measure how similar or different they are.
+
 ## Embeddings
 
-An **embedding** is a numerical representation of semantics. Embeddings are represented as *n*-dimensional vectors: arrays of *n* numbers. Each dimension represents some semantic quality as determined by the embedding model.
+An **embedding** is a specific type of vector that represents semantics. Embeddings are represented as *n*-dimensional vectors: arrays of *n* numbers. Each dimension represents some semantic quality as determined by the embedding model.
 
 ![A diagram showing "lorem ipsum" input text being sent to the Azure OpenAI embeddings API, resulting in a vector array of numbers.](../media/create-embedding.png)
 
 If two embedding vectors point in similar directions, they represent similar concepts, such as "bright" and "sunny." If they point away from each other, they represent opposite concepts, such as "sad" and "happy." The embedding model structure and training data determine what is considered similar and different.
 
 Embeddings can be applied to text and any kind of data, such as images or audio. The critical part is transforming data into *n*-dimensional embedding vectors based on some model or function. The numerical similarity of embeddings proxies the semantic similarity of their corresponding data.
 
-The numerical similarity of two *n*-dimensional vectors `v1` and `v2` is given by their [dot product](https://en.wikipedia.org/wiki/Dot_product), written `v1·v2`. To compute the dot product, multiply each dimension's values pair-wise, then sum the result:
+The numerical similarity of two *n*-dimensional vectors `v1` and `v2` gives their [dot product](https://en.wikipedia.org/wiki/Dot_product), written `v1·v2`. To compute the dot product, multiply each dimension's values pair-wise, then sum the result:
 
 ```sql
 dot_product(v1, v2) = SUM(
@@ -45,7 +51,7 @@ dot_product(v1, v2) = SUM(
 
 Because the embeddings are unit vectors (vectors of length one), the dot product is equal to the vectors' [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity), a value between -1 (precisely opposite directions) and 1 (exactly the same direction). Vectors with a cosine similarity of zero are orthogonal: semantically unrelated.
 
-You can visualize *n*-dimensional spaces by projecting them to 3-dimensional space using [principal component analysis](https://en.wikipedia.org/wiki/Principal_component_analysis) (PCA). PCA is a standard technique to reduce vector dimensions. The result is a simplified but visualizable projection of the *n*-dimensional space. Rendering your document embeddings this way will show that more similar documents are grouped in clusters while more different documents are further away.
+You can visualize *n*-dimensional spaces by projecting them to three-dimensional space using [principal component analysis](https://en.wikipedia.org/wiki/Principal_component_analysis) (PCA). PCA is a standard technique to reduce vector dimensions. The result is a simplified but visualizable projection of the *n*-dimensional space. Rendering your document embeddings this way shows that more similar documents are grouped in clusters while more different documents are further away.
 
 Given these definitions, performing a semantic search of a query against a collection of document embeddings is straightforward mathematically:
 
@@ -54,7 +60,7 @@ Given these definitions, performing a semantic search of a query against a colle
 1. Sort the dot products, numbers from -1 to 1.
 1. The most relevant (semantically similar) documents have the highest scores, and the least relevant (semantically different) documents have the lowest scores.
 
-While simple mathematically, this isn't a simple or performant query in a relational database. To store and process this kind of vector similarity query, use a **vector database**.
+While simple mathematically, this solution isn't a simple or performant query in a relational database. To store and process this kind of vector similarity query, use a **vector database**.
 
 ## Vector databases
 
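The dot-product ranking described in the `2-understand-semantic-search.md` changes above can be sketched in a few lines of Python. This is a toy illustration, not part of the commit: the 3-dimensional vectors stand in for real embedding vectors, which have hundreds or thousands of dimensions, and the document names are invented.

```python
import math

def dot_product(v1, v2):
    # Multiply each dimension's values pair-wise, then sum the results.
    return sum(a * b for a, b in zip(v1, v2))

def normalize(v):
    # Scale to unit length so the dot product equals cosine similarity.
    length = math.sqrt(dot_product(v, v))
    return [a / length for a in v]

# Toy stand-ins for embedding vectors produced by an embedding model.
query = normalize([1.0, 2.0, 3.0])
docs = {
    "doc_a": normalize([1.1, 2.1, 2.9]),    # points roughly the same direction
    "doc_b": normalize([-1.0, -2.0, -3.0]), # points the opposite direction
}

# Semantic search: sort documents by dot product with the query, highest first.
ranked = sorted(docs, key=lambda d: dot_product(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b'] — the similar document ranks first
```

The scores behave as the text describes: a vector pointing the same way as the query scores near 1, and an opposite vector scores near -1.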
learn-pr/wwl-data-ai/enable-semantic-search-azure-database-postgresql/includes/3-store-vectors-azure-database-postgresql-flexible-server.md

Lines changed: 3 additions & 3 deletions
@@ -2,9 +2,9 @@ Recall that you require embedding vectors stored in a vector database to run a s
 
 ![Diagram of an Azure Database for PostgreSQL flexible server and the extension named "vector." Next to it are four stored vectors with n-dimensions and arbitrary numeric values.](../media/store-vectors.png)
 
-## Introduction to `vector`
+## Introduction to `pgvector`
 
-The open-source [`vector` extension](https://github.com/pgvector/pgvector) provides vector storage, similarity querying, and other vector operations for PostgreSQL. Once enabled, you can create `vector` columns to store embeddings (or other vectors) alongside other columns.
+The open-source [`pgvector` extension](https://github.com/pgvector/pgvector) provides vector storage, similarity querying, and other vector operations for PostgreSQL. Once enabled, you can create `vector` columns to store embeddings (or other vectors) alongside other columns.
 
 ```sql
 /* Enable the extension. */
@@ -75,7 +75,7 @@ UPDATE documents SET embedding = '[1,1,1]' where id = 1;
 
 The `vector` extension provides the `v1 <=> v2` operator to calculate the cosine distance between vectors `v1` and `v2`. The result is a number between 0 and 2, where 0 means "semantically identical" (no distance) and two means "semantically opposite" (maximum distance).
 
-You can see the terms cosine **distance** and **similarity**. Recall that cosine similarity is between -1 and 1, where -1 means "semantically opposite" and 1 means "semantically identical." Note that `similarity = 1 - distance`.
+You can see the terms cosine **distance** and **similarity**. Recall that cosine similarity is between -1 and 1, where -1 means "semantically opposite" and 1 means "semantically identical." So, `similarity = 1 - distance`.
 
 The upshot is that a query ordered by distance ascending returns the least distant (most similar) results first, while a query ordered by similarity descending returns the most similar (least distant) results first.
 

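The `similarity = 1 - distance` relationship discussed in the changed file above can be checked numerically. This is a hedged Python sketch, not pgvector code: the helper names are mine, and `cosine_distance` only mimics what pgvector's `<=>` operator computes inside the database.

```python
import math

def cosine_similarity(v1, v2):
    # Dot product divided by the product of lengths: a value between -1 and 1.
    dot = sum(a * b for a, b in zip(v1, v2))
    len1 = math.sqrt(sum(a * a for a in v1))
    len2 = math.sqrt(sum(b * b for b in v2))
    return dot / (len1 * len2)

def cosine_distance(v1, v2):
    # Mimics pgvector's <=> result: 0 (identical direction) to 2 (opposite).
    return 1.0 - cosine_similarity(v1, v2)

identical = cosine_distance([1, 1, 1], [2, 2, 2])    # same direction -> 0.0
opposite = cosine_distance([1, 1, 1], [-1, -1, -1])  # opposite direction -> 2.0
orthogonal = cosine_distance([1, 0], [0, 1])         # unrelated -> 1.0
```

Sorting by this distance ascending yields the same ranking as sorting by similarity descending, which is why the two query styles described in the file are interchangeable.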
learn-pr/wwl-data-ai/enable-semantic-search-azure-database-postgresql/includes/4-create-embeddings-with-azure-ai-extension.md

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ To run a semantic search, you must compare the query embedding with the embeddin
 
 ## Introduction to `azure_ai` and Azure OpenAI
 
-The [Azure Database for PostgreSQL flexible extension for Azure AI](/azure/postgresql/flexible-server/generative-ai-azure-overview) provides user-defined functions to integrate with Microsoft Foundry including [Azure OpenAI](/azure/ai-services/openai/overview) and [Azure AI Search](https://azure.microsoft.com/products/ai-services/cognitive-search/).
+The [Azure Database for PostgreSQL flexible extension for Azure AI](/azure/postgresql/flexible-server/generative-ai-azure-overview) provides user-defined functions to integrate with Azure AI services including [Azure OpenAI](/azure/ai-services/openai/overview) and [Azure Cognitive Services](https://azure.microsoft.com/products/ai-services/cognitive-search/).
 
 The [Azure OpenAI Embeddings API](/azure/ai-services/openai/reference#embeddings) generates an embedding vector of the input text. Use this API to set the embeddings for all items being searched. The `azure_ai` extension's `azure_openai` schema makes it easy to call the API from SQL to generate embeddings, whether to initialize item embeddings or create a query embedding on the fly. These embeddings can then be used to perform vector similarity search, or in other words, semantic search.
 
@@ -26,7 +26,7 @@ SELECT azure_ai.set_setting('azure_openai.endpoint', '{your-endpoint-url}');
 SELECT azure_ai.set_setting('azure_openai.subscription_key', '{your-api-key}}');
 ```
 
-Once `azure_ai` and Azure OpenAI are configured, fetching and storing embeddings is a simple matter of calling a function in the SQL query. Assuming a table `listings` with a `description` column and a `listing_vector` column, you can generate and store the embedding for all listings with the following query. Replace `{your-deployment-name}` with the **Deployment name** from the Azure OpenAI Studio for the model you created.
+Once `azure_ai` and Azure OpenAI are configured, fetching, and storing embeddings is a simple matter of calling a function in the SQL query. Assuming a table `listings` with a `description` column and a `listing_vector` column, you can generate and store the embedding for all listings with the following query. Replace `{your-deployment-name}` with the **Deployment name** from the Azure OpenAI Studio for the model you created.
 
 ```sql
 UPDATE listings
