Commit 156cd14

Merge pull request #53380 from JeffKoMS/build-query-azure-cosmos-db
Added files for new module covering querying Cosmos DB
2 parents 5edf790 + f162f68 commit 156cd14

15 files changed

Lines changed: 894 additions & 0 deletions
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.build-query-azure-cosmos-db.introduction
3+
title: Introduction
4+
metadata:
5+
title: Introduction
6+
description: Introduction
7+
ms.date: 02/05/2026
8+
author: jeffkoms
9+
ms.author: jeffko
10+
ms.topic: unit
11+
durationInMinutes: 3
12+
content: |
13+
[!include[](includes/1-introduction.md)]
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.build-query-azure-cosmos-db.explore-cosmos-db-nosql
3+
title: Explore Azure Cosmos DB for NoSQL
4+
metadata:
5+
title: Explore Azure Cosmos DB for NoSQL
6+
description: Explore Azure Cosmos DB for NoSQL
7+
ms.date: 02/05/2026
8+
author: jeffkoms
9+
ms.author: jeffko
10+
ms.topic: unit
11+
durationInMinutes: 10
12+
content: |
13+
[!include[](includes/2-explore-cosmos-db-nosql.md)]
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.build-query-azure-cosmos-db.implement-cosmos-db-sdk
3+
title: Implement the Azure Cosmos DB for NoSQL SDK
4+
metadata:
5+
title: Implement the Azure Cosmos DB for NoSQL SDK
6+
description: Implement the Azure Cosmos DB for NoSQL SDK
7+
ms.date: 02/05/2026
8+
author: jeffkoms
9+
ms.author: jeffko
10+
ms.topic: unit
11+
durationInMinutes: 12
12+
content: |
13+
[!include[](includes/3-implement-cosmos-db-sdk.md)]
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.build-query-azure-cosmos-db.query-cosmos-db-nosql
3+
title: Query Azure Cosmos DB for NoSQL
4+
metadata:
5+
title: Query Azure Cosmos DB for NoSQL
6+
description: Query Azure Cosmos DB for NoSQL
7+
ms.date: 02/05/2026
8+
author: jeffkoms
9+
ms.author: jeffko
10+
ms.topic: unit
11+
durationInMinutes: 10
12+
content: |
13+
[!include[](includes/4-query-cosmos-db-nosql.md)]
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.build-query-azure-cosmos-db.exercise-build-rag-document-store
3+
title: Exercise - Build a RAG document store on Azure Cosmos DB for NoSQL
4+
metadata:
5+
  title: Exercise - Build a RAG document store on Azure Cosmos DB for NoSQL
6+
description: Exercise - Build a RAG document store on Azure Cosmos DB for NoSQL
7+
ms.date: 02/05/2026
8+
author: jeffkoms
9+
ms.author: jeffko
10+
ms.topic: unit
11+
durationInMinutes: 30
12+
content: |
13+
[!include[](includes/5-exercise-build-rag-document-store.md)]
Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.build-query-azure-cosmos-db.knowledge-check
3+
title: Module assessment
4+
metadata:
5+
  title: Module assessment
6+
description: Module assessment
7+
ms.date: 02/05/2026
8+
author: jeffkoms
9+
ms.author: jeffko
10+
ms.topic: unit
11+
durationInMinutes: 5
12+
content: "Choose the best response for each of the following questions."
13+
quiz:
14+
questions:
15+
- content: "A developer is designing a container for an AI application that stores user interaction logs. Each document includes a userId property. The application frequently retrieves all logs for a specific user. Which partition key selection provides the best performance for this access pattern?"
16+
choices:
17+
- content: "Use userId as the partition key"
18+
isCorrect: true
19+
explanation: "Using userId as the partition key groups all documents for a single user together in the same logical partition. This enables single-partition queries when filtering by user, providing the lowest latency and RU cost for the application's primary access pattern."
20+
- content: "Use a timestamp property as the partition key"
21+
isCorrect: false
22+
explanation: "Using timestamp creates partitions based on time, which doesn't align with the access pattern of retrieving logs by user. This would require cross-partition queries to gather all logs for a specific user, increasing latency and RU consumption."
23+
- content: "Use a boolean isProcessed property as the partition key"
24+
isCorrect: false
25+
explanation: "Using a boolean property creates only two logical partitions, severely limiting scalability. This low-cardinality partition key creates hot partitions and doesn't support efficient queries by user."
26+
- content: "An AI application caches model inference results in Azure Cosmos DB. The application periodically recomputes results and needs to store them regardless of whether a cached entry already exists. Which SDK method handles this requirement most effectively?"
27+
choices:
28+
- content: "Use create_item() to insert the item"
29+
isCorrect: false
30+
explanation: "The create_item() method fails with a conflict error (HTTP 409) if an item with the same ID and partition key already exists. For a caching scenario where entries might already exist, create_item() would require additional error handling to detect and handle duplicates."
31+
- content: "Use replace_item() to update the item"
32+
isCorrect: false
33+
explanation: "The replace_item() method requires the item to already exist and fails if it doesn't. For a caching scenario where the entry might not yet exist, replace_item() would fail on the first write for any new cache key."
34+
- content: "Use upsert_item() to insert or replace the item"
35+
isCorrect: true
36+
explanation: "The upsert_item() method inserts a new item if it doesn't exist or replaces the existing item with the same ID and partition key. This simplifies update logic when you don't need to know whether the item existed previously, making it ideal for caching scenarios where results are periodically recomputed."
37+
- content: "An AI application stores product recommendations with document IDs in the format product-{id}. The application needs to retrieve a specific recommendation by its known ID and category. Which method provides the most efficient retrieval?"
38+
choices:
39+
- content: "Use query_items() with a WHERE clause filtering by ID"
40+
isCorrect: false
41+
explanation: "While queries can retrieve items by ID, they consume more RUs than point reads because Azure Cosmos DB must parse and execute the query. Point reads are more efficient when you have both the ID and partition key."
42+
- content: "Use read_item() with the item ID and partition key"
43+
isCorrect: true
44+
explanation: "Point reads using read_item() retrieve a single item by ID and partition key with the lowest possible latency and RU cost (approximately 1 RU for a 1-KB item). This is the most efficient method when you know both values."
45+
- content: "Use query_items() with enable_cross_partition_query=True"
46+
isCorrect: false
47+
explanation: "Cross-partition queries fan out to all partitions, consuming more RUs than necessary. When you know the partition key, always use it to route requests to a single partition or use a point read."
48+
- content: "A developer is building a search feature that accepts user-provided filter values. The feature filters products by category and maximum price. What is the primary reason to use parameterized queries instead of string concatenation?"
49+
choices:
50+
- content: "Parameterized queries prevent injection attacks and enable query plan caching"
51+
isCorrect: true
52+
explanation: "Parameters separate query structure from values, preventing malicious input from modifying query logic. Additionally, Azure Cosmos DB can cache and reuse execution plans for parameterized queries, improving performance for repeated queries with different values."
53+
- content: "Parameterized queries automatically convert data types"
54+
isCorrect: false
55+
explanation: "While the SDK handles type serialization, the primary benefits of parameterization are security (preventing injection) and performance (query plan caching). Type conversion isn't the main reason to use parameters."
56+
- content: "Parameterized queries run faster than queries with literal values"
57+
isCorrect: false
58+
explanation: "Individual query execution speed is similar regardless of whether values are parameterized or literal. The performance benefit comes from query plan caching across multiple executions, not from faster single-query execution."
59+
- content: "A data analyst notices that queries filtering products by price range consume more RUs than expected. The container uses categoryId as the partition key. Which optimization would most effectively reduce RU consumption?"
60+
choices:
61+
- content: "Remove the ORDER BY clause from the query"
62+
isCorrect: false
63+
explanation: "While ORDER BY adds some overhead, removing it might break application functionality. The more significant optimization is ensuring queries route to single partitions when possible, which has a larger impact on RU consumption."
64+
- content: "Increase the container's provisioned throughput"
65+
isCorrect: false
66+
explanation: "Increasing throughput provides more capacity but doesn't reduce the RU cost of individual queries. The query still consumes the same RUs; you're just provisioning more capacity. Optimizing the query to use single-partition routing actually reduces per-query cost."
67+
- content: "Add the partition key (categoryId) to the WHERE clause to enable single-partition routing"
68+
isCorrect: true
69+
explanation: "Including the partition key in the filter routes the query to a specific partition instead of fanning out to all partitions. Single-partition queries consume fewer RUs because they don't require coordination across multiple physical partitions."
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.build-query-azure-cosmos-db.summary
3+
title: Summary
4+
metadata:
5+
title: Summary
6+
description: Summary
7+
ms.date: 02/05/2026
8+
author: jeffkoms
9+
ms.author: jeffko
10+
ms.topic: unit
11+
durationInMinutes: 2
12+
content: |
13+
[!include[](includes/7-summary.md)]
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
AI applications require data stores that deliver consistent low-latency performance while scaling to meet unpredictable demand. Azure Cosmos DB for NoSQL provides a globally distributed, schemaless document database that supports flexible JSON data models and automatic indexing. This module guides you through connecting to Azure Cosmos DB for NoSQL and building efficient queries to power AI solutions that retrieve and manipulate document data.
2+
3+
Imagine you're a developer building an AI-powered recommendation engine for an e-commerce platform. Your application stores product catalogs, user preferences, and interaction history as JSON documents. Each recommendation request requires retrieving relevant products based on category, price range, and user attributes. All of this must happen within milliseconds to maintain a responsive user experience. The existing relational database struggles to keep up with peak traffic, and schema changes require lengthy migrations that slow down feature development. Your team evaluates Azure Cosmos DB for NoSQL because it provides schema flexibility for evolving data models without downtime. The default indexing policy automatically indexes properties in your items, which reduces the amount of index setup you do upfront. As your query patterns evolve, you might still need to customize indexing policies, such as adding composite indexes for some `ORDER BY` patterns. You can scale throughput independently of storage to handle traffic spikes during promotional events. You need to understand how to structure your data across databases and containers, connect securely using the SDK, and write efficient queries that minimize request unit consumption while delivering the results your AI models need.
4+
5+
After completing this module, you'll be able to:
6+
7+
- Explain the Azure Cosmos DB for NoSQL resource model and how databases, containers, and items relate to each other
8+
- Implement SDK operations to connect to Azure Cosmos DB and perform CRUD operations on items
9+
- Select between point reads and queries based on performance requirements and access patterns
10+
- Build queries using SQL syntax to filter, project, and retrieve data from containers
11+
12+
> [!NOTE]
13+
> All code examples in this module are based on the most recent version of the `azure-cosmos` library for Python at the time of writing. The library is updated often, so check the [Azure Cosmos DB SDK for Python documentation](/python/api/overview/azure/cosmos-readme) for the most up-to-date information.
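
The learning objectives above can be previewed with a minimal sketch, written under several assumptions. The helper names (`save_recommendation`, `get_recommendation`, `find_products`), the item properties `name` and `price`, and the use of `categoryId` as the partition key are illustrative choices, not part of this module's files. The helpers call `upsert_item`, `read_item`, and `query_items` with the keyword arguments used by the `azure-cosmos` container client. `InMemoryContainer` is a hypothetical stand-in that lets the sketch run without an Azure account; it doesn't parse SQL and only mimics these three calls.

```python
from typing import Any


def save_recommendation(container: Any, item: dict) -> dict:
    """Insert the item, or replace it if the same id and partition key exist."""
    return container.upsert_item(body=item)


def get_recommendation(container: Any, item_id: str, category_id: str) -> dict:
    """Point read: fetch one item by its id and partition key value."""
    return container.read_item(item=item_id, partition_key=category_id)


def find_products(container: Any, category_id: str, max_price: float) -> list:
    """Parameterized query scoped to a single partition key value."""
    query = (
        "SELECT c.id, c.name, c.price FROM c "
        "WHERE c.categoryId = @categoryId AND c.price <= @maxPrice"
    )
    # Values travel separately from the query text, never concatenated into it.
    parameters = [
        {"name": "@categoryId", "value": category_id},
        {"name": "@maxPrice", "value": max_price},
    ]
    return list(
        container.query_items(
            query=query, parameters=parameters, partition_key=category_id
        )
    )


class InMemoryContainer:
    """Hypothetical stand-in for a container client (assumption, not the SDK).

    Items are keyed by (partition key value, id). The query method ignores the
    SQL text and applies the @maxPrice parameter directly.
    """

    def __init__(self) -> None:
        self._items: dict = {}

    def upsert_item(self, body: dict) -> dict:
        self._items[(body["categoryId"], body["id"])] = dict(body)
        return dict(body)

    def read_item(self, item: str, partition_key: str) -> dict:
        return dict(self._items[(partition_key, item)])

    def query_items(self, query, parameters=None, partition_key=None):
        values = {p["name"]: p["value"] for p in parameters or []}
        for (pk, _), doc in self._items.items():
            if pk == partition_key and doc["price"] <= values["@maxPrice"]:
                yield {key: doc[key] for key in ("id", "name", "price")}


container = InMemoryContainer()
save_recommendation(
    container,
    {"id": "product-1", "categoryId": "gear", "name": "Tent", "price": 120.0},
)
# Same id and partition key: the second upsert replaces the first item.
save_recommendation(
    container,
    {"id": "product-1", "categoryId": "gear", "name": "Tent", "price": 99.0},
)
save_recommendation(
    container,
    {"id": "product-2", "categoryId": "gear", "name": "Stove", "price": 250.0},
)

tent = get_recommendation(container, "product-1", "gear")
cheap = find_products(container, "gear", 100.0)
```

Against a real account, the same helpers would presumably receive the container client returned by the SDK in place of `InMemoryContainer`. The point read needs both the item ID and the partition key value; the query passes `partition_key` so that it targets a single partition instead of fanning out.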
