articles/cosmos-db/postgresql/concepts-colocation.md (4 additions, 4 deletions)
ms.service: cosmos-db
ms.subservice: postgresql
ms.custom: ignite-2022
ms.topic: conceptual
ms.date: 10/01/2023
---
# Table colocation in Azure Cosmos DB for PostgreSQL
Colocation means storing related information together on the same nodes.

## Data colocation for hash-distributed tables
In Azure Cosmos DB for PostgreSQL, a row is stored in a shard if the hash of the value in the distribution column falls within the shard's hash range. Shards with the same hash range are always placed on the same node. Rows with equal distribution column values are always on the same node across tables. The concept of hash-distributed tables is also known as [row-based sharding](concepts-sharding-models.md#row-based-sharding). In [schema-based sharding](concepts-sharding-models.md#schema-based-sharding), tables within a distributed schema are always colocated.
:::image type="content" source="media/concepts-colocation/colocation-shards.png" alt-text="Diagram shows shards with the same hash range placed on the same node for events shards and page shards." border="false":::
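As an illustrative sketch (not from the article) of how two tables end up colocated, `create_distributed_table` accepts a `colocate_with` argument; the `events` and `pages` table names here are hypothetical:

```sql
-- Distribute both tables by tenant_id; matching distribution column
-- types let their shards share hash ranges and therefore nodes.
SELECT create_distributed_table('events', 'tenant_id');
SELECT create_distributed_table('pages', 'tenant_id', colocate_with => 'events');
```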
## A practical example of colocation
Consider the following tables that might be part of a multitenant web
analytics SaaS:
```sql
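-- The diff collapsed the original table definitions, so the columns
-- below are an assumed reconstruction of the example's intent: two
-- tables sharing a tenant_id distribution column so they colocate.
CREATE TABLE event (
  tenant_id int,
  event_id bigint,
  page_id int,
  payload jsonb,
  primary key (tenant_id, event_id)
);

CREATE TABLE page (
  tenant_id int,
  page_id int,
  path text,
  primary key (tenant_id, page_id)
);
```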
In some cases, queries and table schemas must be changed to include the tenant ID…
## Next steps
- See how tenant data is colocated in the [multitenant tutorial](tutorial-design-database-multi-tenant.md).
articles/cosmos-db/postgresql/concepts-nodes.md (17 additions, 10 deletions)
author: jonels-msft
ms.service: cosmos-db
ms.subservice: postgresql
ms.topic: conceptual
ms.date: 09/29/2023
---
# Nodes and tables in Azure Cosmos DB for PostgreSQL
…allows the database to scale by adding more nodes to the cluster.

Every cluster has a coordinator node and multiple workers. Applications send their queries to the coordinator node, which relays them to the relevant workers and accumulates their results.
Azure Cosmos DB for PostgreSQL allows the database administrator to *distribute* tables and/or schemas, storing different rows on different worker nodes. Distributed tables and/or schemas are the key to Azure Cosmos DB for PostgreSQL performance. Tables and schemas that aren't distributed stay entirely on the coordinator node and can't take advantage of cross-machine parallelism.
For each query on distributed tables, the coordinator either routes it to a single worker node or parallelizes it across several, depending on whether the required data lives on a single node or on multiple nodes. With [schema-based sharding](concepts-sharding-models.md#schema-based-sharding), the coordinator routes queries directly to the node that hosts the schema. In both schema-based sharding and [row-based sharding](concepts-sharding-models.md#row-based-sharding), the coordinator decides what to do by consulting metadata tables. These tables track the DNS names and health of worker nodes, and the distribution of data across nodes.
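For example, the worker metadata is visible on the coordinator; a minimal sketch using the standard Citus `pg_dist_node` metadata table:

```sql
-- List worker nodes, their DNS names/ports, and health status.
SELECT nodename, nodeport, isactive
FROM pg_dist_node;
```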
## Table types
There are five types of tables in a cluster, each stored differently on nodes and used for different purposes.
### Type 1: Distributed tables
…

### Type 2: Reference tables

…values like order statuses or product categories.

### Type 3: Local tables

When you use Azure Cosmos DB for PostgreSQL, the coordinator node you connect to is a regular PostgreSQL database. You can create ordinary tables on the coordinator and choose not to shard them.

A good candidate for local tables would be small administrative tables that don't participate in join queries. An example is a `users` table for application sign-in and authentication.
### Type 4: Local managed tables
Azure Cosmos DB for PostgreSQL might automatically add local tables to metadata if a foreign key reference exists between a local table and a reference table. Additionally, local managed tables can be manually created by running the [citus_add_local_table_to_metadata](reference-functions.md#citus_add_local_table_to_metadata) function on regular local tables. Tables present in metadata are considered managed tables and can be queried from any node; Citus knows to route to the coordinator to obtain data from the local managed table. Such tables are displayed as `local` in the [citus_tables](reference-metadata.md#distributed-tables-view) view.
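A minimal sketch of the manual path (the `app_settings` table name is hypothetical):

```sql
-- An ordinary table on the coordinator.
CREATE TABLE app_settings (key text PRIMARY KEY, value text);

-- Add it to Citus metadata so it becomes a local managed table
-- that can be queried from any node.
SELECT citus_add_local_table_to_metadata('app_settings');
```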
### Type 5: Schema tables
With [schema-based sharding](concepts-sharding-models.md#schema-based-sharding), introduced in Citus 12.0, distributed schemas are automatically associated with individual colocation groups. Tables created in those schemas are automatically converted to colocated distributed tables without a shard key. Such tables are considered schema tables and are displayed as `schema` in the [citus_tables](reference-metadata.md#distributed-tables-view) view.
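To see how each table is classified, you can query the `citus_tables` view; a sketch, assuming the `citus_table_type` column described in the metadata reference:

```sql
-- Schema tables report their type as 'schema'; local managed
-- tables report 'local'.
SELECT table_name, citus_table_type FROM citus_tables;
```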
articles/cosmos-db/postgresql/concepts-sharding-models.md

Sharding is a technique used in database systems and distributed computing to horizontally partition data across multiple servers or nodes. It involves breaking up a large database or dataset into smaller, more manageable parts called shards. A shard contains a subset of the data, and together the shards form the complete dataset.
Azure Cosmos DB for PostgreSQL offers two types of data sharding, namely row-based and schema-based. Each option comes with its own [sharding tradeoffs](#sharding-tradeoffs), allowing you to choose the approach that best aligns with your application's requirements.
## Row-based sharding
The traditional way in which Azure Cosmos DB for PostgreSQL shards tables is the single database, shared schema model, also known as row-based sharding: tenants coexist as rows within the same table. The tenant is determined by defining a [distribution column](./concepts-nodes.md#distribution-column), which allows splitting up a table horizontally.
23
+
24
+
Row-based is the most hardware efficient way of sharding. Tenants are densely packed and distributed among the nodes in the cluster. This approach however requires making sure that all tables in the schema have the distribution column and that all queries in the application filter by it. Row-based sharding shines in IoT workloads and for achieving the best margin out of hardware use.
Benefits:

* Best performance
* Best tenant density per node

Drawbacks:

* Requires schema modifications
* Requires application query modifications
* All tenants must share the same schema
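Here's the sketch referenced above: a hypothetical `events` table onboarded to row-based sharding with the standard `create_distributed_table` function, with the tenant ID in the primary key and in query filters:

```sql
-- Tenant ID is part of the table and its primary key.
CREATE TABLE events (
  tenant_id bigint NOT NULL,
  event_id bigint NOT NULL,
  payload jsonb,
  PRIMARY KEY (tenant_id, event_id)
);

-- Shard the table across the cluster by tenant ID.
SELECT create_distributed_table('events', 'tenant_id');

-- Application queries must filter by the distribution column.
SELECT count(*) FROM events WHERE tenant_id = 42;
```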
## Schema-based sharding
Available with Citus 12.0 in Azure Cosmos DB for PostgreSQL, schema-based sharding is the shared database, separate schema model: the schema becomes the logical shard within the database. Multitenant apps can use a schema per tenant to easily shard along the tenant dimension. Query changes aren't required, and the application only needs a small modification to set the proper `search_path` when switching tenants. Schema-based sharding is an ideal solution for microservices, and for ISVs deploying applications that can't undergo the changes required to onboard row-based sharding.
Benefits:

* Tenants can have heterogeneous schemas
* No schema modifications required
* No application query modifications required
* Better SQL compatibility compared to row-based sharding

Drawbacks:

* Fewer tenants per node compared to row-based sharding
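As a sketch of the schema-per-tenant workflow (the `citus.enable_schema_based_sharding` setting follows Citus 12 conventions, and the `tenant_a` schema name is hypothetical):

```sql
-- Enable schema-based sharding (Citus 12.0+): newly created schemas
-- become distributed schemas.
SET citus.enable_schema_based_sharding TO on;

-- One schema per tenant; its tables are colocated as one shard group.
CREATE SCHEMA tenant_a;
CREATE TABLE tenant_a.events (event_id bigint PRIMARY KEY, payload jsonb);

-- Switching tenants only requires changing the search_path;
-- the queries themselves stay unchanged.
SET search_path TO tenant_a;
SELECT count(*) FROM events;
```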
## Sharding tradeoffs
| | Schema-based sharding | Row-based sharding |
|---|---|---|
| Multi-tenancy model | Separate schema per tenant | Shared tables with tenant ID columns |
| Citus version | 12.0+ | All versions |
| Extra steps compared to vanilla PostgreSQL | None, only a config change | Use create_distributed_table on each table to distribute & colocate tables by tenant ID |
| Number of tenants | 1-10k | 1-1M+ |
| Data modeling requirement | No foreign keys across distributed schemas | Need to include a tenant ID column (a distribution column, also known as a sharding key) in each table, and in primary keys and foreign keys |
| SQL requirement for single node queries | Use a single distributed schema per query | Joins and WHERE clauses should include the tenant_id column |
| Data sharing across tenants | Yes, using reference tables (in a separate schema) | Yes, using reference tables |
| Tenant to shard isolation | Every tenant has its own shard group by definition | Can give specific tenant IDs their own shard group via isolate_tenant_to_new_shard |
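For the last row, a one-line sketch of tenant isolation under row-based sharding (the table name and tenant ID are hypothetical):

```sql
-- Move tenant 42's rows in 'events' (and colocated tables, via CASCADE)
-- into their own shard group.
SELECT isolate_tenant_to_new_shard('events', 42, 'CASCADE');
```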
articles/cosmos-db/postgresql/concepts-upgrade.md (8 additions, 4 deletions)
author: jonels-msft
ms.service: cosmos-db
ms.subservice: postgresql
ms.topic: conceptual
ms.date: 10/01/2023
---
# Cluster upgrades in Azure Cosmos DB for PostgreSQL
The Azure Cosmos DB for PostgreSQL managed service can handle upgrades of both the PostgreSQL server and the Citus extension. All clusters are created with [the latest Citus version](./reference-extensions.md#citus-extension) available for the major PostgreSQL version you select during cluster provisioning. When you select a PostgreSQL version such as PostgreSQL 15 for an in-place cluster upgrade, the latest Citus version supported for the selected PostgreSQL version is installed.

If you need to upgrade the Citus version only, you can do so by using an in-place upgrade. For instance, you might want to upgrade Citus 11.0 to Citus 11.3 on your PostgreSQL 14 cluster without upgrading the PostgreSQL version.
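To verify versions before and after an in-place upgrade, a quick sketch using standard commands:

```sql
-- Citus extension version on the coordinator.
SELECT citus_version();

-- PostgreSQL server version.
SHOW server_version;
```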
## Upgrade precautions
Also, upgrading a major version of Citus can introduce changes in behavior. It's best to familiarize yourself with new product features and changes to avoid surprises.

Noteworthy Citus 12 changes:

* The default rebalance strategy changed from `by_shard_count` to `by_disk_size`; see the sketch after this list for pinning the old strategy.
* Support for PostgreSQL 13 has been dropped as of this version.
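If you depend on the previous behavior, the rebalancer strategy can be chosen explicitly; a sketch using `citus_rebalance_start`, which accepts a `rebalance_strategy` argument:

```sql
-- Start a background rebalance using the pre-Citus-12 default strategy.
SELECT citus_rebalance_start(rebalance_strategy := 'by_shard_count');
```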
Noteworthy Citus 11 changes:

* Table shards might disappear in your SQL client. You can control their visibility…