
Merge in v4.1.2 #22

Merged

ants merged 40 commits into multisite from multisite-v4.1.2 on Apr 30, 2026

Conversation


@ants ants commented Apr 30, 2026

No description provided.

CyberDem0n and others added 30 commits September 26, 2025 12:14
* Switch to codecov
* Solve flaky behave test -- 9.6 doesn't maintain replication slots on replicas
Clarify that a warning is changed to DEBUG only when the watchdog setting is _not_ set to required.

The `/sync` key wasn't updated after renaming the leader node with a Patroni restart in pause (without a Postgres restart).
This prevented Patroni from promoting after the next restart without pause.

Close patroni#3449
The extra is called psycopg3, not psycopg.
…roni#3457)

Such a timeline increase may happen as a result of crash recovery in single-user mode plus a promote after taking the leader key while the other replica nodes are isolated from the DCS.
In this case the replica nodes didn't trigger the pg_rewind state machine because the leader, and therefore `primary_conninfo`, didn't change.
Removed `member-name` from the `edit-config` command. It gave the impression that it is possible to set up node-specific DCS configuration.

A minor documentation issue I noticed when upgrading from a standalone cluster to Patroni.
I followed the switch to systemd notify introduced in patroni#3301, but received
the following warning:

```
systemd[1]: patroni.service: Got notification message from PID 2572251, but reception only permitted for main PID 2572234
```

With `Type=notify`, systemd implies `NotifyAccess=main`, which restricts
notify-socket access to the main process only. Apparently Patroni tries
to send from a different one. This PR lifts this restriction and allows
all processes of the service to access the socket (a drop-in sketch follows the
quoted documentation below).

see
https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#NotifyAccess=

> If `all`, all services updates from all members of the service's
> control group are accepted. This option should be set to open access to
> the notification socket when using `Type=notify`/`Type=notify-reload` or
> `WatchdogSec=` (see above). If those options are used but
> `NotifyAccess=` is not configured, it will be implicitly set to main.
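
For reference, a minimal drop-in that applies this; the path and unit name are illustrative, not necessarily what your packaging uses:

```
# /etc/systemd/system/patroni.service.d/notify-access.conf (hypothetical path)
[Service]
# Accept sd_notify messages from any process in the service's cgroup,
# not only from the main PID.
NotifyAccess=all
```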

In addition to that, did some maintenance:
- removed leftovers from Python 2 (`from __future__ import`)
- improved psycopg tests
- removed usage of f-strings and `str.format` from logging calls (see the sketch after this list)
- switched from `multiprocessing.pool.ThreadPool` to `concurrent.futures.ThreadPoolExecutor`
- actions/checkout -> v5
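
As an aside, the reason for dropping f-strings and `str.format` from logging calls is lazy interpolation; a minimal Python sketch (the logger name and message are made up):

```
import logging

logger = logging.getLogger('patroni.example')  # hypothetical logger name
node = 'node-1'

# The f-string is rendered even when DEBUG logging is disabled:
logger.debug(f'promoting {node}')
# %-style arguments are interpolated only if the record is actually emitted:
logger.debug('promoting %s', node)
```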

---------

Co-authored-by: Jorge Solorzano <[email protected]>
`tag.failover_priority` values were ignored when
`synchronous_node_count > 1`.

Besides that, document the limitation regarding `failover_priority` with
`synchronous_mode=quorum`.

Close patroni#3496

---------

Co-authored-by: Hugo DUBOIS <[email protected]>
- handle broken JSON responses
- improve reporting for etcd internal errors

Close patroni#3305,
patroni#3473
ThreadPoolExecutor and as_completed() may return results in an
unpredictable order, so we'd better not check node names.
Starting from PostgreSQL 10 we use a passfile in `primary_conninfo`, but we
failed to update the passfile after the replication password was updated in
patroni.yaml with a reload.

Close patroni#3470
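
A hypothetical sketch of the idea, assuming a pgpass-style file referenced from `primary_conninfo` (the helper and its signature are illustrative):

```
import os
import stat

def write_passfile(path, host, port, user, password):
    # Rewrite the passfile so the value referenced from primary_conninfo
    # stays in sync after a reload changed the replication password.
    with open(path, 'w') as f:
        f.write(f'{host}:{port}:*:{user}:{password}\n')
    # libpq ignores passfiles with permissions wider than 0600
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```
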
…atroni#3518)

For PostgreSQL v12 and newer, `pg_settings` cannot be queried while the
server is still starting and not yet accepting connections. As a
workaround, we update `self._current_recovery_params` when writing a new
`postgresql.conf` file.

However, this logic did not account for the fact that
`self._current_recovery_params` must contain all recovery parameters for
correct comparison in `check_recovery_conf()`.

To address this, the missing recovery parameters are now added to
`self._current_recovery_params` in `write_recovery_conf()`, mirroring
the behavior of `_read_recovery_params_pre_v12()`.
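
A much-simplified sketch of that idea (not Patroni's actual classes; the parameter set is illustrative):

```
RECOVERY_PARAMETERS = ('primary_conninfo', 'primary_slot_name', 'recovery_target_timeline')

class RecoveryParams:
    def __init__(self):
        self._current_recovery_params = {}

    def write_recovery_conf(self, values):
        # pg_settings can't be queried while v12+ is still starting, so cache
        # what we wrote, including parameters left at their defaults.
        self._current_recovery_params = {name: values.get(name, '')
                                         for name in RECOVERY_PARAMETERS}

    def check_recovery_conf(self, desired):
        # The comparison is only meaningful if the cache covers all parameters.
        return all(self._current_recovery_params.get(name, '') == desired.get(name, '')
                   for name in RECOVERY_PARAMETERS)
```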

Additionally, restore the `Postgresql.is_starting()` check in
`Ha.is_healthiest_node()`, which was mistakenly removed in patroni#2726.

Close patroni#3517
The link was to the YAML configuration, but `use_slots` and `slots` are dynamic configuration items.
Corrected the parameter name from Patroni `synchronous_mode` to PostgreSQL `synchronous_commit`.
We pinned them because of some incompatibilities which were later addressed.
…oni#3537)

Fixes patroni#3533

When `initdb` or `basebackup` options are provided as a dict (instead of
a list), the `option_is_allowed()` validation was bypassed, allowing
blocked options like `compress` to be used.

Added the `option_is_allowed(key)` check to the dict branch in
`process_user_options()`.
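
A hedged simplification of the fix (the callback mirrors the description; everything else is made up):

```
def process_user_options(options, option_is_allowed):
    # Simplified: the dict branch now runs the same allow-list check that
    # the list branch always had.
    result = []
    if isinstance(options, dict):
        for key, value in options.items():
            if option_is_allowed(key):              # previously missing here
                result.append(f'--{key}={value}')
    else:
        for opt in options:                         # list of preformatted options
            if option_is_allowed(opt.lstrip('-').split('=', 1)[0]):
                result.append(opt)
    return result
```

With `option_is_allowed=lambda key: key != 'compress'`, a dict like `{'compress': 'gzip'}` is now filtered out instead of slipping through.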

---------

Co-authored-by: Muhammad Umair Ali <[email protected]>
Co-authored-by: Alexander Kukushkin <[email protected]>
The `compress` option was completely blocked for basebackup, but since
PostgreSQL 15, server-side compression is useful and works transparently
with plain format.

- Removed `compress` from blocked options list
- Added validation to allow only `--compress=server*` values (e.g.,
server-zstd, server-gzip)
- Rejected client-side compression with a helpful error message (see the sketch below)
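
A hedged sketch of the validation idea (the helper is hypothetical, not Patroni's actual function):

```
def validate_basebackup_compress(value):
    # Only server-side compression works transparently with the plain format,
    # so reject client-side values with a clear message.
    if value.startswith('server'):
        return value                       # e.g. 'server-zstd', 'server-gzip'
    raise ValueError(f"compress={value!r} is client-side; "
                     "only server-side compression (e.g. 'server-zstd') is allowed")
```
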
The response JSON containing error code and message may vary depending
on where it is raised from.

Follow up on patroni#3338 and patroni#3486
Avoid starting/stopping threads at runtime:
1. Always start the slot advance thread and the CitusHandler thread on Patroni
start.
2. Introduce a thread pool for the REST API and use it for running the REST API
itself and executing incoming HTTP requests.
3. Introduce a global thread pool to execute async tasks and to make
REST API requests during the leader race and failsafe checks (see the sketch
after this list).
4. Allow configuring global `thread_pool_size` and
`restapi.thread_pool_size`.
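
A minimal sketch of item 3, assuming a single long-lived pool created on startup (names and sizes are illustrative):

```
from concurrent.futures import ThreadPoolExecutor, as_completed

# One long-lived pool created on Patroni start and reused for leader-race and
# failsafe checks, instead of starting and stopping threads at runtime.
global_pool = ThreadPoolExecutor(max_workers=5)    # cf. the global thread_pool_size

def check_members(members, fetch_status):
    futures = {global_pool.submit(fetch_status, member): member for member in members}
    return {futures[done]: done.result() for done in as_completed(futures)}
```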

Besides that, adjust the system(d) start script to run Patroni with
`MALLOC_ARENA_MAX=1` to reduce allocated virtual memory, and add
informational warning sections to README.rst and the docs.

Close patroni#3474
Close patroni#3481
Ha.shutdown() needs it to run some health checks

Ref patroni#3526
CyberDem0n and others added 10 commits March 21, 2026 15:24
Before patroni#3526 the thread was started with the first attempt to sync metadata, when we had guarantees that the citus database was prepared (the extension exists).
The early start caused an enormous amount of errors.
We return to the old behavior by using `self._ready_to_run = Event()`.
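
A simplified sketch of the gating idea, with a made-up class standing in for the CitusHandler thread:

```
from threading import Event, Thread

class CitusSyncWorker(Thread):
    # Started eagerly on Patroni start, but idle until the citus database is
    # known to be prepared, avoiding a storm of errors before readiness.
    def __init__(self):
        super().__init__(daemon=True)
        self._ready_to_run = Event()

    def enable(self):
        self._ready_to_run.set()          # called with the first sync attempt

    def run(self):
        self._ready_to_run.wait()
        print('safe to start syncing citus metadata')

worker = CitusSyncWorker()
worker.start()    # the thread exists from the very beginning...
worker.enable()   # ...but only does work once readiness is signalled
worker.join()
```
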
**Problem Description:**
When the Kubernetes API returns a 403 Permission Denied error (e.g., due
to temporary RBAC permission loss), the current code immediately logs an
exception and returns False, which may cause the leader to be
incorrectly demoted. However, in real-world scenarios, permission issues
can be temporary (such as RBAC updates, network fluctuations), and the
application should be given an opportunity to recover within a timeout
period.

**Solution:**
Add a dedicated `_handle_permission_denied` method that, when encountering
a 403 error:
1. Continuously verifies the leader status within the `retry_timeout` period.
2. Checks the leader object every 0.5 seconds to confirm whether the
current instance is still the leader.
3. Returns one of three states based on the verification result (sketched below):
   - 'retry': still the leader, continue retrying the update operation
   - 'demote': leadership has been lost, demote immediately (return False)
   - 'timeout': unable to confirm within the timeout period, handle as a timeout (return False)
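
A hedged Python sketch of that logic (the signature and helper are illustrative, not the actual Patroni code):

```
import time

def handle_permission_denied(get_leader, my_name, retry_timeout, poll_interval=0.5):
    # Keep re-checking the leader object until we either confirm or refute our
    # leadership, or the retry_timeout window expires.
    deadline = time.monotonic() + retry_timeout
    while time.monotonic() < deadline:
        try:
            leader = get_leader()           # may still fail while RBAC recovers
        except Exception:
            leader = None
        if leader is not None:
            return 'retry' if leader == my_name else 'demote'
        time.sleep(poll_interval)
    return 'timeout'
```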

This PR fixes issue patroni#3536.

Co-authored-by: Sophia Ruan <[email protected]>
v3.6.9, v3.5.28, and v3.4.42 addressed some CVEs and now it is no longer
possible to read cluster topology and perform lease keepalive requests
without authentication.

Close patroni#3573
time.sleep(0.001) is very unreliable and makes tests flaky
- Release notes
- Update version
- Pyright 1.1.408
This allows the systemd reload command to wait for the configuration to
actually have been processed.

Encapsulate the import logic in a separate function to be used by the
already-present notify implementation.
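
A hedged sketch of the notification side of this, assuming the standard sd_notify datagram protocol (the function names are not Patroni's):

```
import os
import socket
import time

def sd_notify(message):
    addr = os.environ.get('NOTIFY_SOCKET')
    if not addr:
        return                               # not running under systemd
    if addr.startswith('@'):                 # abstract-namespace socket
        addr = '\0' + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)

# With Type=notify-reload, the reload command waits for RELOADING=1 (plus
# MONOTONIC_USEC) and then READY=1 once the new configuration is in effect.
sd_notify('RELOADING=1\nMONOTONIC_USEC=%d' % int(time.monotonic() * 1_000_000))
# ... re-read and apply the configuration here ...
sd_notify('READY=1')
```
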
When systemd receives unexpected notifications it may terminate the Patroni unit.

Close patroni#3586
@ants ants merged commit a012bc1 into multisite Apr 30, 2026
22 of 23 checks passed
@ants ants deleted the multisite-v4.1.2 branch April 30, 2026 13:24