Investigating a customer complaint, I discovered that vip-manager mishandles the loss of its connection to etcd.
When etcd is unreachable, vip-manager does not remove the VIP, and instead continues to assume that it is still primary.
Furthermore, it does not even log when it loses the ability to talk to etcd.
Current status:
- vip-manager does not log when it is unable to talk to etcd (for example, when it cannot even open a TCP connection)
- vip-manager does not remove the VIP when unable to talk to etcd
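To illustrate the first point, here is a minimal sketch of an explicit reachability check that logs when the connection fails. It is written in Python for illustration only and is not vip-manager's actual code. The name `endpoint_reachable`, the logger name, and the 1-second timeout are assumptions made up for this example. The fallback port 2379 is taken from the dcs-endpoints value shown in the config dump.

```python
import logging
import socket
from urllib.parse import urlparse

log = logging.getLogger("vip-check")  # hypothetical logger name


def endpoint_reachable(endpoint: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the DCS endpoint can be opened.

    Hypothetical helper: on failure it logs a warning, so losing the
    connection to etcd becomes visible in the log.
    """
    url = urlparse(endpoint)
    host, port = url.hostname, url.port or 2379
    try:
        # Open and immediately close a TCP connection as a liveness probe.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable hosts.
        log.warning("cannot reach DCS endpoint %s: %s", endpoint, exc)
        return False
```

A plain TCP probe only shows that something is listening on the port. A real check would more likely rely on the error returned by the etcd client itself.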
When we cannot talk to etcd, we must assume that Patroni cannot either, or at least that whatever is causing the problem is likely to affect Patroni as well.
This means that Patroni might choose a different primary, and we can no longer safely assume that we can hold on to the VIP.
We must fail early if we cannot confirm the VIP should still be registered on our local device, and remove the VIP from the interface.
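The behaviour I would expect can be sketched as follows. This is illustrative Python under my own assumptions, not vip-manager's implementation. The function `desired_vip_state` and the convention of passing `None` when the DCS could not be queried are hypothetical.

```python
from typing import Optional


def desired_vip_state(leader: Optional[str], own_name: str) -> bool:
    """Decide whether the VIP should be configured on the local interface.

    `leader` is the value read from the trigger key, or None when the DCS
    could not be queried. An unknown leader is treated as "not us".
    """
    if leader is None:
        # Fail safe: without confirmation from the DCS, give up the VIP.
        return False
    return leader == own_name
```

With the values from the config above, `desired_vip_state("pgcluster_member1", "pgcluster_member1")` keeps the VIP, while `desired_vip_state(None, "pgcluster_member1")` drops it. Whether a few retries should be allowed before the VIP is dropped is a separate design question that this sketch leaves open.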
The problem seems to have existed since v2.0.0...
Log output, with my interjections marked by <>:
julian@fedora-t14:~/git/cybertec-postgresql/vip-manager$ sudo ./vip-manager --config vipconfig/vip-manager.yml
Place your finger on the fingerprint reader
2026/02/24 14:51:44 Using config from file: vipconfig/vip-manager.yml
2026/02/24 14:51:44 This is the config that will be used:
config : vipconfig/vip-manager.yml
dcs-endpoints : [http://127.0.0.1:2379]
dcs-type : etcd
hosting-type : basic
hostingtype : basic
interface : wlp2s0
interval : 1000
ip : 192.168.178.123
manager-type : basic
netmask : 24
retry-after : 250
retry-num : 2
trigger-key : /service/pgcluster/leader
trigger-value : pgcluster_member1
verbose : false
version : false
<etcd is unavailable>
2026/02/24 14:51:44 IP address 192.168.178.123/24 state is false, desired false
<etcd becomes available>
2026/02/24 14:51:53 current leader from DCS: pgcluster_member1
2026/02/24 14:51:53 set WATCH on /service/pgcluster/leader
2026/02/24 14:51:53 IP address 192.168.178.123/24 state is false, desired true
2026/02/24 14:51:53 Configuring address 192.168.178.123/24 on wlp2s0
2026/02/24 14:51:53 Sent gratuitous ARP reply
2026/02/24 14:51:53 Sent gratuitous ARP request
2026/02/24 14:51:53 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:51:54 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:52:04 IP address 192.168.178.123/24 state is true, desired true
<etcd becomes unavailable>
2026/02/24 14:52:14 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:52:24 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:52:34 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:52:44 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:52:54 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:53:04 IP address 192.168.178.123/24 state is true, desired true
<this continues for all eternity, or until etcd becomes available and shows a different state again>
To reproduce this, it is enough to launch etcd on localhost and configure vip-manager accordingly.
etcd --data-dir /tmp/etcd
Then create a key and value as Patroni would when it elects a leader. The value is chosen to match vipconfig/vip-manager.yml:
etcdctl put /service/pgcluster/leader 'pgcluster_member1'
Best regards
Julian