vip-manager does not remove VIP when unable to talk to etcd #354

@markwort

While investigating a customer complaint, I discovered a bug in vip-manager.
When etcd is unreachable, vip-manager does not remove the VIP and instead continues to assume that it is still the primary.
Furthermore, it does not even log when it loses the ability to talk to etcd.

Current status:

  • vip-manager does not log when it is unable to talk to etcd (for example, when it cannot even open a TCP connection)
  • vip-manager does not remove the VIP when unable to talk to etcd
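To illustrate the first point, here is a minimal Go sketch of the kind of connectivity check and log line I would expect. It is not taken from the vip-manager code base: `checkEndpoint` and its millisecond timeout parameter are hypothetical names, and the endpoint and the 250 ms value are simply borrowed from the config shown in the log in this report.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/url"
	"time"
)

// checkEndpoint tries to open a TCP connection to the host:port of a DCS
// endpoint URL such as "http://127.0.0.1:2379". It returns nil on success
// and a descriptive error otherwise. The endpoint must include a port.
// Hypothetical helper for illustration, not part of vip-manager.
func checkEndpoint(endpoint string, timeoutMillis int) error {
	u, err := url.Parse(endpoint)
	if err != nil {
		return fmt.Errorf("invalid endpoint %q: %w", endpoint, err)
	}
	if u.Host == "" {
		return fmt.Errorf("endpoint %q has no host", endpoint)
	}
	timeout := time.Duration(timeoutMillis) * time.Millisecond
	conn, err := net.DialTimeout("tcp", u.Host, timeout)
	if err != nil {
		return fmt.Errorf("cannot reach %s: %w", u.Host, err)
	}
	return conn.Close()
}

func main() {
	endpoint := "http://127.0.0.1:2379"
	if err := checkEndpoint(endpoint, 250); err != nil {
		// This is the log line that is currently missing.
		log.Printf("DCS endpoint check failed: %v", err)
		return
	}
	log.Printf("DCS endpoint %s is reachable", endpoint)
}
```

A plain TCP dial only shows that the port accepts connections; it does not prove that etcd can answer queries, so it could complement, but not replace, error handling on the actual DCS requests.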

When we cannot talk to etcd, we must assume that the same is true for Patroni, or at least that the underlying issue is likely to affect Patroni as well.
This means that Patroni might choose a different primary, and we can no longer safely assume that we may hold on to the VIP.
If we cannot confirm that the VIP should still be registered on our local device, we must fail early and remove the VIP from the interface.
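The fail-safe behaviour described above could look roughly like the following Go sketch. All names here (`vipGuard`, `observe`, `maxFailures`, `errUnreachable`) are hypothetical and do not come from the vip-manager source; the sketch only models the decision about the desired VIP state, not the etcd client or the interface handling.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable stands in for any error returned while querying the DCS.
var errUnreachable = errors.New("DCS unreachable")

// vipGuard tracks whether we may keep the VIP (hypothetical type).
type vipGuard struct {
	triggerValue string // value of the leader key that means "we are primary"
	maxFailures  int    // consecutive failed polls tolerated; 0 = release at once
	failures     int    // current run of consecutive failed polls
	desired      bool   // whether the VIP should be configured locally
}

// observe records the result of one DCS poll and returns the desired state.
// A failed poll never turns the desired state to true; once the failure
// budget is exceeded, the VIP must be released.
func (g *vipGuard) observe(leader string, err error) bool {
	if err != nil {
		g.failures++
		if g.failures > g.maxFailures {
			g.desired = false
		}
		return g.desired
	}
	g.failures = 0
	g.desired = leader == g.triggerValue
	return g.desired
}

func main() {
	g := &vipGuard{triggerValue: "pgcluster_member1", maxFailures: 1}

	// The DCS confirms we are the leader.
	fmt.Println("desired:", g.observe("pgcluster_member1", nil))
	// First failed poll is still within the failure budget.
	fmt.Println("desired:", g.observe("", errUnreachable))
	// Second failed poll exceeds the budget, so the VIP must go.
	fmt.Println("desired:", g.observe("", errUnreachable))
}
```

With `maxFailures` set to 0 the VIP is released on the first failed poll, which is the strictest reading of "fail early"; a small non-zero budget trades a short window of uncertainty for tolerance of brief network hiccups.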

The problem seems to have existed since v2.0.0...

Log excerpt, with my interjections marked by <>:

julian@fedora-t14:~/git/cybertec-postgresql/vip-manager$ sudo ./vip-manager --config vipconfig/vip-manager.yml 
Place your finger on the fingerprint reader
2026/02/24 14:51:44 Using config from file: vipconfig/vip-manager.yml
2026/02/24 14:51:44 This is the config that will be used:
	config : vipconfig/vip-manager.yml
	dcs-endpoints : [http://127.0.0.1:2379]
	dcs-type : etcd
	hosting-type : basic
	hostingtype : basic
	interface : wlp2s0
	interval : 1000
	ip : 192.168.178.123
	manager-type : basic
	netmask : 24
	retry-after : 250
	retry-num : 2
	trigger-key : /service/pgcluster/leader
	trigger-value : pgcluster_member1
	verbose : false
	version : false
<etcd is unavailable>
2026/02/24 14:51:44 IP address 192.168.178.123/24 state is false, desired false
<etcd becomes available>
2026/02/24 14:51:53 current leader from DCS: pgcluster_member1
2026/02/24 14:51:53 set WATCH on /service/pgcluster/leader
2026/02/24 14:51:53 IP address 192.168.178.123/24 state is false, desired true
2026/02/24 14:51:53 Configuring address 192.168.178.123/24 on wlp2s0
2026/02/24 14:51:53 Sent gratuitous ARP reply
2026/02/24 14:51:53 Sent gratuitous ARP request
2026/02/24 14:51:53 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:51:54 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:52:04 IP address 192.168.178.123/24 state is true, desired true
<etcd becomes unavailable>
2026/02/24 14:52:14 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:52:24 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:52:34 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:52:44 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:52:54 IP address 192.168.178.123/24 state is true, desired true
2026/02/24 14:53:04 IP address 192.168.178.123/24 state is true, desired true
<this continues for all eternity, or until etcd becomes available and shows a different state again>

To reproduce this, it is enough to launch etcd on localhost and configure vip-manager accordingly.

etcd --data-dir /tmp/etcd

Then create the key and value that Patroni would write when it elects a leader. The value is chosen to match vipconfig/vip-manager.yml:

etcdctl put /service/pgcluster/leader 'pgcluster_member1'

Best regards
Julian
