Commit a9459f5
Fix race condition causing a spurious promote during a global DCS outage
The fallback leader observation mechanism was using a non-quorum read that
can see a stale value of the multisite status. For the purposes of rewinding
standbys this is fine, but with unlucky timing the main HA loop could
observe the stale value and promote.
Fix this by only running the leader observation fallback while the node
is not a leader. If the node is the leader, the regular heartbeat will take
care of updating the view.
Observing a stale value during startup is not a problem because
promoting to local leader will force a write to global DCS via
resolve_leader().

1 parent 10c1bc4 commit a9459f5
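
The guard described in the commit message can be sketched roughly as follows. This is only an illustration of the idea, not the code changed by this commit: `MultisiteView`, `observe_leader_fallback`, and the `non_quorum_read` callable are hypothetical names introduced here, and the real change may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MultisiteView:
    """Hypothetical stand-in for a node's cached view of multisite status."""
    is_leader: bool
    observed_leader: Optional[str] = None


def observe_leader_fallback(view: MultisiteView,
                            non_quorum_read: Callable[[], Optional[str]]) -> bool:
    """Refresh the cached leader from a non-quorum (possibly stale) read.

    The fallback is skipped while this node is the leader, so a stale
    value cannot overwrite the view kept fresh by the regular heartbeat.
    Returns True if the fallback ran, False if it was skipped.
    """
    if view.is_leader:
        # Assumption: the heartbeat path updates the view for leaders.
        return False
    view.observed_leader = non_quorum_read()
    return True


# A standby accepts the (possibly stale) observation.
standby = MultisiteView(is_leader=False)
standby_ran = observe_leader_fallback(standby, lambda: "site-a")

# A leader ignores the fallback, so its view is left untouched.
leader = MultisiteView(is_leader=True, observed_leader="site-b")
leader_ran = observe_leader_fallback(leader, lambda: "site-a")
```

In this sketch a standby may still act on a stale value, which the commit message describes as acceptable for rewinding, while a leader never lets the fallback touch its view.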
2 files changed
Lines changed: 13 additions & 1 deletion