[pgpool-general: 7525] Strange behavior on switchover with detach_false_primary enabled
emond.papegaaij at gmail.com
Mon Apr 26 22:18:08 JST 2021
Recently we enabled the detach_false_primary option to prevent
pgpool from picking the incorrect primary database in some situations.
However, since we enabled this option, we have been seeing
erratic behavior during a switchover. We start with the following setup:
- 3 backends and 3 pgpool nodes
- node 0 is primary database
- node 0 is pgpool leader
We first stop and restart pgpool on node 0, causing node 1 to become
the leader. Next, we detach the primary database node with
pcp_detach_node. This starts a failover, but during this failover
things happen that I cannot explain, causing the switchover to fail
miserably. It looks like pgpool starts a second failover while it is
still executing the first.
I've attached the log from the new pgpool leader (at node 1). This is
what I get from the logs:
04:03:29: pgpool stops at node 0 and node 1 is elected as the new leader
04:03:31: pgpool at node 0 rejoins
04:03:33: the detach_node command is received from (and for) node 0
04:03:33: failover starts, node 1 is indicated as the new primary
04:03:34: node 1 is promoted
04:03:34: follow_primary is executed for node 0 to follow node 1
And here things get strange:
04:03:34: 2 additional failovers are started, both for node 0 and 1,
but no new primary is selected
04:03:41: the follow_primary started at 04:03:34 completes
04:03:41: a new follow_primary is started instructing node 1 to follow
node 1. This breaks badly, because node 1 now tries to pg_rewind and
pg_basebackup from itself.
04:03:57: we end with only node 0, running in standby.
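To make the problem at 04:03:41 concrete: a follow_primary command should never be issued for the node that is itself the new primary, since that node would then try to pg_rewind and pg_basebackup from itself. Here is a small Python sketch that checks the two follow_primary invocations from the timeline against that invariant. The names FollowPrimary and invalid_follow_commands are hypothetical illustrations only, not part of pgpool.

```python
from typing import NamedTuple


class FollowPrimary(NamedTuple):
    """One follow_primary invocation: `node` is told to follow `new_primary`.

    Hypothetical record type for illustration; not a pgpool structure.
    """
    time: str
    node: int
    new_primary: int


def invalid_follow_commands(events):
    """Return the invocations that ask a node to follow itself."""
    return [e for e in events if e.node == e.new_primary]


# The two follow_primary invocations from the log summary.
events = [
    FollowPrimary("04:03:34", node=0, new_primary=1),  # expected: node 0 follows node 1
    FollowPrimary("04:03:41", node=1, new_primary=1),  # node 1 told to follow itself
]

for e in invalid_follow_commands(events):
    print(f"{e.time}: node {e.node} told to follow itself")
```

Only the second invocation violates the invariant, which matches the point where the switchover breaks.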
We would really appreciate it if we could get some help debugging this
issue. If you need more information, please let me know.